Dbms Week 1 To 12 Slides
Database Management Systems
Module 01: Course Overview
Partha Pratim Das
ppd@[Link]
Objectives
• To understand the importance of database management systems in modern-day applications
• To Know Your Course
Why Databases?
• DBMS contains information about a particular enterprise
◦ Collection of interrelated data
◦ Set of programs to access the data
◦ An environment that is both convenient and efficient to use
• Database Applications:
◦ Banking: transactions
◦ Airlines: reservations, schedules
◦ Universities: registration, grades
◦ Sales: customers, products, purchases
◦ Online retailers: order tracking, customized recommendations
◦ Manufacturing: production, inventory, orders, supply chain
◦ Human resources: employee records, salaries, tax deductions
◦ ···
• Databases can be very large
• Databases touch all aspects of our lives
University Database Example
• In the early days, database applications were built directly on top of file systems
• Atomicity of updates
◦ Failures may leave database in an inconsistent state with partial updates carried out
◦ Example: Transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
◦ Concurrent access needed for performance
◦ Uncontrolled concurrent accesses can lead to inconsistencies
. Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
• Security problems
◦ Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
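The concurrent-access example above (balance 100, two withdrawals of 50) can be replayed deterministically. The following is a minimal Python sketch, not a model of any real banking system: the function names and the fixed interleaving are assumptions made purely for illustration. In the uncontrolled case both users read the balance before either writes it back, so one update is lost.

```python
def withdraw_interleaved(balance, amounts):
    """Simulate uncontrolled access: every user reads first, then every user writes."""
    # Each user reads the same starting balance (say 100).
    reads = [balance for _ in amounts]
    # Each user then writes back (value it read - its withdrawal); the last write wins.
    for seen, amount in zip(reads, amounts):
        balance = seen - amount
    return balance


def withdraw_serial(balance, amounts):
    """Simulate controlled access: each read-and-write pair completes before the next starts."""
    for amount in amounts:
        balance = balance - amount
    return balance


if __name__ == "__main__":
    print(withdraw_interleaved(100, [50, 50]))  # 50: one withdrawal is lost
    print(withdraw_serial(100, [50, 50]))       # 0: both withdrawals are applied
```

A DBMS avoids the first outcome by controlling how concurrent transactions interleave, so the application code does not have to.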
• Set Theory
◦ Definition of a Set
. Intensional Definition
. Extensional Definition
. Set-builder Notation
◦ Membership, Subset, Superset, Power Set, Universal Set
◦ Operations on sets:
. Union, Intersection, Complement, Difference, Cartesian Product
◦ De Morgan’s Law
◦ Courses
. MOOCs: Discrete Mathematics:
[Link]
. Online Degree Foundational Course: Mathematics for Data Science I
[Link]
• Relations and Functions
◦ Definition of Relations
◦ Ordered Pairs and Binary Relations
. Domain and Range
. Image, Preimage, Inverse
. Properties: Reflexive, Symmetric, Antisymmetric, Transitive, Total
◦ Definition of Functions
◦ Properties of Functions: Injective, Surjective, Bijective
◦ Composition of Functions
◦ Inverse of a Function
◦ Courses
. MOOCs: Discrete Mathematics:
[Link]
. Online Degree Foundational Course: Mathematics for Data Science I
[Link]
• Propositional Logic
◦ Truth Values & Truth Tables
◦ Operators: conjunction (and), disjunction (or), negation (not), implication, equivalence
◦ Closure under Operations
◦ Courses
. MOOCs: Discrete Mathematics:
[Link]
• Predicate Logic
◦ Predicates
◦ Quantification
. Existential
. Universal
◦ Courses
. MOOCs: Discrete Mathematics:
[Link]
• Data Structures
◦ Array
◦ List
◦ Binary Search Tree
. Balanced Tree
◦ B-Tree
◦ Hash Table / Map
◦ Courses
. MOOCs: Design and Analysis of Algorithms:
[Link]
. MOOCs: Fundamental Algorithms – Design and Analysis:
[Link]
• Programming in Python
◦ Courses
. Online Degree Foundational Course - Programming in Python
[Link]
• Algorithms and Programming in C
◦ Sorting
. Merge Sort
. Quick Sort
◦ Search
. Linear Search
. Binary Search
. Interpolation Search
◦ Courses
. MOOCs: Design and Analysis of Algorithms:
[Link]
. MOOCs: Introduction to Programming in C:
[Link]
• Object-Oriented Analysis and Design
◦ Courses
. MOOCs: Object-Oriented Analysis and Design:
[Link]
Course Text Book
Database System Concepts, Sixth Edition
Abraham Silberschatz, Henry Korth, S. Sudarshan
Website: [Link]
7th Edition will also do
Module Summary
• Elucidated the importance of database management systems in modern-day applications
• Introduced various aspects of the course
Slides used in this presentation are borrowed from [Link] with kind permission of the
authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 02: Why DBMS?/1
Partha Pratim Das
ppd@[Link]
Objectives
• To understand the need for a DBMS from a historical perspective
Evolution of Data Management
• Archival
For:
• Individual
• Small / Big Enterprise
• Global
There have been two major approaches in this practice:
• Physical
• Electronic
Data Management: Physical PPD
Physical Data or Records management, more formally known as Book Keeping, has relied on physical ledgers and journals for centuries.
The most significant development happened when Henry Brown, an American inventor, patented a “receptacle for storing and preserving papers” on November 2, 1886.
Herman Hollerith adapted the punch cards used for weaving looms to act as the memory
for a mechanical tabulating machine, in 1890.
Electronic Data or Records management moves with the advances in technology - especially of memory, storage, computing, and networking.
• 1950s: Computer Programming started
• 1960s: Data Management with punch card / tapes and magnetic tapes
• 1970s:
◦ COBOL and CODASYL approach was introduced in 1971
◦ On October 14 in 1979, Apple II platform shipped VisiCalc, marking the birth of the spreadsheet
◦ Magnetic disks became prevalent
• 1980s: RDBMS changed the face of data management
• 1990s: With Internet data management started becoming global
• 2000s: e-Commerce boomed, NoSQL was introduced for unstructured data management
• 2010s: Data Science started riding high
Electronic Data Management Parameters PPD
• Ease of Use
• Consistency
• Efficiency
• Cost
• ...
Recall how shop owners used to maintain their accounts. A book register was maintained on which the shop owner wrote the amount received from customers, the amount due for any customer, inventory details and so on.
Problems with such an approach of book-keeping:
• Durability: Physical damage to these registers is a possibility due to rodents, humidity, wear and tear
• Scalability: Very difficult to maintain for many years, some shops have numerous
registers spanning over years
• Security: Susceptible to tampering by outsiders
• Retrieval: Time consuming process to search for a previous entry
• Consistency: Prone to human errors
Not only small shops but large organizations also used to maintain their transaction details
in book registers.
Spreadsheet Files - A better solution PPD
Spreadsheet software like Google Sheets: Due to the disadvantages of maintaining ledger registers, organizations dealing with huge amounts of data shifted to using spreadsheet software for maintaining their records in files.
• Durability: These are computer applications and hence data is less prone to physical damage.
• Scalability: Easier to search, insert and modify records as compared to book ledgers
• Security: Can be password-protected
• Ease of Use: Computer applications are used to search and manipulate records in the spreadsheets, leading to a reduction in the manpower needed to perform routine computations
• Consistency: Not guaranteed, but spreadsheets are less prone to mistakes than registers.
Lack of efficiency in meeting growing needs PPD
• With rapid scale up of data, there has been considerable increase in the time required to perform most operations.
• A typical spreadsheet file may have an upper limit on the number of rows.
History of DBMS
Database Management Systems
Module 03: Why DBMS?/2
Partha Pratim Das
ppd@[Link]
• File handling by Python vis-à-vis DBMS - Bank Transaction example
• Parameterized Comparison
Banking Transaction System
Consider a simple banking system where a person can open a new account, transfer funds to an existing account and check the history of all her transactions to date.
The application performs the following checks:
• If the account balance is not enough, it will not allow the fund transfer
• If the account numbers are not correct, it will flash a message and terminate the
transaction.
• If a transaction is successful, it prints a confirmation message.
We will use this banking transaction system to compare various features of a file-based (spreadsheet/.csv files) implementation vis-à-vis a DBMS-based implementation
• Account details are stored in
◦ [Link] for file-based implementation
◦ Accounts table for DBMS implementation
• The transaction details are stored in
◦ [Link] file for file-based implementation
◦ Ledger table for DBMS implementation
In the following slides we discuss a fund transfer transaction.
Python

    # Fragment: f_reader1, f_reader2, f_writer, f_obj_Account1, temp, success,
    # debitAcc, creditAcc and amt are set up before this point (not shown here).
    try:
        for sRec in f_reader1:
            # CONDITION CHECK FOR ENOUGH BALANCE
            if sRec["AcctNo"] == debitAcc and int(sRec["Balance"]) > int(amt):
                for rRec in f_reader2:
                    if rRec["AcctNo"] == creditAcc:
                        # DEBIT
                        sRec["Balance"] = str(int(sRec["Balance"]) - int(amt))
                        temp.append(sRec)
                        # Critical point: a failure from here on leaves a debit without its credit
                        f_writer.writerow({"Acct1": sRec["AcctNo"],
                                           "Acct2": rRec["AcctNo"],
                                           "Amount": amt, "D/C": "D"})
                        # CREDIT
                        rRec["Balance"] = str(int(rRec["Balance"]) + int(amt))
                        temp.append(rRec)
                        f_writer.writerow({"Acct1": rRec["AcctNo"],
                                           "Acct2": sRec["AcctNo"],
                                           "Amount": amt, "D/C": "C"})
                        success = success + 1
                        break
        # Rewind the accounts file, skip its header row, and collect the untouched records
        f_obj_Account1.seek(0)
        next(f_obj_Account1)
        for record in f_reader1:
            if (record["AcctNo"] != temp[0]["AcctNo"] and
                    record["AcctNo"] != temp[1]["AcctNo"]):
                temp.append(record)
    except Exception:
        print("Wrong input entered !!!")

SQL

    do $$
    declare
        -- The slide omits the declarations; these types are assumptions
        amt integer;
        sendVal varchar;
        recVal varchar;
        sbalance integer;
    begin
        amt = 5000;
        sendVal = '1800090';
        recVal = '1800100';
        select balance into sbalance
        from accounts
        where account_no = sendVal;
        if sbalance < amt then
            raise notice 'Insufficient balance';
        else
            update accounts
            set balance = balance - amt
            where account_no = sendVal;
            insert into ledger(sendAc, recAc, amnt, ttype)
            values(sendVal, recVal, amt, 'D');
            update accounts
            set balance = balance + amt
            where account_no = recVal;
            insert into ledger(sendAc, recAc, amnt, ttype)
            values(recVal, sendVal, amt, 'C');
            commit;
            raise notice 'Successful';
        end if;
    end; $$
Bank Transaction: Python vis-à-vis SQL (6) PPD
        f_obj_Account.close()
        print("Transaction is successful !!")
    else:
        print('Transaction failed : Confirm Account details')
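For comparison, the same fund transfer can be tried end to end without a database server. The sketch below is one possible arrangement, not the course's implementation: it uses SQLite through Python's standard sqlite3 module instead of the server the slide's SQL targets, the table and column names follow the slide's SQL, and the helper names make_db and transfer as well as the opening balances are assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two sample accounts (balances are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table accounts (account_no text primary key, balance integer)")
    conn.execute("create table ledger (sendAc text, recAc text, amnt integer, ttype text)")
    conn.executemany("insert into accounts values (?, ?)",
                     [("1800090", 8000), ("1800100", 1000)])
    conn.commit()
    return conn


def transfer(conn, send_val, rec_val, amt):
    """Move amt between two accounts; either every statement takes effect or none does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            sender = conn.execute("select balance from accounts where account_no = ?",
                                  (send_val,)).fetchone()
            receiver = conn.execute("select 1 from accounts where account_no = ?",
                                    (rec_val,)).fetchone()
            if sender is None or receiver is None:
                raise ValueError("Confirm account details")
            if sender[0] < amt:
                raise ValueError("Insufficient balance")
            conn.execute("update accounts set balance = balance - ? where account_no = ?",
                         (amt, send_val))
            conn.execute("insert into ledger values (?, ?, ?, 'D')",
                         (send_val, rec_val, amt))
            conn.execute("update accounts set balance = balance + ? where account_no = ?",
                         (amt, rec_val))
            conn.execute("insert into ledger values (?, ?, ?, 'C')",
                         (rec_val, send_val, amt))
        return "Successful"
    except ValueError as err:
        return f"Transaction failed: {err}"


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "1800090", "1800100", 5000))  # enough balance
    print(transfer(db, "1800090", "1800100", 5000))  # balance is now too low
```

The persistence and rollback logic that the file-based version has to code by hand is delegated here to the database library.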
Parameter | File Handling via Python | DBMS
Scalability with respect to amount of data | Very difficult to handle insert, update and querying of records | In-built features to provide high scalability for a large number of records
Scalability with respect to changes in structure | Extremely difficult to change the structure of records as in the case of adding or removing attributes | Adding or removing attributes can be done seamlessly using simple SQL queries
Time of execution | In seconds | In milliseconds
Persistence | Data processed using temporary data structures have to be manually updated to the file | Data persistence is ensured via automatic, system-induced mechanisms
Robustness | Ensuring robustness of data has to be done manually | Backup, recovery and restore need minimum manual intervention
Security | Difficult to implement in Python (Security at OS level) | User-specific access at database level
Programmer’s productivity | Most file access operations involve extensive coding to ensure persistence, robustness and security of data | Standard and simple built-in queries reduce the effort involved in coding thereby increasing a programmer’s throughput
Arithmetic operations | Easy to do arithmetic computations | Limited set of arithmetic operations are available
Costs | Low costs for hardware, software and human resources | High costs for hardware, software and human resources
Parameterized Comparison
File handling via Python
• The effort needed to implement a file handler is quite low in Python
• In order to process a 1GB file, a program in Python would typically take a few seconds.
DBMS
• The effort to install and configure a DB in a DB server is expensive & time consuming
• In order to process a 1GB file, an SQL query would typically take a few milliseconds.
• If the number of records is very small, the overhead in installing and configuring a
database will be much more than the time advantage obtained from executing the
queries.
• However, if the number of records is really large, then the time required in the
initialization process of a database will be negligible as compared to the time saved in
using SQL queries.
File handling via Python
• Extensive support for arithmetic and logical operations: Extensive arithmetic and logical operations can be performed on data using Python. These include complex numerical calculations and recursive computations.
DBMS
• Limited support for arithmetic and logical operations: SQL provides limited arithmetic and logical operations. Any other complex computation has to be done outside the SQL.
File handling via Python
• File systems are cheaper to install and use. No specialized hardware, software or personnel are required to maintain filesystems.
DBMS
• Large databases are served by dedicated database servers that need large storage and processing power
• DBMSs are expensive software that have to be installed and regularly updated
• Databases are inherently complex and need specialized people, such as a DBA, to work on them
• The above factors lead to huge costs in implementing and maintaining database management systems
Module Summary
• Elucidated the differences between file handling by Python and a DBMS through a Bank Transaction example
• Parameterized Comparison
Database Management Systems
Module 04: Introduction to DBMS/1
Partha Pratim Das
ppd@[Link]
• Comparison of data management using Python & files and DBMS
• Efficacy and efficiency of DBMS highlighted
• To familiarize with the basic notions and terminology of database management systems
• To understand the role of data models and languages
• To understand the approaches to database design
Levels of Abstraction
• Physical level: describes how a record (for example, instructor) is stored
• Logical level: describes data stored in database, and the relationships among the data fields
type instructor = record
    ID : string;
    name : string;
    dept_name : string;
    salary : integer;
end;
• View level: application programs hide details of data types
◦ Views can also hide information (such as an employee’s salary) for security purposes
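The logical and view levels can be tried out with a small sketch. It assumes SQLite via Python's standard sqlite3 module; the view name instructor_public and the sample row are hypothetical, and the physical level is left entirely to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the instructor record with its four fields.
conn.execute("""
    create table instructor (
        ID        text primary key,
        name      text,
        dept_name text,
        salary    integer
    )
""")
conn.execute("insert into instructor values ('I001', 'A. Teacher', 'Physics', 50000)")

# View level: expose every field except salary.
conn.execute("create view instructor_public as select ID, name, dept_name from instructor")

cursor = conn.execute("select * from instructor_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

An application that is only given access to the view never sees the salary column, although it is still stored in the underlying table.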
[Figure: An architecture for a database system]
Schema and Instance
• Similar to type of a variable and value of the variable at run-time in programming languages
• Schema
◦ Logical Schema – the overall logical structure of the database
. Analogous to type information of a variable in a program
. Example: The database consists of information about a set of customers and
• Physical Data Independence – the ability to modify the physical schema without changing the logical schema
◦ Analogous to independence of Interface and Implementation in Object-Oriented Systems
◦ Applications depend on the logical schema
◦ In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
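Physical data independence can be illustrated with a short, hedged sketch. It assumes SQLite via Python's sqlite3 module, and treats adding an index as a stand-in for a physical-schema change; the table contents and the index name are hypothetical. The application query is identical before and after the change and returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary integer)")
conn.executemany("insert into instructor values (?, ?, ?, ?)", [
    ("I001", "A. Teacher", "Physics", 50000),
    ("I002", "B. Teacher", "History", 45000),
    ("I003", "C. Teacher", "Physics", 60000),
])

# The application's query depends only on the logical schema.
QUERY = "select name from instructor where dept_name = 'Physics' order by name"

before = conn.execute(QUERY).fetchall()

# Physical-schema change: add an index. The table and its columns are untouched.
conn.execute("create index idx_instructor_dept on instructor (dept_name)")

after = conn.execute(QUERY).fetchall()
print(before == after)
```

Only the way the rows are located may change; the query text and its result do not.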
Data Models
• Other older models
◦ Network model
◦ Hierarchical model
• Recent models for Semi-structured or Unstructured data
◦ Converted to easily manageable formats
◦ Content Addressable Storage (CAS) with metadata descriptors
◦ XML format.
◦ RDBMS which supports BLOBs
Data Models (2) PPD
DDL and DML
• Data dictionary contains metadata (that is, data about data)
◦ Database schema
◦ Integrity constraints
. Primary key (ID uniquely identifies instructors)
◦ Authorization
. Who can access what
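A data dictionary and an integrity constraint can be observed directly in a small sketch. It assumes SQLite via Python's sqlite3 module, where the metadata is exposed through the built-in sqlite_master table; the sample rows are hypothetical. Authorization is not shown because SQLite has no user-level access control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statement: defines the schema and an integrity constraint (primary key on ID).
conn.execute("create table instructor (ID text primary key, name text, salary integer)")

# SQLite records the schema as metadata in sqlite_master.
metadata = conn.execute(
    "select type, name from sqlite_master where type = 'table'").fetchall()
print(metadata)

conn.execute("insert into instructor values ('I001', 'A. Teacher', 50000)")
try:
    # A second row with the same ID violates the primary-key constraint.
    conn.execute("insert into instructor values ('I001', 'B. Teacher', 45000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```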
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized by the appropriate data model
◦ DML: also known as Query Language
• Two classes of languages
◦ Pure – used for proving properties about computational power and for optimization
. Relational Algebra (we focus in this course)
. Tuple relational calculus
. Domain relational calculus
◦ Commercial – used in commercial systems
. SQL is the most widely used commercial language
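To see how a pure language relates to a commercial one, the sketch below writes the same query twice: once with hand-rolled relational-algebra operators in Python and once in SQL through the sqlite3 module. The helper names select and project, the sample tuples and the salary threshold are all assumptions made for illustration.

```python
import sqlite3

rows = [
    {"ID": "I001", "name": "A. Teacher", "dept_name": "Physics", "salary": 50000},
    {"ID": "I002", "name": "B. Teacher", "dept_name": "History", "salary": 45000},
    {"ID": "I003", "name": "C. Teacher", "dept_name": "Physics", "salary": 60000},
]


def select(relation, predicate):
    """Relational-algebra selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Relational-algebra projection: keep the named attributes and drop duplicates."""
    return sorted({tuple(t[a] for a in attributes) for t in relation})


# Pure form: projection of name over the selection salary > 48000.
algebra_result = project(select(rows, lambda t: t["salary"] > 48000), ["name"])

# Commercial form: the same query in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary integer)")
conn.executemany("insert into instructor values (:ID, :name, :dept_name, :salary)", rows)
sql_result = conn.execute(
    "select distinct name from instructor where salary > 48000 order by name"
).fetchall()

print(algebra_result == sql_result)
```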
SQL
Database Design
The process of designing the general structure of the database:
• Logical Design – Deciding on the database schema. Database design requires that we find a good collection of relation schemas
◦ Business decision
. What attributes should we record in the database?
◦ Computer Science decision
Module Summary
• Familiarized with the basic notions and terminology of database management systems
• Introduced the role of data models and languages
• Introduced the approaches to database design
Database Management Systems
Module 05: Introduction to DBMS/2
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Database Design
The process of designing the general structure of the database:
• Logical Design
◦ Deciding on the database schema. Database design requires that we find a good collection of relation schemas
◦ Business decision
. What attributes should we record in the database?
◦ Computer Science decision
. What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
• Physical Design
◦ Deciding on the physical layout of the database
• Need to come up with a methodology to ensure that each relation in the database is good
• Two ways of doing so:
◦ Entity Relationship Model (Chapter 7)
. Models an enterprise as a collection of entities and relationships
. Represented diagrammatically by an entity-relationship diagram
◦ Normalization Theory (Chapter 8)
. Formalize what designs are bad, and test for them
• A wide variety of tools is available for parsing, browsing and querying XML documents/data
Database Engine
• Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system
• The storage manager is responsible for the following tasks:
◦ Interaction with the OS file manager
◦ Efficient storing, retrieving and updating of data
• Issues:
◦ Storage access
◦ File organization
◦ Indexing and hashing
◦ Depends critically on statistical information about relations which the database must maintain
◦ Need to estimate statistics for intermediate results to compute cost of complex expressions
• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
• Cloud
Database Management Systems
Module 06: Introduction to Relational Model/1
Partha Pratim Das
Week Recap
• The proliferation of DBMS in a wide range of applications provides motivation to study the subject
• Know Your Course provided information about prerequisites, outline and text book
• The specific need for a DBMS was discussed in contrast to a file system based application using a programming language like Python
• Basic notions of a DBMS were introduced
Module 06
Objectives & Outline
◦ Schema
◦ Instance
◦ Keys
• To familiarize with different types of relational query languages
• Keys
• Relational Query Languages
Attributes
◦ Aadhaar #: 12-digit number
◦ Department: Alpha String
• Attribute values are (normally) required to be atomic; that is, indivisible
• The special value null is a member of every domain. It indicates that the value is unknown
• The null value may cause complications in the definition of many operations
• For
Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department
• And domain of the attributes as:
◦ Roll #: Alphanumeric string
◦ First Name, Last Name: Alpha String
◦ DoB: Date
◦ Passport #: String (Letter followed by 7 digits) – nullable (optional)
◦ Aadhaar #: 12-digit number
◦ Department: Alpha String
Schema and Instance
D1 × D2 × ··· × Dn
Thus, a relation is a set of n-tuples (a1, a2, ···, an) where each ai ∈ Di
• The current values (relation instance) of a relation are specified by a table
• An element t of r is a tuple, represented by a row in a table
• Example:
instructor ≡ (String(5) × String × String × Number+), where ID ∈ String(5), name ∈ String, dept_name ∈ String, and salary ∈ Number+
• Order of tuples / rows is irrelevant (tuples may be stored in an arbitrary order)
• No two tuples / rows may be identical
• Example: instructor relation with unordered tuples
Keys
• Students = Roll #, First Name, Last Name, DoB, Passport #, Aadhaar #, Department
• Super Key: Roll #, {Roll #, DoB}
• Candidate Keys: Roll #, {First Name, Last Name}, Aadhaar #
◦ Passport # cannot be a key. Why?
◦ Null values are allowed for Passport # (a student may not have a passport)
• Primary Key: Roll #
◦ Can Aadhaar # be a key?
◦ It may suffice for unique identification. But Roll # may have additional useful information. For example: 14CS92P01
. Read 14CS92P01 as 14-CS-92-P-01
. 14: Admission in 2014
. CS: Department = CS
. 92: Category of Student
. P: Type of admission: Project
. 01: Serial Number
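The way a roll number packs several facts into one key can be sketched in Python. This is only an illustration: the field widths in the pattern are inferred from the single example 14CS92P01, and the function name and dictionary keys are hypothetical.

```python
import re

# Field widths are an assumption inferred from the one example 14CS92P01.
ROLL_PATTERN = re.compile(r"(\d{2})([A-Z]{2})(\d{2})([A-Z])(\d{2})")


def parse_roll(roll: str) -> dict:
    """Split a roll number such as 14CS92P01 into its embedded fields."""
    match = ROLL_PATTERN.fullmatch(roll)
    if match is None:
        raise ValueError(f"unrecognised roll number: {roll!r}")
    year, dept, category, admission, serial = match.groups()
    return {
        "admission_year": 2000 + int(year),  # 14 is read as 2014
        "department": dept,
        "category": category,
        "admission_type": admission,
        "serial": int(serial),
    }


fields = parse_roll("14CS92P01")
print(fields)
```

Because the key carries meaning, changing a student's department would in principle change the key as well, which is one trade-off of such "intelligent" keys.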
Keys PPD
• Secondary / Alternate Key: {First Name, Last Name}, Aadhaar #
• Simple Key: Consists of a single attribute
• Composite Key: {First Name, Last Name}
◦ Consists of more than one attribute to uniquely identify an entity occurrence
◦ One or more of the attributes, which make up the key, are not simple keys in their own right
• Foreign key constraint: Value in one relation must appear in another
◦ Referencing relation
. Enrolment: Foreign Keys – Roll #, Course #
◦ Referenced relation
. Students, Courses
• A compound key consists of more than one attribute to uniquely identify an entity occurrence
◦ Each attribute, which makes up the key, is a simple key in its own right
◦ {Roll #, Course #}
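The foreign key constraint and the compound key {Roll #, Course #} can be tried out with Python's built-in sqlite3 module. The table and column names below, and the sample rows, are assumptions made for this sketch. SQLite enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE students (roll_no TEXT PRIMARY KEY, first_name TEXT);
    CREATE TABLE courses  (course_no TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE enrolment (
        roll_no   TEXT REFERENCES students(roll_no),   -- foreign key
        course_no TEXT REFERENCES courses(course_no),  -- foreign key
        PRIMARY KEY (roll_no, course_no)               -- compound key
    );
    INSERT INTO students VALUES ('14CS92P01', 'Asha');
    INSERT INTO courses  VALUES ('CS101', 'Databases');
""")

# A valid enrolment: both referenced values exist.
conn.execute("INSERT INTO enrolment VALUES ('14CS92P01', 'CS101')")

# An enrolment that refers to a student who does not exist.
try:
    conn.execute("INSERT INTO enrolment VALUES ('99XX00X00', 'CS101')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

enrolment_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
print("dangling reference rejected:", rejected, "rows stored:", enrolment_count)
```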
Relational Query Languages
Procedural vis-à-vis Non-procedural or Declarative Paradigms
• Procedural programming requires that the programmer tell the computer what to do
◦ That is, how to get the output for the range of required inputs
◦ The programmer must know an appropriate algorithm
• Declarative programming requires a more descriptive style
◦ The programmer must know what relationships hold between various entities
Procedural vs. Non-procedural or Declarative Paradigms
• Example: Square root of n
◦ Procedural
a) Guess x_0 (close to root of n)
b) i ← 0
c) x_{i+1} ← (x_i + n/x_i)/2
d) If |x_{i+1} − x_i| > delta, set i ← i + 1 and repeat Step c
◦ Declarative
. Root of n is m such that m² = n
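The procedural recipe for the square root can be sketched in Python as follows. The function name, the starting guess and the default tolerance are assumptions made for illustration. The declarative statement appears only as a final check on the answer.

```python
def newton_sqrt(n: float, delta: float = 1e-12) -> float:
    """Approximate the square root of n by the iteration x <- (x + n/x) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0.0
    x = n if n >= 1 else 1.0          # initial guess x_0 (an arbitrary choice)
    while True:
        x_next = (x + n / x) / 2      # x_{i+1} from x_i
        if abs(x_next - x) <= delta:  # stop once successive guesses agree
            return x_next
        x = x_next


m = newton_sqrt(2.0)
# Declarative view: m is a root of n exactly when m * m = n (up to rounding here).
print(m, abs(m * m - 2.0) < 1e-9)
```

The procedural version spells out how to reach the answer. The declarative version only states the property the answer must satisfy, which is the style a query language such as SQL follows.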
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 07: Introduction to Relational Model/2
Partha Pratim Das
ppd@[Link]
• Languages for the Relational Model were introduced
Relational Operators
A  B
a1 b1
a1 b2
a1 b2
a1 b1
is not valid;
A  B
a1 b1
a1 b2
is.
Select Operation – selection of rows (tuples)
• σ_{A=B ∧ D>5}(r)
• π_{A,C}(r)
• r ∪ s
• r − s
• r ∩ s
Note: r ∩ s = r − (r − s)
• r × s
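The operators listed above can be mimicked with Python sets of tuples, since a relation is a set. The relations r, u and v below are hypothetical sample data, and attribute positions stand in for attribute names.

```python
from itertools import product

# Hypothetical relation r over attributes (A, B, C, D).
r = {("a", "a", 1, 7), ("a", "b", 5, 7), ("b", "b", 12, 3), ("b", "b", 23, 10)}

selected = {t for t in r if t[0] == t[1] and t[3] > 5}  # select: A = B and D > 5
projected = {(t[0], t[2]) for t in r}                    # project on A, C

# Hypothetical relations u and v over the same attributes (A, B).
u = {("a", 1), ("a", 2), ("b", 1)}
v = {("a", 2), ("b", 3)}

union = u | v
difference = u - v
intersection = u & v
cartesian = {tu + tv for tu, tv in product(u, v)}  # concatenate every pair of tuples

print(selected)
print(projected)
print(union, difference, intersection, len(cartesian))
```

The identity r ∩ s = r − (r − s) can be checked on the sample data with `u & v == u - (u - v)`.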
• Allows us to refer to a relation (say E) by more than one name.
ρ_X(E)
returns the expression E under the name X
• Relations r
• r × ρ_s(r)
• σ_{A=C}(r × s)
• Let r and s be relations on schemas R and S respectively. Then, the “natural join” of relations r and s is a relation on schema R ∪ S obtained as follows:
◦ Consider each pair of tuples t_r from r and t_s from s.
◦ If t_r and t_s have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where
. t has the same value as t_r on R
. t has the same value as t_s on S
• Natural Join
◦ r ⋈ s
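The natural join definition translates almost line by line into Python. In this sketch a relation is a list of dictionaries mapping attribute names to values. The function name and the sample relations are assumptions made for illustration.

```python
def natural_join(r, s):
    """Join two relations given as lists of dicts (attribute name -> value)."""
    result = []
    for t_r in r:                                        # consider each pair (t_r, t_s)
        for t_s in s:
            common = t_r.keys() & t_s.keys()             # attributes in R ∩ S
            if all(t_r[a] == t_s[a] for a in common):
                t = {**t_r, **t_s}                       # agrees with t_r on R, t_s on S
                if t not in result:                      # a relation holds no duplicates
                    result.append(t)
    return result


# Hypothetical relations r(A, B) and s(B, C).
r = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
s = [{"B": "x", "C": 10}, {"B": "x", "C": 20}, {"B": "z", "C": 30}]

joined = natural_join(r, s)
print(joined)
```

When R and S share no attributes, `common` is empty, every pair qualifies, and the result is the Cartesian product.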
Aggregation Operators
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 08: Introduction to SQL/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
History of SQL
• IBM developed Structured English Query Language (SEQUEL) as part of the System R project. It was later renamed Structured Query Language (SQL, still pronounced “sequel”)
• ANSI and ISO standard SQL:
SQL-86    First formalized by ANSI
SQL-89    + Integrity Constraints
SQL-92    Major revision (ISO/IEC 9075 standard), de facto industry standard
SQL:1999  + Regular Expression Matching, Recursive Queries, Triggers, Support for Procedural and Control Flow Statements, Nonscalar types (Arrays), and Some OO features (structured types), Embedding SQL in Java (SQL/OLB), and Embedding Java in SQL (SQL/JRT)
SQL:2003  + XML features (SQL/XML), Window Functions, Standardized Sequences, and Columns with Auto-generated Values (identity columns)
SQL:2006  + Ways of importing and storing XML data in an SQL database, manipulating it within the database, and publishing both XML and conventional SQL-data in XML form
SQL:2008  Legalizes ORDER BY outside Cursor Definitions; + INSTEAD OF Triggers, TRUNCATE Statement, and FETCH Clause
SQL:2011  + Temporal Data (PERIOD FOR); Enhancements for Window Functions and FETCH Clause
SQL:2016  + Row Pattern Matching, Polymorphic Table Functions, and JSON
SQL:2019  + Multidimensional Arrays (MDarray type and operators)
• SQL is the de facto industry standard today for relational or structured data systems
• Commercial systems as well as open systems may be fully or partially compliant to one or more standards from SQL-92 onward
• Not all examples here may work on your particular system. Check your system’s SQL documentation
• There aren’t any alternatives to SQL for speaking to relational databases (that is, SQL as a protocol), but there are many alternatives to writing SQL in the applications
• These alternatives have been implemented in the form of frontends for working with relational databases. Some examples of a frontend include (for a selection of languages):
◦ SchemeQL and CLSQL, which are probably the most flexible, owing to their Lisp heritage, but they also look a lot more like SQL than other frontends
◦ LINQ (in .Net)
◦ ScalaQL and ScalaQuery (in Scala)
◦ SqlStatement, ActiveRecord and many others in Ruby
◦ HaskellDB
◦ ...the list goes on for many other languages.
• There are several query languages that are derived from or inspired by SQL. Of these, the most popular and effective is SPARQL.
◦ SPARQL (pronounced sparkle, a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language
. A semantic query language for databases - able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
. It has been standardized by the W3C as a key technology of the semantic web
. Versions:
− SPARQL 1.0 (January 2008)
− SPARQL 1.1 (March 2013)
. Used as the query language for several NoSQL systems - particularly the Graph Databases that use RDF as store
Data Definition Language (DDL)
The SQL data-definition language (DDL) allows the specification of information about relations, including:
• The Schema for each Relation
• The Domain of values associated with each Attribute
• Integrity Constraints
• And, as we will see later, also other information such as
◦ The set of Indices to be maintained for each relation
◦ Security and Authorization information for each relation
◦ The Physical Storage Structure of each relation on disk
• char(n). Fixed length character string, with user-specified length n
• varchar(n). Variable length character strings, with user-specified maximum length n
• int. Integer (a finite subset of the integers that is machine-dependent)
• smallint. Small integer (a machine-dependent subset of the integer domain type)
• numeric(p, d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point. (e.g., numeric(3, 1) allows 44.5 to be stored exactly, but not 444.5 or 0.32)
• real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision
• float(n). Floating point number, with user-specified precision of at least n digits
• More are covered in Chapter 4
• An SQL relation is defined using the create table command:
create table r (A1 D1, A2 D2, ..., An Dn,
(integrity-constraint1),
...,
(integrity-constraintk));
◦ r is the name of the relation
◦ each Ai is an attribute name in the schema of relation r
◦ Di is the data type of values in the domain of attribute Ai
• not null
• primary key (A1, ..., An)
• foreign key (Am, ..., An) references r
create table instructor (
primary key declaration on an attribute automatically ensures not null
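The three kinds of integrity constraint can be exercised with a small script using Python's sqlite3 module. The column list follows the instructor schema (ID, name, dept_name, salary) given earlier. The column types, the department table and the sample rows are assumptions made for this sketch. One caveat: SQLite is known to permit NULL in a non-integer primary key column, so the sketch does not test the rule that primary key implies not null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE department (dept_name VARCHAR(20) PRIMARY KEY);
    CREATE TABLE instructor (
        ID        CHAR(5),
        name      VARCHAR(20) NOT NULL,
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2),
        PRIMARY KEY (ID),
        FOREIGN KEY (dept_name) REFERENCES department(dept_name)
    );
    INSERT INTO department VALUES ('Comp. Sci.');
    INSERT INTO instructor VALUES ('10101', 'Asha', 'Comp. Sci.', 65000);
""")


def violates(sql: str) -> bool:
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


null_name_rejected = violates("INSERT INTO instructor VALUES ('20202', NULL, 'Comp. Sci.', 1)")
duplicate_id_rejected = violates("INSERT INTO instructor VALUES ('10101', 'Ravi', 'Comp. Sci.', 1)")
unknown_dept_rejected = violates("INSERT INTO instructor VALUES ('30303', 'Lata', 'Alchemy', 1)")
print(null_name_rejected, duplicate_id_rejected, unknown_dept_rejected)
```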
• Drop Table (DDL command)
◦ drop table r
• Alter (DDL command)
◦ alter table r add A D
. Where A is the name of the attribute to be added to relation r and D is the domain of A
. All existing tuples in the relation are assigned null as the value for the new attribute
◦ alter table r drop A
. Where A is the name of an attribute of relation r
. Dropping of attributes not supported by many databases
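A short sqlite3 session shows the effect of alter table ... add on existing tuples, followed by drop table. The relation name r and attribute names A1 and A mirror the slide's placeholders, and the data is made up. SQLite spells the command ADD COLUMN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (A1 INTEGER);
    INSERT INTO r VALUES (1), (2);
    ALTER TABLE r ADD COLUMN A VARCHAR(10);  -- new attribute A with domain D
""")

# Existing tuples carry null (None in Python) for the new attribute.
rows = conn.execute("SELECT A1, A FROM r ORDER BY A1").fetchall()
print(rows)

conn.execute("DROP TABLE r")  # removes the relation together with its tuples
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining_tables)
```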
Data Manipulation Language (DML): Query Structure
• A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
◦ Ai represents an attribute from the ri’s
◦ ri represents a relation
◦ P is a predicate
• The result of an SQL query is a relation
• The select clause lists the attributes desired in the result of a query
◦ Corresponds to the projection operation of the relational algebra
• Example: find the names of all instructors:
select name
from instructor
• NOTE: SQL names are case insensitive (that is, you may use upper-case or lower-case letters)
◦ Name ≡ NAME ≡ name
◦ Some people use upper case wherever we use bold font
• SQL allows duplicates in relations as well as in query results!
• To force the elimination of duplicates, insert the keyword distinct after select
• Find the department names of all instructors, and remove duplicates
select distinct dept_name
from instructor
• The keyword all specifies that duplicates should not be removed
select all dept_name
from instructor
• An asterisk in the select clause denotes all attributes
select *
from instructor
• An attribute can be a literal with no from clause
select '437'
◦ The result is a table with one column and a single row with value '437'
◦ Can give the column a name using:
select '437' as FOO
The select clause can contain arithmetic expressions involving the operations +, –, *, and /, operating on constants or attributes of tuples
• The query:
select ID, name, salary/12
from instructor
• Would return a relation that is the same as the instructor relation, except that the value of the attribute salary is divided by 12
• Can rename “salary/12” using the as clause:
select ID, name, salary/12 as monthly_salary
• The where clause specifies conditions that the result must satisfy
◦ Corresponds to the selection predicate of the relational algebra
• To find all instructors in Comp. Sci. dept
select name
from instructor
where dept_name = 'Comp. Sci.'
• Comparison results can be combined using the logical connectives and, or, and not
◦ To find all instructors in Comp. Sci. dept with salary > 80000
select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000
• Comparisons can be applied to results of arithmetic expressions
• The from clause lists the relations involved in the query
◦ Corresponds to the Cartesian product operation of the relational algebra
• Find the Cartesian product instructor × teaches
select *
from instructor, teaches
◦ Generates every possible instructor-teaches pair, with all attributes from both relations
◦ For common attributes (for example, ID), the attributes in the resulting table are renamed using the relation name (for example, instructor.ID)
• Cartesian product not very useful directly, but useful combined with where-clause condition (selection operation in relational algebra)
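The difference between the bare Cartesian product and the product filtered by a where-clause condition can be seen on a tiny database. The rows below are hypothetical, and the column lists are trimmed to what this sketch needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL);
    CREATE TABLE teaches (ID TEXT, course_id TEXT);
    INSERT INTO instructor VALUES ('1', 'Asha', 'Comp. Sci.', 90000),
                                  ('2', 'Ravi', 'Art', 60000);
    INSERT INTO teaches VALUES ('1', 'CS-101'), ('1', 'CS-315'), ('2', 'ART-201');
""")

# Cartesian product: every instructor tuple paired with every teaches tuple.
pairs = conn.execute("SELECT * FROM instructor, teaches").fetchall()

# The where clause keeps only the pairs that refer to the same instructor.
matched = conn.execute(
    "SELECT name, course_id FROM instructor, teaches "
    "WHERE instructor.ID = teaches.ID ORDER BY name, course_id"
).fetchall()

print(len(pairs))
print(matched)
```

With 2 instructors and 3 teaches tuples the product has 2 × 3 = 6 rows, of which only 3 survive the condition.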
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 09: Introduction to SQL/2
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Additional Basic Operations
• Find the names of all instructors who have taught some course and the course id
select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID
◦ Equi-Join, Natural Join
• Find the names of all instructors in the Art department who have taught some course and the course id
select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID and instructor.dept_name = 'Art'
• SQL allows renaming relations and attributes using the as clause:
old-name as new-name
• Find the names of all instructors who have a higher salary than some instructor in 'Comp. Sci.'
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Comp. Sci.'
• Keyword as is optional and may be omitted
instructor as T ≡ instructor T
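Renaming is what makes it possible to compare a relation with itself. The sketch below runs the query above with Python's sqlite3 module on hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (name TEXT, dept_name TEXT, salary INTEGER);
    INSERT INTO instructor VALUES
        ('Asha',  'Comp. Sci.', 90000),
        ('Meera', 'Comp. Sci.', 70000),
        ('Ravi',  'Art',        60000),
        ('Lata',  'Art',        80000),
        ('Kiran', 'Physics',    95000);
""")

# T and S are two names for the same relation, so each row can be compared with every other.
higher = conn.execute("""
    SELECT DISTINCT T.name
    FROM instructor AS T, instructor AS S
    WHERE T.salary > S.salary AND S.dept_name = 'Comp. Sci.'
    ORDER BY T.name
""").fetchall()
print(higher)
```

An instructor qualifies by earning more than at least one Comp. Sci. instructor, so on this data everyone above 70000 appears. The keyword distinct prevents a name from appearing once per matching S row.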
• Find the supervisor of “Bob”
• Find the supervisor of the supervisor of “Bob”
• Find ALL the supervisors (direct and indirect) of “Bob”
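The three supervisor questions need increasingly more machinery: a plain selection, a self-join with renaming, and a recursive query. The sketch below assumes a hypothetical table emp_super(person, supervisor) with made-up rows. It uses the WITH RECURSIVE form supported by SQLite for the third question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp_super (person TEXT, supervisor TEXT);
    INSERT INTO emp_super VALUES
        ('Bob', 'Alice'), ('Mary', 'Susan'), ('Alice', 'David'), ('David', 'Mary');
""")

# 1. Direct supervisor: a plain selection.
direct = conn.execute(
    "SELECT supervisor FROM emp_super WHERE person = 'Bob'"
).fetchall()

# 2. Supervisor of the supervisor: a self-join using two names for the same relation.
second = conn.execute("""
    SELECT S2.supervisor
    FROM emp_super AS S1, emp_super AS S2
    WHERE S1.person = 'Bob' AND S1.supervisor = S2.person
""").fetchall()

# 3. All supervisors, direct and indirect: a recursive query.
all_levels = conn.execute("""
    WITH RECURSIVE chain(supervisor) AS (
        SELECT supervisor FROM emp_super WHERE person = 'Bob'
        UNION
        SELECT e.supervisor
        FROM emp_super AS e JOIN chain AS c ON e.person = c.supervisor
    )
    SELECT supervisor FROM chain
""").fetchall()
all_supervisors = sorted(name for (name,) in all_levels)

print(direct, second, all_supervisors)
```

A fixed number of self-joins reaches only a fixed number of levels, which is why the third question cannot be answered by renaming alone.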
• SQL includes a string-matching operator for comparisons on character strings. The operator like uses patterns that are described using two special characters:
◦ percent ( % ). The % character matches any substring
◦ underscore ( _ ). The _ character matches any single character
• Find the names of all instructors whose name includes the substring “dar”
select name
from instructor
where name like '%dar%'
• Match the string “100%”
like '100\%' escape '\'
• In the above, we use backslash (\) as the escape character
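The effect of the two wildcard characters and of the escape character can be checked with sqlite3. The tables and rows are hypothetical. Note that SQLite's like ignores case for ASCII letters by default, so the sample names are chosen to avoid relying on that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (name TEXT);
    INSERT INTO instructor VALUES ('Sardar'), ('Kedar'), ('Mozart');
    CREATE TABLE note (body TEXT);
    INSERT INTO note VALUES ('100%'), ('1000'), ('100% sure');
""")

# % matches any substring, so '%dar%' finds names containing "dar".
with_dar = conn.execute(
    "SELECT name FROM instructor WHERE name LIKE '%dar%' ORDER BY name"
).fetchall()

# Without an escape character, the % in '100%' is still a wildcard.
unescaped = conn.execute("SELECT body FROM note WHERE body LIKE '100%'").fetchall()

# With backslash declared as the escape character, \% matches a literal percent sign.
escaped = conn.execute(
    r"SELECT body FROM note WHERE body LIKE '100\%' ESCAPE '\'"
).fetchall()

# _ matches exactly one character.
one_more_char = conn.execute("SELECT body FROM note WHERE body LIKE '100_'").fetchall()

print(with_dar, len(unescaped), escaped, len(one_more_char))
```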
◦ finding string length, extracting substrings, etc.
• The Select Top clause is used to specify the number of records to return
• The Select Top clause is useful on large tables with thousands of records. Returning a large number of records can impact performance
select distinct top 10 name
from instructor
• Not all database systems support the SELECT TOP clause.
◦ SQL Server & MS Access support select top
◦ MySQL supports the limit clause
◦ Oracle uses fetch first n rows only and rownum
select distinct name
from instructor
order by name
fetch first 10 rows only
• The in operator allows you to specify multiple values in a where clause
• The in operator is a shorthand for multiple or conditions
select name
from instructor
where dept_name in ('Comp. Sci.', 'Biology')
• In relations with duplicates, SQL can define how many copies of tuples appear in the result
• Multiset versions of some of the relational algebra operators – given multiset relations r1 and r2:
a) σθ(r1): If there are c1 copies of tuple t1 in r1, and t1 satisfies selection σθ, then there are c1 copies of t1 in σθ(r1)
b) ΠA(r1): For each copy of tuple t1 in r1, there is a copy of tuple ΠA(t1) in ΠA(r1), where ΠA(t1) denotes the projection of the single tuple t1
c) r1 × r2: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the tuple t1.t2 in r1 × r2
• Example: Suppose multiset relations r1(A, B) and r2(C) are as follows:
r1 = {(1, a), (2, a)}    r2 = {(2), (3), (3)}
• Then ΠB(r1) would be {(a), (a)}, while ΠB(r1) × r2 would be {(a, 2), (a, 2), (a, 3), (a, 3), (a, 3), (a, 3)}
• SQL duplicate semantics:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
is equivalent to the multiset version of the expression:
ΠA1,A2,...,An(σP(r1 × r2 × ··· × rm))
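The multiset example can be replayed in Python by holding each relation in a list, which keeps duplicates, and counting copies with collections.Counter. The variable names are chosen for this sketch only.

```python
from collections import Counter
from itertools import product

# Multiset relations from the example: r1(A, B) and r2(C).
r1 = [(1, "a"), (2, "a")]
r2 = [(2,), (3,), (3,)]

# Multiset projection on B: one output tuple per input tuple, duplicates kept.
proj_b = [(b,) for (_, b) in r1]

# Multiset Cartesian product: c1 * c2 copies of each combined tuple.
prod = [t1 + t2 for t1, t2 in product(proj_b, r2)]

copies = Counter(prod)
print(proj_b)
print(copies)
```

The tuple (a, 3) appears 2 × 2 = 4 times because (a) occurs twice in the projection and (3) occurs twice in r2.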
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 10: Introduction to SQL/3
Partha Pratim Das
ppd@[Link]
• To familiarize with set operations, null values and aggregation
Set Operations
Partha Pratim • Find courses that ran in Fall 2009 or in Spring 2010
Das
(select course id from section where sem = ’Fall’ and year = 2009)
Objectives &
Outline
union
Set Operations
(select course id from section where sem = ’Spring’ and year = 2010)
Null Values • Find courses that ran in Fall 2009 and in Spring 2010
Three Valued Logic
Aggregate
(select course id from section where sem = ’Fall’ and year = 2009)
Functions intersect
Group By
Having (select course id from section where sem = ’Spring’ and year = 2010)
Null Values
Module Summary
• Find courses that ran in Fall 2009 but not in Spring 2010
(select course id from section where sem = ’Fall’ and year = 2009)
except
(select course id from section where sem = ’Spring’ and year = 2010)
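The three set operations can be tried out with a small script. This is a minimal sketch, assuming Python's built-in sqlite3 module and a handful of made-up section rows (the rows are hypothetical, not the course's data). SQLite rejects parentheses around the operands of a set operation, so they are dropped here.

```python
import sqlite3

# Hypothetical sample rows for the section relation (not taken from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, semester text, year integer)")
conn.executemany(
    "insert into section values (?, ?, ?)",
    [
        ("CS-101", "Fall", 2009),
        ("CS-347", "Fall", 2009),
        ("CS-101", "Spring", 2010),
        ("CS-315", "Spring", 2010),
    ],
)

fall = "select course_id from section where semester = 'Fall' and year = 2009"
spring = "select course_id from section where semester = 'Spring' and year = 2010"


def run(op):
    # Combine the two queries with the given set operator and sort the result.
    rows = conn.execute(f"{fall} {op} {spring} order by course_id").fetchall()
    return [r[0] for r in rows]


in_either = run("union")      # ran in Fall 2009 or Spring 2010
in_both = run("intersect")    # ran in both semesters
fall_only = run("except")     # ran in Fall 2009 but not in Spring 2010
print(in_either, in_both, fall_only)
```

All three operators remove duplicates from their result; union all, intersect all and except all retain them where the database system supports those forms.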
• Find the salaries of all instructors that are less than the largest salary
select distinct T.salary
from instructor as T, instructor as S
where T.salary < S.salary
• Find the salaries of all instructors
select distinct salary
from instructor
• Find the largest salary of all instructors
(select “second query”)
except
(select “first query”)
Null Values
• It is possible for tuples to have a null value, denoted by null, for some of their attributes
• null signifies an unknown value or that a value does not exist
• The result of any arithmetic expression involving null is null
◦ Example: 5 + null returns null
• The predicate is null can be used to check for null values
◦ Example: Find all instructors whose salary is null
select name
from instructor
where salary is null
• It is not possible to test for null values with comparison operators, such as =, <, or <>. We need to use the is null and is not null operators instead
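The behaviour of null described above can be observed directly. The following is a small sketch, assuming Python's sqlite3 module and two invented instructor rows (the names and salary are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, salary numeric)")
conn.executemany(
    "insert into instructor values (?, ?)",
    [("Ada", 90000), ("Bob", None)],  # hypothetical rows; None is stored as null
)

# Arithmetic involving null yields null (seen from Python as None).
five_plus_null = conn.execute("select 5 + null").fetchone()[0]

# "= null" evaluates to unknown for every row, so no row is selected.
with_equals = conn.execute(
    "select name from instructor where salary = null"
).fetchall()

# "is null" is the predicate that finds the missing salaries.
with_is_null = conn.execute(
    "select name from instructor where salary is null"
).fetchall()

print(five_plus_null, with_equals, with_is_null)
```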
Aggregate Functions
• These functions operate on the multiset of values of a column of a relation, and return a value
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Find the average salary of instructors in the Computer Science department
select avg(salary)
from instructor
where dept_name = 'Comp. Sci.';
• Find the total number of instructors who teach a course in the Spring 2010 semester
select count(distinct ID)
from teaches
where semester = 'Spring' and year = 2010;
• Find the number of tuples in the course relation
select count(*)
from course;
• Find the average salary of instructors in each department
select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name;
• Attributes in select clause outside of aggregate functions must appear in group by list
/* erroneous query */
select dept_name, ID, avg(salary)
from instructor
group by dept_name;
• Find the names and average salaries of all departments whose average salary is greater than 42000
select dept_name, avg(salary)
from instructor
group by dept_name
having avg(salary) > 42000;
Note: predicates in the having clause are applied after the formation of groups whereas predicates in the where clause are applied before forming groups
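The difference between where and having shows up clearly when the same threshold is placed in each clause. The sketch below assumes Python's sqlite3 module and five invented instructor rows (names and salaries are hypothetical, not the course's instructor relation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary numeric)")
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [  # hypothetical rows
        ("A", "Physics", 95000),
        ("B", "Physics", 87000),
        ("C", "Music", 40000),
        ("D", "History", 60000),
        ("E", "History", 30000),
    ],
)

# having: groups are formed first, then groups with a low average are dropped.
having_rows = conn.execute(
    """
    select dept_name, avg(salary) as avg_salary
    from instructor
    group by dept_name
    having avg(salary) > 42000
    order by dept_name
    """
).fetchall()

# where: individual low-paid instructors are dropped before groups are formed.
where_rows = conn.execute(
    """
    select dept_name, avg(salary) as avg_salary
    from instructor
    where salary > 42000
    group by dept_name
    order by dept_name
    """
).fetchall()

print(having_rows)
print(where_rows)
```

With this data both queries list History and Physics, but the History average differs: having averages both History instructors, while where discards the 30000 row before averaging.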
• All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes
• What if collection has only null values?
◦ count returns 0
◦ all other aggregates return null
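These rules can be checked on a one-column table. This is a sketch, assuming Python's sqlite3 module; the relation r and its attribute a are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (a integer)")  # hypothetical relation r(a)

# A collection that contains only null values.
conn.executemany("insert into r values (?)", [(None,), (None,)])
all_null = conn.execute(
    "select count(*), count(a), sum(a), avg(a), max(a) from r"
).fetchone()

# Add two non-null values; the nulls are ignored by everything except count(*).
conn.executemany("insert into r values (?)", [(10,), (30,)])
mixed = conn.execute(
    "select count(*), count(a), sum(a), avg(a) from r"
).fetchone()

print(all_null)
print(mixed)
```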
• Completed the understanding of set operations, null values, and aggregation
Database Management Systems
Module 11: SQL Examples
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• To recap various basic SQL features through worked examples
• From the classroom relation in the figure, find the names of buildings that have at least one classroom with capacity less than 100 (removing the duplicates).
select distinct building
from classroom
where capacity < 100;
◦ Output:
building
Painter
Taylor
Watson
Figure: classroom relation
• From the classroom relation in the figure, find the names of buildings that have at least one classroom with capacity less than 100 (without removing the duplicates).
select all building
from classroom
where capacity < 100;
◦ Output:
building
Painter
Taylor
Watson
Watson
Figure: classroom relation
• Note that duplicate retention is the default and hence it is a common practice to skip all immediately after select.
• Find the list of all students of departments which have a budget < $0.1 million
select name, budget
from student, department
where student.dept_name = department.dept_name and budget < 100000;
• The above query first generates every possible student-department pair, which is the Cartesian product of student and department. Then, it keeps only the rows that satisfy student.dept_name = department.dept_name and budget < 100000.
• The common attribute dept_name in the resulting table is qualified with the relation name (student.dept_name and department.dept_name)
◦ Output:
name budget
Brandt 50000.00
Peltier 70000.00
Levy 70000.00
Sanchez 80000.00
Snow 70000.00
Aoi 85000.00
Bourikas 85000.00
Tanaka 90000.00
• The same query in the previous slide can be framed by renaming the tables as shown below.
select S.name as studentname, budget as deptbudget
from student as S, department as D
where S.dept_name = D.dept_name and budget < 100000;
• The above query renames the relation student as S and the relation department as D
• It also displays the attribute name as studentname and budget as deptbudget.
• Note that the budget attribute does not have any prefix because it occurs only in the department relation.
◦ Output:
studentname deptbudget
Brandt 50000.00
Peltier 70000.00
Levy 70000.00
Sanchez 80000.00
Snow 70000.00
Aoi 85000.00
Bourikas 85000.00
Tanaka 90000.00
• From the instructor and department relations in the figure, find out the names of all instructors whose department is Finance or whose department is in any of the following buildings: Watson, Taylor.
◦ Query:
select name
from instructor I, department D
where D.dept_name = I.dept_name
and (I.dept_name = 'Finance'
or building in ('Watson', 'Taylor'));
◦ Output:
name
Srinivasan
Wu
Einstein
Gold
Katz
Singh
Crick
Brandt
Kim
Figure: instructor and department relations
String Operations PPD
• From the course relation in the figure, find the titles of all courses whose course_id has three letters indicating the department.
◦ Query:
select title
from course
where course_id like '___-%';
◦ Output:
title
Intro. to Biology
Genetics
Computational Biology
Investment Banking
World History
Physical Principles
Figure: course relation
• Each course_id begins with either 2 or 3 letters, followed by a hyphen and then a 3-digit number. The pattern uses three underscores, each matching exactly one character, so the above query returns the titles of those courses whose course_id begins with 3 letters.
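The pattern in the string matching example can be tested on a few identifiers. This is a minimal sketch, assuming Python's sqlite3 module; the list of course ids is a hypothetical sample chosen to mix 2-letter and 3-letter prefixes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (course_id text)")
conn.executemany(
    "insert into course values (?)",
    [  # hypothetical sample of course ids
        ("BIO-101",), ("CS-101",), ("EE-181",), ("FIN-201",),
        ("HIS-351",), ("MU-199",), ("PHY-101",),
    ],
)

# Each underscore matches exactly one character; % matches any substring.
three_letter = [
    row[0]
    for row in conn.execute(
        "select course_id from course where course_id like '___-%' order by course_id"
    )
]
print(three_letter)
```

Ids such as CS-101 do not match because the fourth character of the pattern must be the hyphen.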
• From the student relation in the figure, obtain the list of all students in alphabetic order of departments and within each department, in decreasing order of total credits.
◦ Query:
select name, dept_name, tot_cred
from student
order by dept_name ASC, tot_cred DESC;
◦ Output:
name dept_name tot_cred
Tanaka Biology 120
Zhang Comp. Sci. 102
Brown Comp. Sci. 58
Williams Comp. Sci. 54
Shankar Comp. Sci. 32
Bourikas Elec. Eng. 98
Aoi Elec. Eng. 60
Chavez Finance 110
Brandt History 80
Sanchez Music 38
Peltier Physics 56
Levy Physics 46
Snow Physics 0
Figure: student relation
◦ The list is first sorted in alphabetic order of dept_name.
◦ Within each department, it is sorted in decreasing order of total credits.
• From the teaches relation in the figure, find the IDs of all courses taught in the Fall or Spring of 2018.
◦ Query:
select course_id
from teaches
where semester in ('Fall', 'Spring')
and year = 2018;
◦ Output:
course_id
CS-315
FIN-201
MU-199
HIS-351
CS-101
CS-319
CS-319
Figure: teaches relation
Note: We can use distinct to remove duplicates.
• For the same question in the previous slide, we can find the solution using the union operator as follows.
◦ Query:
select course_id
from teaches
where semester = 'Fall'
and year = 2018
union
select course_id
from teaches
where semester = 'Spring'
and year = 2018
◦ Output:
course_id
CS-101
CS-315
CS-319
FIN-201
HIS-351
MU-199
Figure: teaches relation
◦ Note that union removes all duplicates. If we use union all instead of union, we get the same set of tuples as in the previous slide.
Set Operations (2): intersect PPD
• From the instructor relation in the figure, find the names of all instructors who taught in either the Computer Science department or the Finance department and whose salary is < 80000.
◦ Query:
select name
from instructor
where dept_name in ('Comp. Sci.', 'Finance')
intersect
select name
from instructor
where salary < 80000;
◦ Output:
name
Srinivasan
Katz
Figure: instructor relation
• Note that the same can be achieved using the query:
select name from instructor where dept_name in ('Comp. Sci.', 'Finance') and salary < 80000;
• From the instructor relation in the figure, find the names of all instructors who taught in either the Computer Science department or the Finance department and whose salary is either ≥ 90000 or ≤ 70000.
◦ Query:
select name
from instructor
where dept_name in ('Comp. Sci.', 'Finance')
except
select name
from instructor
where salary < 90000 and salary > 70000;
◦ Output:
name
Srinivasan
Brandt
Wu
Figure: instructor relation
◦ Note that the same can be achieved using the query given below:
select name from instructor
where dept_name in ('Comp. Sci.', 'Finance')
and (salary >= 90000 or salary <= 70000);
Aggregate functions: avg PPD
• From the classroom relation given in the figure, find the names and the average capacity of each building whose average capacity is greater than 25.
• From the instructor relation given in the figure, find the least salary drawn by any instructor among all the instructors.
◦ Query:
select min(salary) as least_salary
from instructor;
◦ Output:
least_salary
40000.00
Figure: instructor relation
• From the student relation given in the figure, find the maximum credits obtained by any student among all the students.
◦ Query:
select max(tot_cred) as max_credits
from student;
◦ Output:
max_credits
120
Figure: student relation
• From the section relation given in the figure, find the number of courses run in each building.
◦ Query:
select building, count(course_id) as course_count
from section
group by building;
◦ Output:
building course_count
Taylor 5
Packard 4
Painter 3
Watson 3
Figure: section relation
• From the course relation given in the figure, find the total credits offered by each department.
◦ Query:
select dept_name, sum(credits) as sum_credits
from course
group by dept_name;
◦ Output:
dept_name sum_credits
Finance 3
History 3
Physics 4
Music 3
Comp. Sci. 17
Biology 11
Elec. Eng. 3
Figure: course relation
• SQL Examples have been practiced for
◦ Select
◦ Cartesian Product / as
◦ Where: and / or
◦ String Matching
◦ Order by
◦ in
◦ Set Operations: union, intersect, except
◦ Aggregate Functions: avg, min, max, count, sum
Database Management Systems
Module 12: Intermediate SQL/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• Find courses offered in Fall 2009 and in Spring 2010. (intersect example)
select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
course_id in (select course_id
from section
where semester = 'Spring' and year = 2010);
• Find courses offered in Fall 2009 but not in Spring 2010. (except example)
select distinct course_id
from section
where semester = 'Fall' and year = 2009 and
course_id not in (select course_id
from section
where semester = 'Spring' and year = 2010);
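Both set-membership queries can be run on a toy table to confirm that they behave like intersect and except. This is a sketch, assuming Python's sqlite3 module and four invented section rows (hypothetical data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table section (course_id text, semester text, year integer)")
conn.executemany(
    "insert into section values (?, ?, ?)",
    [  # hypothetical rows
        ("CS-101", "Fall", 2009),
        ("CS-347", "Fall", 2009),
        ("CS-101", "Spring", 2010),
        ("CS-315", "Spring", 2010),
    ],
)

# Subquery shared by both variants: courses offered in Spring 2010.
spring_2010 = "select course_id from section where semester = 'Spring' and year = 2010"


def fall_2009(membership):
    # membership is either "in" or "not in".
    sql = f"""
        select distinct course_id
        from section
        where semester = 'Fall' and year = 2009
          and course_id {membership} ({spring_2010})
        order by course_id
    """
    return [row[0] for row in conn.execute(sql)]


both_semesters = fall_2009("in")        # same answer as intersect
fall_but_not_spring = fall_2009("not in")  # same answer as except
print(both_semesters, fall_but_not_spring)
```

One caveat worth remembering: if the subquery can return a null, not in evaluates to unknown for every non-matching row and the outer query returns nothing, so the equivalence with except holds only when the subquery column has no nulls.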
• Find the total number of (distinct) students who have taken course sections taught by the instructor with ID 10101
select count(distinct ID)
from takes
where (course_id, sec_id, semester, year) in
(select course_id, sec_id, semester, year
from teaches
where teaches.ID = 10101);
• Note: The above query can be written in a simpler manner. The formulation above is simply to illustrate SQL features.
• Find names of instructors with salary greater than that of some (at least one) instructor in the Biology department
select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';
• Same query using some clause
select name
from instructor
where salary > some (select salary
from instructor
where dept_name = 'Biology');
• Find the names of all instructors whose salary is greater than the salary of all instructors in the Biology department
select name
from instructor
where salary > all (select salary
from instructor
where dept_name = 'Biology');
• The exists construct returns the value true if the argument subquery is nonempty
◦ exists r ⇔ r ≠ ∅
◦ not exists r ⇔ r = ∅
• Yet another way of specifying the query “Find all courses taught in both the Fall 2009 semester and in the Spring 2010 semester”
select course_id
from section as S
where semester = 'Fall' and year = 2009 and
exists (select *
from section as T
where semester = 'Spring' and year = 2010
and S.course_id = T.course_id);
• Correlation name – variable S in the outer query
• Correlated subquery – the inner query
• Find all students who have taken all courses offered in the Biology department.
select distinct S.ID, S.name
from student as S
where not exists ( (select course_id
from course
where dept_name = 'Biology')
except
(select T.course_id
from takes as T
where S.ID = T.ID));
◦ First nested query lists all courses offered in Biology
◦ Second nested query lists all courses a particular student took
• Note: X − Y = ∅ ⇔ X ⊆ Y
• Note: Cannot write this query using = all and its variants
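The "has taken all Biology courses" query relies on X − Y = ∅ ⇔ X ⊆ Y, which is easier to see on a tiny instance. The sketch below assumes Python's sqlite3 module and invented students, courses and enrolments (all hypothetical). The inner parentheses around the two subqueries are omitted because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table student (ID text, name text);
    create table course (course_id text, dept_name text);
    create table takes (ID text, course_id text);

    -- hypothetical data: Ann took every Biology course, Raj only one of them
    insert into student values ('S1', 'Ann'), ('S2', 'Raj');
    insert into course values
        ('BIO-101', 'Biology'), ('BIO-301', 'Biology'), ('CS-101', 'Comp. Sci.');
    insert into takes values
        ('S1', 'BIO-101'), ('S1', 'BIO-301'), ('S1', 'CS-101'), ('S2', 'BIO-101');
    """
)

# A student qualifies when (Biology courses) except (courses the student took)
# is empty, that is, when the Biology courses are a subset of what was taken.
took_all_biology = conn.execute(
    """
    select distinct S.ID, S.name
    from student as S
    where not exists (select course_id
                      from course
                      where dept_name = 'Biology'
                      except
                      select T.course_id
                      from takes as T
                      where S.ID = T.ID)
    """
).fetchall()
print(took_all_biology)
```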
Test for Absence of Duplicate Tuples: “unique”
• The unique construct tests whether a subquery has any duplicate tuples in its result
• The unique construct evaluates to “true” if a given subquery contains no duplicates
• Find all courses that were offered at most once in 2009
select T.course_id
from course as T
where unique (select R.course_id
from section as R
where T.course_id = R.course_id
and R.year = 2009);
Subqueries in the From Clause
• SQL allows a subquery expression to be used in the from clause
• Find the average instructors’ salaries of those departments where the average salary is greater than $42,000
select dept_name, avg_salary
from (select dept_name, avg(salary) as avg_salary
from instructor
group by dept_name)
where avg_salary > 42000;
• Note that we do not need to use the having clause
• The with clause provides a way of defining a temporary relation whose definition is available only to the query in which the with clause occurs
• Find all departments with the maximum budget
with max_budget(value) as
(select max(budget)
from department)
select department.dept_name
from department, max_budget
where department.budget = max_budget.value;
• Find all departments where the total salary is greater than the average of the total salary at all departments
with dept_total (dept_name, value) as
(select dept_name, sum(salary)
from instructor
group by dept_name),
dept_total_avg (value) as
(select avg(value)
from dept_total)
select dept_name
from dept_total, dept_total_avg
where dept_total.value > dept_total_avg.value;
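The two-step with query (per-department totals, then their average) can be traced on a small table. This is a sketch, assuming Python's sqlite3 module and five invented instructor rows (hypothetical salaries).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (name text, dept_name text, salary numeric)")
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [  # hypothetical rows: totals are Physics 182000, History 122000, Music 40000
        ("A", "Physics", 95000),
        ("B", "Physics", 87000),
        ("C", "Music", 40000),
        ("D", "History", 60000),
        ("E", "History", 62000),
    ],
)

above_average = [
    row[0]
    for row in conn.execute(
        """
        with dept_total (dept_name, value) as
                 (select dept_name, sum(salary)
                  from instructor
                  group by dept_name),
             dept_total_avg (value) as
                 (select avg(value)
                  from dept_total)
        select dept_name
        from dept_total, dept_total_avg
        where dept_total.value > dept_total_avg.value
        order by dept_name
        """
    )
]
print(above_average)
```

The average of the three totals is about 114667, so History and Physics qualify and Music does not. Each temporary relation is wrapped in its own parentheses and the two definitions are separated by a comma.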
• Scalar subquery is one which is used where a single value is expected
• List all departments along with the number of instructors in each department
select dept_name,
(select count(*)
from instructor
where department.dept_name = instructor.dept_name)
as num_instructors
from department;
• Runtime error if subquery returns more than one result tuple
• Delete all instructors whose salary is less than the average salary of instructors
delete from instructor
where salary < (select avg(salary)
from instructor);
• Problem: as we delete tuples from instructor, the average salary changes
• Solution used in SQL:
a) First, compute avg(salary) and find all tuples to delete
b) Next, delete all tuples found above (without recomputing avg or retesting the tuples)
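The two-phase delete can be contrasted with a naive tuple-at-a-time deletion that recomputes the average after each removal. This is a sketch, assuming Python's sqlite3 module; the three salaries are hypothetical and the naive loop is written in plain Python only to show what SQL avoids.

```python
import sqlite3

salaries = [10, 50, 60]  # hypothetical salaries; their average is 40

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (salary integer)")
conn.executemany("insert into instructor values (?)", [(s,) for s in salaries])

# SQL semantics: the average is computed once, then matching tuples are deleted.
conn.execute(
    "delete from instructor where salary < (select avg(salary) from instructor)"
)
sql_result = sorted(row[0] for row in conn.execute("select salary from instructor"))

# Naive semantics (NOT what SQL does): recompute the average after every delete.
remaining = list(salaries)
for s in salaries:
    if s < sum(remaining) / len(remaining):
        remaining.remove(s)
naive_result = sorted(remaining)

print(sql_result, naive_result)
```

With recomputation, removing 10 raises the average to 55 and 50 is then deleted as well; the SQL semantics keep both 50 and 60.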
• Add all instructors to the student relation with tot_cred set to 0
insert into student
select ID, name, dept_name, 0
from instructor
• The select from where statement is evaluated fully before any of its results are inserted into the relation
• Otherwise queries like
insert into table1 select * from table1
would cause problems
• Increase salaries of instructors whose salary is over $100,000 by 3%, and all others by 5%
◦ Write two update statements:
update instructor
set salary = salary * 1.03
where salary > 100000;
update instructor
set salary = salary * 1.05
where salary <= 100000;
• The order is important
• Can be done better using the case statement (next slide)
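Why the order of the two updates matters, and how a single case expression sidesteps the issue, can be seen with two instructors on either side of the threshold. This is a sketch, assuming Python's sqlite3 module; the names and salaries are hypothetical, and the case statement shown is one plausible formulation rather than the slide's exact text.

```python
import sqlite3


def run_updates(statements):
    # Start from the same two hypothetical instructors every time.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (name text, salary real)")
    conn.executemany(
        "insert into instructor values (?, ?)",
        [("Low", 98000.0), ("High", 120000.0)],
    )
    for statement in statements:
        conn.execute(statement)
    return dict(conn.execute("select name, salary from instructor").fetchall())


raise_high = "update instructor set salary = salary * 1.03 where salary > 100000"
raise_low = "update instructor set salary = salary * 1.05 where salary <= 100000"
with_case = """
    update instructor
    set salary = case
                     when salary <= 100000 then salary * 1.05
                     else salary * 1.03
                 end
"""

slide_order = run_updates([raise_high, raise_low])     # 3% first, then 5%
reversed_order = run_updates([raise_low, raise_high])  # 5% pushes Low over 100000
single_pass = run_updates([with_case])                 # each tuple updated once

print(slide_order, reversed_order, single_pass)
```

In the reversed order the instructor earning 98000 first rises to about 102900 and then receives the 3% raise too; the case version updates every tuple exactly once.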
• Recompute and update tot_cred value for all students
update student S
set tot_cred = (select sum(credits)
from takes, course
where takes.course_id = course.course_id and
S.ID = takes.ID and
takes.grade <> 'F' and
takes.grade is not null);
• Sets tot_cred to null for students who have not taken any course
• Instead of sum(credits), use:
case
when sum(credits) is not null then sum(credits)
else 0
end
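The effect of the case expression on students without any passed course can be checked as follows. This is a sketch, assuming Python's sqlite3 module and invented rows (hypothetical); the table alias S is replaced by the table name student so the statement runs on older SQLite versions too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table student (ID text, name text, tot_cred integer);
    create table course (course_id text, credits integer);
    create table takes (ID text, course_id text, grade text);

    -- hypothetical data: Ann passed one course and failed another, Raj took none
    insert into student values ('S1', 'Ann', null), ('S2', 'Raj', null);
    insert into course values ('CS-101', 4), ('BIO-101', 3);
    insert into takes values ('S1', 'CS-101', 'A'), ('S1', 'BIO-101', 'F');
    """
)

update_template = """
    update student
    set tot_cred = (select {expr}
                    from takes, course
                    where takes.course_id = course.course_id and
                          student.ID = takes.ID and
                          takes.grade <> 'F' and
                          takes.grade is not null)
"""


def credits_after(expr):
    conn.execute(update_template.format(expr=expr))
    return dict(conn.execute("select ID, tot_cred from student").fetchall())


plain_sum = credits_after("sum(credits)")
with_case = credits_after(
    "case when sum(credits) is not null then sum(credits) else 0 end"
)
print(plain_sum, with_case)
```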
Database Management Systems
Module 13: Intermediate SQL/2
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Join Expressions
• Join operations take two relations and return as a result another relation
• A join operation is a Cartesian product which requires that tuples in the two relations match (under some condition).
• It also specifies the attributes that are present in the result of the join
• The join operations are typically used as subquery expressions in the from clause
• CROSS JOIN returns the Cartesian product of rows from tables in the join
◦ Explicit
select *
from employee cross join department;
◦ Implicit
select *
from employee, department;
• Relation prereq
• Observe that prereq information is missing for CS-315 and course information is missing for CS-347
Inner Join PPD
• course inner join prereq
• If specified as natural, the 2nd course_id field is skipped
• An extension of the join operation that avoids loss of information
• Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join
• Uses null values
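How an outer join preserves unmatched tuples can be seen by comparing it with an inner join on the same data. This is a sketch, assuming Python's sqlite3 module; the course and prereq rows are a hypothetical sample arranged so that CS-315 has no prereq tuple and CS-347 has no course tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table course (course_id text, title text);
    create table prereq (course_id text, prereq_id text);

    -- hypothetical sample data
    insert into course values
        ('BIO-301', 'Genetics'), ('CS-190', 'Game Design'), ('CS-315', 'Robotics');
    insert into prereq values
        ('BIO-301', 'BIO-101'), ('CS-190', 'CS-101'), ('CS-347', 'CS-101');
    """
)

join_template = """
    select course.course_id, title, prereq_id
    from course {kind} join prereq on course.course_id = prereq.course_id
    order by course.course_id
"""

inner_rows = conn.execute(join_template.format(kind="inner")).fetchall()
left_rows = conn.execute(join_template.format(kind="left outer")).fetchall()

print(inner_rows)
print(left_rows)
```

The left outer join keeps CS-315 and pads its prereq_id with null, whereas the inner join drops it. Right and full outer joins behave symmetrically, but they are left out of this sketch because older SQLite releases do not support them.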
• Join operations take two relations and return as a result another relation
• These additional operations are typically used as subquery expressions in the from clause
• Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join
• Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated
• course natural full outer join prereq
• What is the difference between the above (equi join), and a natural join?
• course left outer join prereq on course.course_id = prereq.course_id
• course full outer join prereq using (course_id)
Views
• In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database)
• Consider a person who needs to know an instructor's name and department, but not the salary. This person should see a relation described, in SQL, by
    select ID, name, dept_name
    from instructor
• A view provides a mechanism to hide certain data from the view of certain users
• Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.
• A view is defined using the create view statement which has the form
    create view v as < query expression >
where < query expression > is any legal SQL expression
• The view name is represented by v
• Once a view is defined, the view name can be used to refer to the virtual relation that the view generates
• View definition is not the same as creating a new relation by evaluating the query expression
◦ Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the view
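That a view saves an expression rather than a result can be checked with a minimal sketch in Python's sqlite3 module. The view name faculty follows its later use in these slides; the sample instructor rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
    create view faculty as
        select ID, name, dept_name from instructor;
""")
before = conn.execute("select * from faculty").fetchall()

# Only the query is stored, so a later change to instructor is visible
# through the view without redefining it.
conn.execute("insert into instructor values ('15151', 'Mozart', 'Music', 40000)")
after = conn.execute("select * from faculty order by ID").fetchall()

print(before)
print(after)
```

The salary column never appears in either result, which is the hiding effect described for views.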
• One view may be used in the expression defining another view
• A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1
• A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2
• A view relation v is said to be recursive if it depends on itself
• A way to define the meaning of views defined in terms of other views
• Let view v1 be defined by an expression e1 that may itself contain uses of view relations
• View expansion of an expression repeats the following replacement step:
    repeat
        Find any view relation vi in e1
        Replace the view relation vi by the expression defining vi
    until no more view relations are present in e1
• As long as the view definitions are not recursive, this loop will terminate
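The replacement loop can be sketched on plain strings in Python. This is only an illustration under simplifying assumptions: view names are matched as whole words, the helper name expand and the max_steps guard are hypothetical, and a real system expands parsed query trees rather than text.

```python
import re

def expand(expr, views, max_steps=1000):
    """Replace view names in expr by their definitions until none remain."""
    if not views:
        return expr
    pattern = re.compile(r"\b(" + "|".join(re.escape(v) for v in views) + r")\b")
    for _ in range(max_steps):
        match = pattern.search(expr)
        if match is None:  # until no more view relations are present
            return expr
        definition = views[match.group(1)]
        expr = expr[:match.start()] + "(" + definition + ")" + expr[match.end():]
    # Recursive definitions never reach the "until" condition.
    raise RuntimeError("expansion did not terminate; are the views recursive?")

views = {
    "v1": "select a from v2 where a > 1",  # v1 depends directly on v2
    "v2": "select a, b from r",
}
expanded = expand("select * from v1", views)
print(expanded)

try:
    expand("select * from v", {"v": "select * from v"}, max_steps=5)
    recursive_detected = False
except RuntimeError:
    recursive_detected = True
```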
• Add a new tuple to faculty view which we defined earlier
    insert into faculty values ('30765', 'Green', 'Music');
• This insertion must be represented by the insertion of the tuple
    ('30765', 'Green', 'Music', null)
into the instructor relation
• Materializing a view: create a physical table containing all the tuples in the result of the query defining the view
• If relations used in the query are updated, the materialized view result becomes out of date
◦ Need to maintain the view, by updating the view whenever the underlying relations are updated
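A rough sketch of why maintenance is needed, again with Python's sqlite3 module. SQLite has no materialized views, so the sketch emulates one with an ordinary table; the table name faculty_mat and the refresh-by-recomputation strategy are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    insert into instructor values ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
    -- Materialize: store the query result as a physical table.
    create table faculty_mat as select ID, name, dept_name from instructor;
""")

conn.execute("insert into instructor values ('15151', 'Mozart', 'Music', 40000)")
# The stored copy is now out of date: it still holds one row.
stale = conn.execute("select count(*) from faculty_mat").fetchone()[0]

# Maintenance by full recomputation, the simplest possible strategy.
conn.executescript("""
    delete from faculty_mat;
    insert into faculty_mat select ID, name, dept_name from instructor;
""")
fresh = conn.execute("select count(*) from faculty_mat").fetchone()[0]
print(stale, fresh)
```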
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Partha Pratim
Das
Objectives &
Outline Database Management Systems
Transactions
Module 14: Intermediate SQL/3
Integrity
Constraints
Referential Integrity
Module Summary
• To understand Authorization in SQL
• Authorization
• Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency
◦ A checking account must have a balance greater than Rs. 10,000.00
◦ A salary of a bank employee must be at least Rs. 250.00 an hour
◦ A customer must have a (non-null) phone number
• check(P), where P is a predicate
    time_slot_id varchar(4),
    primary key (course_id, sec_id, semester, year),
    check (semester in ('Fall', 'Winter', 'Spring', 'Summer'))
    );
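A check clause can be exercised with a short sketch using Python's sqlite3 module. The column list is a cut-down assumption of the section relation, and 'Autumn' is a made-up invalid value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table section (
        course_id text,
        sec_id text,
        semester text,
        year integer,
        primary key (course_id, sec_id, semester, year),
        check (semester in ('Fall', 'Winter', 'Spring', 'Summer'))
    )
""")
conn.execute("insert into section values ('CS-101', '1', 'Fall', 2009)")

try:
    conn.execute("insert into section values ('CS-101', '2', 'Autumn', 2009)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # 'Autumn' violates the check predicate

row_count = conn.execute("select count(*) from section").fetchone()[0]
print(rejected, row_count)
```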
• Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation
• Example: If “Biology” is a department name appearing in one of the tuples in the instructor relation, then there exists a tuple in the department relation for “Biology”
• Let A be a set of attributes. Let R and S be two relations that contain attributes A and where A is the primary key of S. A is said to be a foreign key of R if for any values of A appearing in R these values also appear in S
• With cascading, you can define the actions that the Database Engine takes when a user tries to delete or update a key to which existing foreign keys point
• create table course (
      course_id char(5) primary key,
      title varchar(20),
      dept_name varchar(20) references department
  )
• create table course (
      ...
      dept_name varchar(20),
      foreign key (dept_name) references department
          on delete cascade
          on update cascade,
      ...
  )
• Alternative actions to cascade: no action, set null, set default
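The cascading actions can be observed in a small sketch with Python's sqlite3 module. Two points are specific to SQLite rather than to the slides: foreign keys are enforced only after pragma foreign_keys = on, and the sample departments and courses are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    create table department (dept_name text primary key);
    create table course (
        course_id text primary key,
        title text,
        dept_name text,
        foreign key (dept_name) references department
            on delete cascade
            on update cascade
    );
    insert into department values ('Biology'), ('Music');
    insert into course values ('BIO-301', 'Genetics', 'Biology'),
                              ('MU-199', 'Music Video Production', 'Music');
""")

# Referential integrity: a course may not name a department that does not exist.
try:
    conn.execute("insert into course values ('PHY-101', 'Physical Principles', 'Physics')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# on update cascade renames the department inside course as well;
# on delete cascade removes the courses of the deleted department.
conn.execute("update department set dept_name = 'Life Sci.' where dept_name = 'Biology'")
conn.execute("delete from department where dept_name = 'Music'")
courses = conn.execute("select course_id, dept_name from course").fetchall()
print(orphan_rejected, courses)
```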
Integrity Constraint Violation During Transactions
• date: Dates, containing a (4 digit) year, month and date
◦ Example: date '2005-7-27'
• time: Time of day, in hours, minutes and seconds.
◦ Example: time '09:00:30' time '09:00:30.75'
• timestamp: date plus time of day
• create table student
      (ID varchar(5),
      name varchar(20) not null,
      dept_name varchar(20),
      tot_cred numeric(3,0) default 0,
      primary key (ID))
• create type construct in SQL creates user-defined type (alias, like typedef in C)
    create type Dollars as numeric (12,2) final
◦ create table department (
      dept_name varchar(20),
      building varchar(15),
      budget Dollars);
• create domain construct in SQL-92 creates user-defined domain types
    create domain person_name char(20) not null
• Types and domains are similar
• Domains can have constraints, such as not null, specified on them
    create domain degree_level varchar(10)
        constraint degree_level_test
        check (value in ('Bachelors', 'Masters', 'Doctorate'));
• Large objects (photos, videos, CAD files, etc.) are stored as a large object:
◦ blob: binary large object – object is a large collection of uninterpreted binary data (whose interpretation is left to an application outside of the database system)
◦ clob: character large object – object is a large collection of character data
◦ When a query returns a large object, a pointer is returned rather than the large object itself
• select: allows read access to relation, or the ability to query using the view
◦ Example: grant users U1, U2, and U3 select authorization on the instructor relation:
    grant select on instructor to U1, U2, U3
• insert: the ability to insert tuples
• update: the ability to update using the SQL update statement
• Discussed authorization in SQL
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Module 15
Partha Pratim
Das
Objectives &
Outline Database Management Systems
Functions and
Procedural Module 15: Advanced SQL
Constructs
Triggers
Triggers :
Functionality vs
Performance
Partha Pratim Das
Module Summary
ppd@[Link]
• Functions / Procedures and Control Flow Statements were added in SQL:1999
◦ Functions/Procedures can be written in SQL itself, or in an external programming language (like C, Java)
◦ Functions written in external languages are particularly useful with specialized data types such as images and geometric objects
. Example: Functions to check if polygons overlap, or to compare images for similarity
◦ Some database systems support table-valued functions, which can return a relation as a result
• SQL:1999 also supports a rich set of imperative constructs, including loops, if-then-else, and assignment
• Many databases have proprietary procedural extensions to SQL that differ from SQL:1999
• Define a function that, given the name of a department, returns the count of the number of instructors in that department:
    create function dept_count (dept_name varchar(20))
    returns integer
    begin
        declare d_count integer;
        select count (*) into d_count
        from instructor
        where instructor.dept_name = dept_name;
        return d_count;
    end
• The function dept_count can be used to find the department names and budget of all departments with more than 12 instructors:
    select dept_name, budget
    from department
    where dept_count (dept_name) > 12
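The same idea can be approximated outside SQL:1999 syntax. The sketch below is an assumption-laden stand-in: it defines dept_count as an ordinary Python function over a sqlite3 connection instead of a stored SQL function, uses made-up sample rows, and lowers the threshold from 12 to 1 because the data is tiny.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_name text primary key, budget numeric);
    create table instructor (ID text, name text, dept_name text);
    insert into department values ('Physics', 70000), ('Music', 80000);
    insert into instructor values ('22222', 'Einstein', 'Physics'),
                                  ('33456', 'Gold', 'Physics'),
                                  ('15151', 'Mozart', 'Music');
""")

def dept_count(dept_name):
    """Return the number of instructors in the given department."""
    row = conn.execute(
        "select count(*) from instructor where dept_name = ?", (dept_name,)
    ).fetchone()
    return row[0]

departments = conn.execute(
    "select dept_name, budget from department order by dept_name"
).fetchall()

# Same shape as the query on the slide: keep departments whose count
# exceeds the threshold.
large_depts = [(name, budget) for name, budget in departments if dept_count(name) > 1]
print(large_depts)
```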
SQL functions (2)
• SQL functions are in fact parameterized views that generalize the regular notion of views by allowing parameters
• Functions that return a relation as a result added in SQL:2003
• Return all instructors in a given department:
    create function instructor_of (dept_name char(20))
    returns table (
        ID varchar(5),
        name varchar(20),
        dept_name varchar(20),
        salary numeric(8, 2) )
    return table
        (select ID, name, dept_name, salary
        from instructor
        where instructor.dept_name = instructor_of.dept_name)
• Usage
    select *
    from table (instructor_of ('Music'))
SQL Procedures
• Procedures can be invoked either from an SQL procedure or from embedded SQL, using the call statement.
    declare d_count integer;
    call dept_count_proc('Physics', d_count);
• Procedures and functions can be invoked also from dynamic SQL
• SQL:1999 allows overloading - more than one function/procedure of the same name as long as the number of arguments and / or the types of the arguments differ
Language Constructs for Procedures and Functions
• SQL supports constructs that give it almost all the power of a general-purpose programming language.
◦ Warning: Most database systems implement their own variant of the standard syntax
• Compound statement: begin . . . end
◦ May contain multiple SQL statements between begin and end.
◦ Local variables can be declared within a compound statement
• while loop:
    while boolean expression do
        sequence of statements;
    end while;
• repeat loop:
    repeat
        sequence of statements;
    until boolean expression
    end repeat;
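The difference between the two loop forms is where the test happens. A minimal Python analogy, assuming hypothetical helper names sum_while and sum_repeat (Python has no repeat statement, so the second form is emulated with a loop that breaks at the end of the body):

```python
def sum_while(n):
    """while ... do ... end while: the condition is tested before each pass."""
    total, i = 0, 1
    while i <= n:
        total += i
        i += 1
    return total

def sum_repeat(n):
    """repeat ... until ... end repeat: the body runs once before the test."""
    total, i = 0, 1
    while True:
        total += i
        i += 1
        if i > n:  # the "until" condition
            break
    return total

print(sum_while(3), sum_repeat(3))
print(sum_while(0), sum_repeat(0))
```

For n = 0 the two versions differ, because the repeat form executes its body once before checking the condition.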
• Signaling of exception conditions, and declaring handlers for exceptions
    declare out_of_classroom_seats condition
    declare exit handler for out_of_classroom_seats
    begin
    ...
    signal out_of_classroom_seats
    ...
    end
◦ The handler here is exit – causes the enclosing begin . . . end to be terminated and exited
◦ Other actions possible on exception
    create procedure dept_count_proc(
        in dept_name varchar(20),
        out count integer)
    language C
    external name '/usr/avi/bin/dept_count_proc'
• To deal with security problems, we can do one of the following:
◦ Use sandbox techniques
. That is, use a safe language like Java, which cannot be used to access/damage other parts of the database code
◦ Run external language functions/procedures in a separate process, with no access to the database process’ memory
. Parameters and results communicated via inter-process communication
• Both have performance overheads
• Many database systems support both above approaches as well as direct executing in database system address space
Triggers
• A trigger defines a set of actions that are performed in response to an insert, update, or delete operation on a specified table
◦ When such an SQL operation is executed, the trigger is said to have been activated
◦ Triggers are optional
◦ Triggers are defined using the create trigger statement
• Triggers can be used
◦ To enforce data integrity rules via referential constraints and check constraints
◦ To cause updates to other tables, automatically generate or transform values for inserted or updated rows, or invoke functions to perform tasks such as issuing alerts
• To design a trigger mechanism, we must:
◦ Specify the events (like update, insert, or delete) for the trigger to be executed
◦ Specify the time (BEFORE or AFTER) of execution
◦ Specify the actions to be taken when the trigger executes
• Syntax of triggers may vary across systems
Types of Triggers: BEFORE
There are two types of triggers based on the level at which the triggers are applied:
• Row level triggers are executed whenever a row is affected by the event on which the trigger is defined.
◦ Let Employee be a table with 100 rows. Suppose an update statement is executed to increase the salary of each employee by 10%. Any row level update trigger configured on the table Employee will be executed once for each of the 100 rows in the table during this update.
• Statement level triggers perform a single action for all rows affected by a statement, instead of executing a separate action for each affected row.
◦ Used for each statement instead of for each row
◦ Uses referencing old table or referencing new table to refer to temporary tables called transition tables containing the affected rows
◦ Can be more efficient when dealing with SQL statements that update a large number of rows
• Triggering event can be an insert, delete or update
• Triggers on update can be restricted to specific attributes
◦ For example, after update of grade on takes
• Values of attributes before and after an update can be referenced
◦ referencing old row as : for deletes and updates
◦ referencing new row as : for inserts and updates
• Triggers can be activated before an event, which can serve as extra constraints. For example, convert blank grades to null.
    create trigger setnull_trigger before update of takes
    referencing new row as nrow
    for each row
    when (nrow.grade = ' ')
    begin atomic
        set nrow.grade = null;
    end;
Trigger to Maintain credits earned value
    create trigger credits_earned after update of grade on (takes)
    referencing new row as nrow
    referencing old row as orow
    for each row
    when nrow.grade <> 'F' and nrow.grade is not null
        and (orow.grade = 'F' or orow.grade is null)
    begin atomic
        update student
        set tot_cred = tot_cred +
            (select credits
            from course
            where course.course_id = nrow.course_id)
        where student.id = nrow.id;
    end;
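A runnable approximation of this trigger can be written for SQLite through Python's sqlite3 module. SQLite's trigger dialect differs from the syntax above: it has no referencing clause or begin atomic, and uses the fixed names new and old for the row values. The cut-down table definitions and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (course_id text primary key, credits integer);
    create table student (id text primary key, tot_cred integer default 0);
    create table takes (id text, course_id text, grade text);

    create trigger credits_earned after update of grade on takes
    for each row
    when new.grade <> 'F' and new.grade is not null
         and (old.grade = 'F' or old.grade is null)
    begin
        update student
        set tot_cred = tot_cred +
            (select credits from course where course.course_id = new.course_id)
        where student.id = new.id;
    end;

    insert into course values ('CS-101', 4);
    insert into student (id) values ('12345');
    insert into takes values ('12345', 'CS-101', null);
""")

# null -> 'A': the course is passed for the first time, so credits are added.
conn.execute("update takes set grade = 'A' where id = '12345'")
after_pass = conn.execute("select tot_cred from student").fetchone()[0]

# 'A' -> 'B': the old grade was already a pass, so the when clause is false.
conn.execute("update takes set grade = 'B' where id = '12345'")
after_regrade = conn.execute("select tot_cred from student").fetchone()[0]
print(after_pass, after_regrade)
```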
• The optimal use of DML triggers is for short, simple, and easy to maintain write operations that act largely independent of an application's business logic.
• Typical and recommended uses of triggers include:
◦ Logging changes to a history table
◦ Auditing users and their actions against sensitive tables
◦ Adding additional values to a table that may not be available to an application (due to security restrictions or other limitations), such as:
. Login/user name
. Time an operation occurs
. Server/database name
◦ Simple validation
Source: SQL Server triggers: The good and the scary
• Triggers are like Lays: Once you pop, you can’t stop
• One of the greatest challenges for architects and developers is to ensure that
◦ triggers are used only as needed, and
◦ to not allow them to become a one-size-fits-all solution for any data needs that happen to come along
• Adding triggers is often seen as faster and easier than adding code to an application, but the cost of doing so is compounded over time with each added line of code
Source: SQL Server triggers: The good and the scary
◦ Iteration occurs
Source: SQL Server triggers: The good and the scary
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Partha Pratim
Das
Week Recap
Objectives &
Database Management Systems
Outline
Module 16: Formal Relational Query Languages/1
Relational
Algebra
Select
Project
Union
Difference Partha Pratim Das
Intersection
Cartesian Product
Rename Department of Computer Science and Engineering
Division Indian Institute of Technology, Kharagpur
Module Summary
ppd@[Link]
Week Recap
• SQL Examples have been practiced for basic query structures
• Nested Subquery in SQL
• Data Modification
• SQL expressions for Join and Views
• Transactions
• Integrity Constraints
• More data types in SQL
• Authorization in SQL
• Functions and Procedures in SQL
• Triggers
• Tuple Relational Calculus
◦ Non-Procedural and Predicate Calculus based
• Domain Relational Calculus
◦ Non-Procedural and Predicate Calculus based
Relational Algebra
• Six basic operators
◦ select: σ
◦ project: Π
◦ union: ∪
◦ set difference: −
◦ Cartesian product: ×
◦ rename: ρ
• The operators take one or two relations as inputs and produce a new relation as a result
• Notation: $\sigma_p(r)$
• p is called the selection predicate
• Defined as:
    $\sigma_p(r) = \{t \mid t \in r \text{ and } p(t)\}$
  where p is a formula in propositional calculus consisting of terms connected by: ∧ (and), ∨ (or), ¬ (not)
  Each term is one of:
    <attribute> op <attribute> or <constant>
  where op is one of: =, ≠, >, ≥, <, ≤
• Example of selection:
    $\sigma_{dept\_name = \text{'Physics'}}(instructor)$
Project Operation PPD
• Notation: $\Pi_{A_1, A_2, \ldots, A_k}(r)$ where $A_1, A_2, \ldots, A_k$ are attribute names and r is a relation
• The result is defined as the relation of k columns obtained by erasing the columns that are not listed
• Duplicate rows removed from result, since relations are sets
• Example: To eliminate the dept_name attribute of instructor
    $\Pi_{ID, name, salary}(instructor)$
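Selection and projection can be mimicked on in-memory data. The sketch below is only an illustration: relations are modelled as lists of Python dictionaries, the function names select and project are hypothetical, and the instructor rows are sample assumptions.

```python
def select(relation, predicate):
    """sigma_p(r): keep the tuples t of r for which p(t) holds."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Pi_{A1..Ak}(r): keep the listed columns and drop duplicate rows."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:  # relations are sets, so duplicates are removed
            result.append(row)
    return result

instructor = [
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"ID": "15151", "name": "Mozart", "dept_name": "Music", "salary": 40000},
]

physics = select(instructor, lambda t: t["dept_name"] == "Physics")
depts = project(instructor, ["dept_name"])
print([t["name"] for t in physics])
print(depts)
```

Projecting onto dept_name alone returns two rows rather than three, which is the duplicate elimination mentioned above.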
• Notation: $r \cup s$
• Defined as: $r \cup s = \{t \mid t \in r \text{ or } t \in s\}$
• For $r \cup s$ to be valid:
  a) r, s must have the same arity (same number of attributes)
  b) The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)
• Example: to find all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or in both
    $\Pi_{course\_id}(\sigma_{semester = \text{“Fall”} \wedge year = 2009}(section)) \cup \Pi_{course\_id}(\sigma_{semester = \text{“Spring”} \wedge year = 2010}(section))$
• Notation: $r - s$
• Defined as: $r - s = \{t \mid t \in r \text{ and } t \notin s\}$
• Set differences must be taken between compatible relations
◦ r and s must have the same arity
◦ attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester
    $\Pi_{course\_id}(\sigma_{semester = \text{“Fall”} \wedge year = 2009}(section)) - \Pi_{course\_id}(\sigma_{semester = \text{“Spring”} \wedge year = 2010}(section))$
• Notation: $r \cap s$
• Defined as:
    $r \cap s = \{t \mid t \in r \text{ and } t \in s\}$
• Assume:
◦ r, s have the same arity
◦ attributes of r and s are compatible
• Note: $r \cap s = r - (r - s)$
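Union, set difference and intersection map directly onto Python's set operators. In this sketch the section relation is a set of (course_id, semester, year) tuples; the rows and the helper name course_ids are assumptions for illustration.

```python
section = {
    ("CS-101", "Fall", 2009),
    ("CS-347", "Fall", 2009),
    ("PHY-101", "Fall", 2009),
    ("CS-101", "Spring", 2010),
    ("CS-315", "Spring", 2010),
}

def course_ids(semester, year):
    """Projection onto course_id of the selection on semester and year."""
    return {cid for (cid, sem, yr) in section if sem == semester and yr == year}

fall = course_ids("Fall", 2009)
spring = course_ids("Spring", 2010)

either = fall | spring      # union
only_fall = fall - spring   # set difference
both = fall & spring        # intersection

print(sorted(either), sorted(only_fall), sorted(both))
```

The identity noted above can be checked on the same data: fall - (fall - spring) gives the same set as fall & spring.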
• Notation: $r \times s$
• Defined as:
    $r \times s = \{t\,q \mid t \in r \text{ and } q \in s\}$
• Assume that attributes of r(R) and s(S) are disjoint. (That is, $R \cap S = \emptyset$)
• If attributes of r(R) and s(S) are not disjoint, then renaming must be used
• Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
• Example:
    $\rho_X(E)$
  returns the expression E under the name X
• If a relational-algebra expression E has arity n, then
    $\rho_{X(A_1, A_2, \ldots, A_n)}(E)$
  returns the result of expression E under the name X, and with the attributes renamed to $A_1, A_2, \ldots, A_n$.
Division Operation PPD
• The division operation is applied to two relations
• R(Z) ÷ S(X), where X ⊆ Z. Let Y = Z − X (and hence Z = X ∪ Y); that is, let Y be the set of attributes of R that are not attributes of S
• The result of DIVISION is a relation T(Y) that includes a tuple t if tuples $t_R$ appear in R with $t_R[Y] = t$, and with
◦ $t_R[X] = t_S$ for every tuple $t_S$ in S.
• For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S
• Division is a derived operation and can be expressed in terms of other operations
• $r \div s \equiv \Pi_{R-S}(r) - \Pi_{R-S}((\Pi_{R-S}(r) \times s) - \Pi_{R-S,S}(r))$
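Both the definition and the derived form of division can be checked against each other on a toy relation. The sketch assumes r is a set of (y, x) pairs and s a set of x values; the function names and the sample data (people and branch numbers) are hypothetical.

```python
def divide(r, s):
    """r ÷ s by the definition: y qualifies if (y, x) is in r for every x in s."""
    candidates = {y for (y, _) in r}
    return {y for y in candidates if all((y, x) in r for x in s)}

def divide_derived(r, s):
    """Same result via Pi_Y(r) - Pi_Y((Pi_Y(r) x s) - r)."""
    ys = {y for (y, _) in r}                     # Pi_{R-S}(r)
    all_pairs = {(y, x) for y in ys for x in s}  # Pi_{R-S}(r) x s
    missing = all_pairs - set(r)                 # combinations that r lacks
    return ys - {y for (y, _) in missing}

r = {("Ann", 1), ("Ann", 2), ("Bob", 1), ("Cara", 2), ("Dev", 1), ("Dev", 2)}
s = {1, 2}

print(sorted(divide(r, s)))
print(sorted(divide_derived(r, s)))
```

Only Ann and Dev are paired with every value in s, so both functions return those two names.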
• Relations r, s:
  e.g. A is customer name
  B is branch-name
  1 and 2 here show two specific branch-names
  (Find customers who have an account in all branches of the bank)
• r ÷ s:
Division Example (5) PPD
• Relations r, s:
  e.g. Students who have taken both “a” and “b” courses, with instructor “1”
  (Find students who have taken all courses given by instructor 1)
• r ÷ s:
Source: [Link]/silberslides/Divsion
Module Summary
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Partha Pratim
Das
Objectives &
Outline Database Management Systems
Predicate Logic
Module 17: Formal Relational Query Languages/2
Tuple Relational
Calculus
Domain
Relational
Calculus
Equivalence of
Partha Pratim Das
Algebra and
Calculus
Department of Computer Science and Engineering
Module Summary Indian Institute of Technology, Kharagpur
ppd@[Link]
• To understand formal calculus-based query language through relational algebra
Predicate Logic or Predicate Calculus is an extension of Propositional Logic or Boolean Algebra.
It adds the concept of predicates and quantifiers to better capture the meaning of statements that cannot be adequately expressed by propositional logic.
Tuple Relational Calculus and Domain Relational Calculus are based on Predicate Calculus.
• Consider the statement “x is greater than 3”. It has two parts. The first part, the variable x, is the subject of the statement. The second part, “is greater than 3”, is the predicate. It refers to a property that the subject of the statement can have.
• The statement “x is greater than 3” can be denoted by P(x), where P denotes the predicate “is greater than 3” and x is the variable.
• The predicate P can be considered as a function. It tells the truth value of the statement P(x) at x. Once a value has been assigned to the variable x, the statement P(x) becomes a proposition and has a truth value (true or false).
• In general, a statement involving n variables x1, x2, x3, ···, xn can be denoted by P(x1, x2, x3, ···, xn). Here P is also referred to as an n-place predicate or an n-ary predicate.
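The view of a predicate as a function can be sketched directly. P mirrors the predicate “is greater than 3”; the two-place predicate Q is a hypothetical addition to illustrate an n-ary predicate.

```python
def P(x):
    """Predicate "is greater than 3" applied to the subject x."""
    return x > 3


def Q(x1, x2):
    """A hypothetical 2-place predicate: "x1 is greater than x2"."""
    return x1 > x2


# Assigning values to the variables turns each statement into a proposition.
p_at_5 = P(5)        # "5 is greater than 3"
p_at_2 = P(2)        # "2 is greater than 3"
q_at_4_7 = Q(4, 7)   # "4 is greater than 7"
```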
In predicate logic, predicates are used alongside quantifiers to express the extent to which a predicate is true over a range of elements. Using quantifiers to create such propositions is called quantification. There are two types of quantifiers:
• Universal Quantifier
• Existential Quantifier
Universal Quantification: Mathematical statements sometimes assert that a property is true for all the values of a variable in a particular domain, called the domain of discourse.
• Such a statement is expressed using universal quantification.
• The universal quantification of P(x) for a particular domain is the proposition that asserts that P(x) is true for all values of x in this domain
• The domain is very important here since it decides the possible values of x
• Formally, the universal quantification of P(x) is the statement “P(x) for all values of x in the domain”.
• The notation ∀x P(x) denotes the universal quantification of P(x). Here ∀ is called the universal quantifier. ∀x P(x) is read as “for all x, P(x)”.
• Example: Let P(x) be the statement “x + 2 > x”. What is the truth value of the statement ∀x P(x)?
Solution: Since x + 2 is greater than x for any real number x, P(x) ≡ T for all x, that is, ∀x P(x) ≡ T
Existential Quantifier
Existential Quantification: Some mathematical statements assert that there is an element with a certain property. Such statements are expressed by existential quantification. Existential quantification can be used to form a proposition that is true if and only if P(x) is true for at least one value of x in the domain.
• Formally, the existential quantification of P(x) is the statement “There exists an element x in the domain such that P(x)”.
• The notation ∃x P(x) denotes the existential quantification of P(x). Here ∃ is called the existential quantifier. ∃x P(x) is read as “There is at least one x such that P(x)”
• Example: Let P(x) be the statement “x > 5”. What is the truth value of the statement ∃x P(x)?
Solution: P(x) is true for all real numbers greater than 5 and false for all real numbers less than or equal to 5. Since P(x) holds for at least one x, ∃x P(x) ≡ T
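Over a finite domain, the two quantifiers correspond to Python's built-in `all` and `any`. The sketch below uses the integers from -10 to 10 as a stand-in for the domain of discourse (the slide examples range over the reals); the predicate name R is introduced only to keep the two examples apart.

```python
domain = range(-10, 11)  # assumed finite domain of discourse: integers -10..10


def P(x):
    """The statement "x + 2 > x"."""
    return x + 2 > x


def R(x):
    """The statement "x > 5"."""
    return x > 5


forall_P = all(P(x) for x in domain)   # universal quantification of P
exists_R = any(R(x) for x in domain)   # existential quantification of R
forall_R = all(R(x) for x in domain)   # R fails for some x, so this is false

# Negating a universal quantifier gives an existential one:
# "not for all x R(x)" is equivalent to "there exists x with not R(x)".
de_morgan_holds = (not forall_R) == any(not R(x) for x in domain)
```

The result depends on the chosen domain: restricting `domain` to values above 5 would make the universal quantification of R true as well.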
TRC is a non-procedural query language, where each query is of the form
{t | P(t)}
It also uses quantifiers:
∃t ∈ r (Q(t)) = “there exists” a tuple t in relation r such that predicate Q(t) is true.
∀t ∈ r (Q(t)) = Q(t) is true “for all” tuples t in relation r.
d) Implication (⇒): x ⇒ y, if x is true, then y is true
x ⇒ y ≡ ¬x ∨ y
e) Set of quantifiers:
• ∃t ∈ r (Q(t)) ≡ “there exists” a tuple t in relation r such that predicate Q(t) is true
• ∀t ∈ r (Q(t)) ≡ Q is true “for all” tuples t in relation r
Student
Fname    Lname    Age   Course
David    Sharma   27    DBMS
Aaron    Lilly    17    JAVA
Sahil    Khan     19    Python
Sachin   Rao      20    DBMS
Varun    George   23    JAVA
Simi     Verma    22    JAVA
Q.1 Obtain the first name of students whose age is greater than 21.
Solution:
{t.Fname | Student(t) ∧ t.Age > 21}
{t.Fname | t ∈ Student ∧ t.Age > 21}
{t | ∃s ∈ Student(s.Age > 21 ∧ t.Fname = s.Fname)}
Result:
Fname
David
Varun
Simi
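A TRC query of this kind reads almost directly as a set comprehension. The sketch below encodes the Student table from this slide as named tuples and evaluates Q.1; the lowercase variable name `student` and the use of `namedtuple` are illustrative choices.

```python
from collections import namedtuple

Student = namedtuple("Student", ["Fname", "Lname", "Age", "Course"])

# Rows of the Student relation shown on the slide.
student = {
    Student("David", "Sharma", 27, "DBMS"),
    Student("Aaron", "Lilly", 17, "JAVA"),
    Student("Sahil", "Khan", 19, "Python"),
    Student("Sachin", "Rao", 20, "DBMS"),
    Student("Varun", "George", 23, "JAVA"),
    Student("Simi", "Verma", 22, "JAVA"),
}

# Q.1: first names of the tuples t in Student whose age exceeds 21.
q1 = {t.Fname for t in student if t.Age > 21}
```

The comprehension states what the result contains, not how to scan the table, which is the sense in which TRC is non-procedural.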
Consider the relational schema
student(rollNo, name, year, courseId)
course(courseId, cname, teacher)
Q.2 Find out the names of all students who have taken the course name ‘DBMS’.
• {s.name | s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’)}
Q.3 Find out the names of all students and their rollNo who have taken the course name ‘DBMS’.
• {s.name, s.rollNo | s ∈ student ∧ ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’)}
• {t | ∃s ∈ student ∃c ∈ course(s.courseId = c.courseId ∧ c.cname = ‘DBMS’ ∧ t.name = s.name ∧ t.rollNo = s.rollNo)}
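The existential quantifier in Q.2 and Q.3 can be sketched with `any` over the second relation. The slides give only the schema, so every row below is hypothetical sample data.

```python
from collections import namedtuple

Student = namedtuple("Student", ["rollNo", "name", "year", "courseId"])
Course = namedtuple("Course", ["courseId", "cname", "teacher"])

# Hypothetical sample rows; the slides give only the schema.
student = {
    Student(1, "Asha", 2, "C1"),
    Student(2, "Ravi", 1, "C2"),
    Student(3, "Meera", 3, "C1"),
}
course = {
    Course("C1", "DBMS", "T1"),
    Course("C2", "JAVA", "T2"),
}


def takes_dbms(s):
    """There exists a course c with the student's courseId and cname 'DBMS'."""
    return any(s.courseId == c.courseId and c.cname == "DBMS" for c in course)


q2 = {s.name for s in student if takes_dbms(s)}              # Q.2: names only
q3 = {(s.name, s.rollNo) for s in student if takes_dbms(s)}  # Q.3: name and rollNo
```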
Consider the following relations:
Flights(flno, from, to, distance, departs, arrives)
Aircraft(aid, aname, cruisingrange)
Certified(eid, aid)
Employees(eid, ename, salary)
Q.4 Find the eids of pilots certified for Boeing aircraft.
RA
Πeid(σaname=‘Boeing’(Aircraft ⋈ Certified))
TRC
• {C.eid | C ∈ Certified ∧ ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = ‘Boeing’)}
• {T | ∃C ∈ Certified ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = ‘Boeing’ ∧ T.eid = C.eid)}
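The RA and TRC formulations of Q.4 can be compared on a small instance. The rows below are hypothetical; the RA version is evaluated step by step (join, select, project), while the TRC version states the condition in one expression.

```python
from collections import namedtuple

Aircraft = namedtuple("Aircraft", ["aid", "aname", "cruisingrange"])
Certified = namedtuple("Certified", ["eid", "aid"])

# Hypothetical sample rows.
aircraft = {
    Aircraft(1, "Boeing", 6000),
    Aircraft(2, "Airbus", 5000),
    Aircraft(3, "Boeing", 8000),
}
certified = {Certified(10, 1), Certified(10, 2), Certified(11, 2), Certified(12, 3)}

# RA, procedurally: natural join on aid, then selection, then projection on eid.
joined = {(a, c) for a in aircraft for c in certified if a.aid == c.aid}
selected = {(a, c) for (a, c) in joined if a.aname == "Boeing"}
ra_result = {c.eid for (_, c) in selected}

# TRC, declaratively: eids of Certified tuples for which a Boeing aircraft exists.
trc_result = {
    c.eid
    for c in certified
    if any(a.aid == c.aid and a.aname == "Boeing" for a in aircraft)
}
```

Both expressions produce the same set on this instance, which is what the equivalence of the two languages leads one to expect for safe queries.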
Consider the following relations:
Flights(flno, from, to, distance, departs, arrives)
Aircraft(aid, aname, cruisingrange)
Certified(eid, aid)
Employees(eid, ename, salary)
Q.5 Find the names and salaries of certified pilots working on Boeing aircrafts.
RA
Πename,salary(σaname=‘Boeing’(Aircraft ⋈ Certified ⋈ Employees))
TRC
{P | ∃E ∈ Employees ∃C ∈ Certified ∃A ∈ Aircraft(A.aid = C.aid ∧ A.aname = ‘Boeing’ ∧ E.eid = C.eid ∧ P.ename = E.ename ∧ P.salary = E.salary)}
Consider the following relations:
Flights(flno, from, to, distance, departs, arrives)
Aircraft(aid, aname, cruisingrange)
Certified(eid, aid)
Employees(eid, ename, salary)
Q.6 Identify the flights that can be piloted by every pilot whose salary is more than $100,000.
(Hint: The pilot must be certified for at least one plane with a sufficiently large cruising range.)
• {F.flno | F ∈ Flights ∧ ∃A ∈ Aircraft ∃C ∈ Certified ∃E ∈ Employees(A.cruisingrange > F.distance ∧ A.aid = C.aid ∧ E.salary > 100,000 ∧ E.eid = C.eid)}
• It is possible to write tuple calculus expressions that generate infinite relations
• For example, {t | ¬(t ∈ r)} results in an infinite relation if the domain of any attribute of relation r is infinite
• To guard against the problem, we restrict the set of allowable expressions to safe expressions
• An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P.
◦ NOTE: this is more than just a syntax condition
◦ E.g. {t | t[A] = 5 ∨ true} is not safe: it defines an infinite set with attribute values that do not appear in any relation or tuples or constants in P
• Domain relational calculus is a non-procedural query language equivalent in power to the tuple relational calculus
• Each query is an expression of the form:
{<x1, x2, ···, xn> | P(x1, x2, ···, xn)}
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 18: Entity-Relationship Model/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Design Process
A Design:
• Satisfies a given (perhaps informal) functional specification
• Conforms to limitations of the target medium
• Meets implicit or explicit requirements on performance and resource usage
• Satisfies implicit or explicit design criteria on the form of the artifact
• Satisfies restrictions on the design process itself, such as its length or cost, or the tools available for doing the design
• Physics
◦ Time-Distance Equation
◦ Quantum Mechanics
• Chemistry
◦ Valency-Bond Structures
• Geography
◦ Maps
◦ Projections
• Electrical Circuits
◦ Kirchhoff’s Loop Equations
◦ Time Series Signals and FFT
◦ Transistor Models
◦ Schematic Diagram
◦ Interconnect Routing
• Building & Bridges
◦ Drawings – Plan, Elevation, Side view
◦ Finite Element Models
• Models are common in all engineering disciplines
• Model building follows principles of decomposition, abstraction, and hierarchy
• Each model describes a specific aspect of the system
• Build new models upon old proven models
• Requirement Analysis: Analyse the data needs of the prospective database users
◦ Planning
◦ System Definition
• Database Designing: Use a modeling framework to create abstraction of the real world
◦ Logical Model
◦ Physical Model
• Implementation
◦ Data Conversion and Loading
◦ Testing
Entity Relationship (ER) Model
• The ER data model was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database
• The ER model is useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema
• The ER data model employs three basic concepts:
◦ Attributes
◦ Entity sets
◦ Relationship sets
• The ER model also has an associated diagrammatic representation, the ER diagram, which can express the overall logical structure of a database graphically
• An Attribute is a property associated with an entity / entity set. Based on the values of certain attributes, an entity can be identified uniquely
• Attribute types:
◦ Simple and Composite attributes
◦ Single-valued and Multivalued attributes
. Example: Multivalued attribute: phone numbers
◦ Derived attributes
. Can be computed from other attributes
. Example: age, given date of birth
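The attribute types can be illustrated with a small data class. This is only a sketch: the `Person` and `Name` classes and their field names are hypothetical, chosen to show one composite, one multivalued, and one derived attribute.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Name:
    """Composite attribute, made of component attributes."""
    first_name: str
    last_name: str


@dataclass
class Person:
    person_id: int                                      # simple, single-valued
    name: Name                                          # composite
    date_of_birth: date                                 # simple, single-valued
    phone_numbers: list = field(default_factory=list)   # multivalued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


p = Person(1, Name("Asha", "Rao"), date(2000, 6, 15), ["555-0101", "555-0102"])
age_before_birthday = p.age(date(2024, 6, 14))
age_on_birthday = p.age(date(2024, 6, 15))
```

Keeping age as a method rather than a stored field avoids the stored value going stale as time passes.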
• An entity is an object that exists and is distinguishable from other objects.
◦ Example: specific person, company, event, plant
• An entity set is a set of entities of the same type that share the same properties.
◦ Example: set of all persons, companies, trees, holidays
• An entity is represented by a set of attributes; i.e., descriptive properties possessed by all members of an entity set.
◦ Example:
instructor = (ID, name, street, city, salary)
course = (course_id, title, credits)
• A subset of the attributes forms a primary key of the entity set; that is, it uniquely identifies each member of the set.
◦ Primary key of an entity set is represented by underlining it
◦ relationship proj_guide is a ternary relationship between instructor, student, and project
• The attribute dept_name appears in both entity sets. Since it is the primary key for the entity set department, it replicates information present in the relationship and is therefore redundant in the entity set instructor and needs to be removed
• BUT: When converting back to tables, in some cases the attribute gets reintroduced, as we will see later
• Express the number of entities to which another entity can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality must be one of the following types:
◦ One to one
◦ One to many
◦ Many to one
◦ Many to many
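The four types can be told apart by counting partners on each side. The sketch below classifies a given instance of a binary relationship set between A and B; the function name and the convention that "one to many" means one A entity with many B entities are assumptions. A mapping cardinality is a constraint on all valid instances, so this reports only the tightest type consistent with the data supplied.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # number of B partners of each a
    per_b = Counter(b for _, b in pairs)  # number of A partners of each b
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


one_one = mapping_cardinality([("a1", "b1"), ("a2", "b2")])
one_many = mapping_cardinality([("a1", "b1"), ("a1", "b2")])
many_one = mapping_cardinality([("a1", "b1"), ("a2", "b1")])
many_many = mapping_cardinality([("a1", "b1"), ("a1", "b2"), ("a2", "b1")])
```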
Note: Some elements in A and B may not be mapped to any elements in the other set
Mapping Cardinalities
Note: Some elements in A and B may not be mapped to any elements in the other set
An entity set may be of two types:
• Strong entity set
◦ A strong entity set is an entity set that contains sufficient attributes to uniquely identify all its entities
◦ In other words, a primary key exists for a strong entity set
◦ Primary key of a strong entity set is represented by underlining it
• Weak entity set
◦ A weak entity set is an entity set that does not contain sufficient attributes to uniquely identify its entities
◦ In other words, a primary key does not exist for a weak entity set
◦ However, it contains a partial key called a discriminator
◦ Discriminator can identify a group of entities from the entity set
◦ Discriminator is represented by underlining with a dashed line
• Since a weak entity set does not have a primary key, it cannot independently exist in the ER Model
• It features in the model in relationship with a strong entity set. This is called the identifying relationship
• Primary Key of Weak Entity Set
◦ The combination of discriminator and primary key of the strong entity set makes it possible to uniquely identify all entities of the weak entity set
◦ Thus, this combination serves as a primary key for the weak entity set.
◦ Clearly, this primary key is not formed by the weak entity set completely.
◦ Primary Key of Weak Entity Set = Its own discriminator + Primary Key of Strong Entity Set
• Weak entity set must have total participation in the identifying relationship. That is, all its entities must feature in the relationship
• Strong Entity Set: Building(building_no, building_name, address). building_no is its primary key
• Weak Entity Set: Apartment(door_no, floor). door_no is its discriminator, as door_no alone cannot identify an apartment uniquely. Several buildings may have an apartment with the same door number
• Relationship: BA between Building and Apartment
• By total participation in BA, each apartment must be present in at least one building
• In contrast, Building has only partial participation in BA, as there might exist some buildings which have no apartment
• Primary Key: To uniquely identify any apartment
◦ First, building_no is required to identify the particular building
◦ Second, door_no of the apartment is required to uniquely identify the apartment
• Primary key of Apartment = Primary key of Building + Its own discriminator = building_no + door_no
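One way to see the composite key at work is a small in-memory SQLite sketch. The table and column names follow the Building/Apartment example; the column types, the sample rows, and the helper `try_insert` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE building (
    building_no   INTEGER PRIMARY KEY,
    building_name TEXT,
    address       TEXT
);
CREATE TABLE apartment (
    building_no INTEGER NOT NULL REFERENCES building(building_no),
    door_no     INTEGER NOT NULL,          -- discriminator
    floor       INTEGER,
    PRIMARY KEY (building_no, door_no)     -- strong key + discriminator
);
""")
conn.execute("INSERT INTO building VALUES (1, 'Alpha', 'Street 1')")
conn.execute("INSERT INTO building VALUES (2, 'Beta', 'Street 2')")
conn.execute("INSERT INTO apartment VALUES (1, 101, 1)")
conn.execute("INSERT INTO apartment VALUES (2, 101, 1)")  # same door_no, other building


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO apartment VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((1, 101, 2))  # repeats an existing (building_no, door_no)
orphan_ok = try_insert((99, 101, 1))    # refers to a building that does not exist
apartment_count = conn.execute("SELECT COUNT(*) FROM apartment").fetchone()[0]
```

The NOT NULL foreign key reflects the total participation of Apartment in BA: an apartment row cannot exist without its building.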
Weak Entity Sets (4): Example
• Consider a section entity, which is uniquely identified by a course_id, semester, year, and sec_id.
• Clearly, section entities are related to course entities. Suppose we create a relationship set sec_course between entity sets section and course.
• Note that the information in sec_course is redundant, since section already has an attribute course_id, which identifies the course with which the section is related.
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 19: Entity-Relationship Model/2
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
ER Diagram
• Entity sets of a relationship need not be distinct. Each occurrence of an entity set plays a “role” in the relationship
• The labels “course_id” and “prereq_id” are called roles.
• We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.
• One-to-one relationship between an instructor and a student:
◦ A student is associated with at most one instructor via the relationship advisor
◦ An instructor is associated with at most one student via the relationship advisor
• An instructor is associated with several (possibly 0) students via advisor
• A student is associated with several (possibly 0) instructors via advisor
• Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
◦ participation of student in advisor relation is total
. every student must have an associated instructor
• Partial participation: some entities may not participate in any relationship in the relationship set
◦ Example: participation of instructor in advisor is partial
• A line may have an associated minimum and maximum cardinality, shown in the form l..h, where l is the minimum and h the maximum cardinality
◦ A minimum value of 1 indicates total participation.
◦ A maximum value of 1 indicates that the entity participates in at most one relationship
◦ A maximum value of * indicates no limit.
Instructor can advise 0 or more students.
A student must have 1 advisor; cannot have multiple advisors
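An l..h bound can be checked against a relationship instance by counting how often each entity participates. The function name `satisfies_bounds` and the sample advisor pairs are hypothetical; `high=None` stands for the `*` in the notation.

```python
from collections import Counter


def satisfies_bounds(entities, participants, low, high=None):
    """Check an l..h bound: each entity takes part in at least `low` and at
    most `high` relationships (high=None means no upper limit)."""
    counts = Counter(participants)
    for e in entities:
        n = counts[e]
        if n < low or (high is not None and n > high):
            return False
    return True


# Hypothetical instance of the advisor relationship set: (instructor, student).
instructors = {"i1", "i2"}
students = {"s1", "s2", "s3"}
advisor = {("i1", "s1"), ("i1", "s2"), ("i1", "s3")}

# An instructor can advise 0 or more students: bound 0..*
instructor_ok = satisfies_bounds(instructors, [i for i, _ in advisor], 0, None)
# A student must have exactly one advisor: bound 1..1
student_ok = satisfies_bounds(students, [s for _, s in advisor], 1, 1)

advisor_bad = advisor | {("i2", "s1")}  # s1 now has two advisors
student_bad = satisfies_bounds(students, [s for _, s in advisor_bad], 1, 1)
```

A lower bound of 1 fails as soon as some entity is missing from the relationship, which is exactly the total participation requirement.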
• In ER diagrams, a weak entity set is depicted via a double rectangle
• We underline the discriminator of a weak entity set with a dashed line
• The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond
• Primary key for section – (course_id, sec_id, semester, year)
ER Model to Relational Schema
• Entity sets and relationship sets can be expressed uniformly as relation schemas that represent the contents of the database
• A database which conforms to an ER diagram can be represented by a collection of schemas
• For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding entity set or relationship set
• Each schema has a number of columns (generally corresponding to attributes), which have unique names
• A strong entity set reduces to a schema with the same attributes
student(ID, name, tot_cred)
• A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
section(course_id, sec_id, sem, year)
• A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.
• Example: schema for relationship set advisor
advisor = (s_id, i_id)
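As a sketch, the advisor schema can be created in an in-memory SQLite database. The student columns follow the schema given earlier; the reduced instructor table, the column types, and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student    (ID TEXT PRIMARY KEY, name TEXT, tot_cred INTEGER);
CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT);
-- Relationship set advisor: the primary keys of both participating entity sets.
CREATE TABLE advisor (
    s_id TEXT REFERENCES student(ID),
    i_id TEXT REFERENCES instructor(ID),
    PRIMARY KEY (s_id, i_id)
);
INSERT INTO student VALUES ('s1', 'Asha', 30), ('s2', 'Ravi', 12);
INSERT INTO instructor VALUES ('i1', 'Sen'), ('i2', 'Roy');
INSERT INTO advisor VALUES ('s1', 'i1'), ('s1', 'i2'), ('s2', 'i1');
""")

advisors_of_s1 = sorted(
    row[0] for row in conn.execute("SELECT i_id FROM advisor WHERE s_id = 's1'")
)
advisees_of_i1 = sorted(
    row[0] for row in conn.execute("SELECT s_id FROM advisor WHERE i_id = 'i1'")
)
```

Because the key is the pair (s_id, i_id), a student may appear with several instructors and an instructor with several students, which is what many-to-many requires.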
• Composite attributes are flattened out by creating a separate attribute for each component attribute
  ◦ Example: given entity set instructor with composite attribute name with component attributes first_name and last_name, the schema corresponding to the entity set has two attributes name_first_name and name_last_name
    . Prefix omitted if there is no ambiguity (name_first_name could be first_name)
• Ignoring multivalued attributes, extended instructor schema is
  ◦ instructor(ID, first_name, middle_initial, last_name, street_number, street_name, apt_number, city, state, zip_code, date_of_birth)
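The flattening rule can be sketched as a small recursive function. This is only an illustration: the nested-dict representation of an entity, the function name flatten, and the sample values are assumptions, and the prefix is always kept (the slide allows dropping it when there is no ambiguity).

```python
def flatten(entity, prefix=""):
    """Return a flat dict with one column per leaf component attribute."""
    flat = {}
    for attr, value in entity.items():
        column = f"{prefix}{attr}"
        if isinstance(value, dict):
            # Composite attribute: recurse, prefixing components with its name
            flat.update(flatten(value, prefix=f"{column}_"))
        else:
            flat[column] = value
    return flat


# Hypothetical instructor entity with two composite attributes
instructor = {
    "ID": "22222",
    "name": {"first_name": "Albert", "last_name": "Einstein"},
    "address": {"city": "Princeton", "state": "NJ"},
}
row = flatten(instructor)
```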
• Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the “many” side, containing the primary key of the “one” side
• Example: Instead of creating a schema for relationship set inst_dept, add an attribute dept_name to the schema arising from entity set instructor
• For one-to-one relationship sets, either side can be chosen to act as the “many” side
  ◦ That is, an extra attribute can be added to either of the tables corresponding to the two entity sets
• If participation is partial on the “many” side, replacing a schema by an extra attribute in the schema corresponding to the “many” side could result in null values
• The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.
• Example: The section schema already contains the attributes that would appear in the sec_course schema
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 20: Entity-Relationship Model/3
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Outline: Objectives & Outline; ER Features; Non-binary Relationship; Specialization; Specialization as Schema; Generalization; Aggregation; Design Issues; Entities vs Attributes; Entities vs Relationship; Binary vs Non-Binary; Design Decisions; ER Notation; Module Summary
Extended ER Features
• We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint
• For example, an arrow from proj_guide to instructor indicates each student has at most one guide for a project
• If there is more than one arrow, there are two ways of defining the meaning.
  ◦ For example, a ternary relationship R between A, B and C with arrows to B and C could mean
    a) Each A entity is associated with a unique entity from B and C, or
    b) Each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B
  ◦ Each alternative has been used in different formalisms
  ◦ To avoid confusion we outlaw more than one arrow
• Top-down design process: We designate sub-groupings within an entity set that are distinctive from other entities in the set
• These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set
• Depicted by a triangle component labeled ISA (e.g., instructor “is a” person)
• Attribute inheritance: A lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked
  ◦ Drawback: Getting information about an employee requires accessing two relations, the one corresponding to the low-level schema and the one corresponding to the high-level schema
  ◦ Drawback: name, street and city may be stored redundantly for people who are both students and employees
• Bottom-up design process: Combine a number of entity sets that share the same features into a higher-level entity set
• Specialization and generalization are simple inversions of each other; they are represented in an ER diagram in the same way
• The terms specialization and generalization are used interchangeably
• Completeness constraint: Specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization
  ◦ total: an entity must belong to one of the lower-level entity sets
  ◦ partial: an entity need not belong to one of the lower-level entity sets
• Partial generalization is the default. We can specify total generalization in an ER diagram by adding the keyword total in the diagram and drawing a dashed line from the keyword to the corresponding hollow arrow-head to which it applies (for a total generalization), or to the set of hollow arrow-heads to which it applies (for an overlapping generalization).
• The student generalization is total. All student entities must be either graduate or undergraduate. Because the higher-level entity set arrived at through generalization is generally composed of only those entities in the lower-level entity sets, the completeness constraint for a generalized higher-level entity set is usually total.
Aggregation
• Consider the ternary relationship proj_guide, which we saw earlier
• Suppose we want to record evaluations of a student by a guide on a project
• Relationship sets eval_for and proj_guide represent overlapping information
  ◦ Every eval_for relationship corresponds to a proj_guide relationship
  ◦ However, some proj_guide relationships may not correspond to any eval_for relationships
    . So we cannot discard the proj_guide relationship
• Eliminate this redundancy via aggregation
  ◦ Treat relationship as an abstract entity
  ◦ Allows relationships between relationships
  ◦ Abstraction of relationship into new entity
• Using aggregation, and without introducing redundancy, the diagram represents:
  ◦ A student is guided by a particular instructor on a particular project
  ◦ A student, instructor, project combination may have an associated evaluation
Design Issues
• Use of entity sets vs. attributes
• Use of phone as an entity allows extra information about phone numbers (plus multiple phone numbers)
• Use of entity sets vs. relationship sets
  Possible guideline is to designate a relationship set to describe an action that occurs between entities
• Placement of relationship attributes
  For example, attribute date as attribute of advisor or as attribute of student
• Although it is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship
• Some relationships that appear to be non-binary may be better represented using binary relationships
  ◦ For example, a ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother
    . Using two binary relationships allows partial information (e.g., only mother being known)
  ◦ But there are some relationships that are naturally non-binary
    . Example: proj_guide
• In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.
  ◦ Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
    1. RA, relating E and A
    2. RB, relating E and B
    3. RC, relating E and C
  ◦ Create an identifying attribute for E and add any attributes of R to E
  ◦ For each relationship (ai, bi, ci) in R, create
    a) a new entity ei in the entity set E
    b) add (ei, ai) to RA
    c) add (ei, bi) to RB
    d) add (ei, ci) to RC
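The replacement procedure above can be sketched in a few lines. The function names to_binary and reconstruct, the generated identifiers e0, e1, ..., and the sample proj_guide tuples are all assumptions made for illustration; attributes of R are omitted.

```python
def to_binary(R):
    """Replace ternary relationship set R by entity set E and binary sets RA, RB, RC."""
    E, RA, RB, RC = [], [], [], []
    for i, (a, b, c) in enumerate(R):
        e = f"e{i}"          # artificial identifying attribute for the new entity
        E.append(e)
        RA.append((e, a))
        RB.append((e, b))
        RC.append((e, c))
    return E, RA, RB, RC


def reconstruct(E, RA, RB, RC):
    """Rebuild R, relying on each e being related to exactly one a, b and c."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return [(a_of[e], b_of[e], c_of[e]) for e in E]


# Hypothetical proj_guide instances: (instructor, student, project)
proj_guide = [("Katz", "Shankar", "P1"), ("Katz", "Zhang", "P1"), ("Kim", "Zhang", "P2")]
E, RA, RB, RC = to_binary(proj_guide)
round_trip = reconstruct(E, RA, RB, RC)
```

The round trip only works because every new entity participates exactly once in each of RA, RB and RC; nothing in the binary schema itself enforces that, so the constraint has to be stated separately.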
• Also need to translate constraints
  ◦ Translating all constraints may not be possible
  ◦ There may be instances in the translated schema that cannot correspond to any instance of R.
    . Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C
  ◦ We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets
• The use of an attribute or entity set to represent an object
• Whether a real-world concept is best expressed by an entity set or a relationship set
• The use of a ternary relationship versus a pair of binary relationships
• The use of a strong or weak entity set
• The use of specialization/generalization – contributes to modularity in the design
• The use of aggregation – can treat the aggregate entity set as a single unit without concern for the details of its internal structure
Chen vs. IDEF1X (Crow's feet notation)
Database Management Systems
Module 41: Indexing and Hashing/1: Indexing/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Outline: Week Recap; Objectives & Outline; Indexing; Metrics; Ordered Indices; Dense Index Files; Sparse Index Files; Primary and Secondary Indices; Multilevel Index; Index Update; Module Summary
• Need for algorithm analysis, asymptotic complexity, and worst-case, average-case and best-case analysis
• Reviewed Linear Data Structures: array, list, stack, queue; and linear and binary search
• Reviewed Non-linear Data Structures: graph, tree, hash table; Binary Search Tree; and compared Linear and Non-Linear Data Structures
• Understood the range of Physical Storage Media
• Studied Magnetic Disks and Magnetic Tape
• Glimpsed Other Storage and the Future of Storage
• Became familiar with the organization of database files
• Understood how records and relations are organized in files
• Learnt how databases keep their own information in Data-Dictionary Storage – the metadata database of a database
• Understood the mechanisms for fast access of a database store
• To understand why we need to index database tables
• To learn about ordered indexes and the Indexed Sequential Access Method (ISAM)
Concepts of Indexing
• How to search on Name?
  ◦ Get the phone number for ‘Pabitra Mitra’
  ◦ Use “Name” Index – sorted on ‘Name’, search ‘Pabitra Mitra’ and navigate on pointer (rec #)
• How to search on Phone?
  ◦ Get the name of the faculty having phone number = 84772
  ◦ Use “Phone” Index – sorted on ‘Phone’, search ‘84772’ and navigate on pointer (rec #)
• We can keep the records sorted on ‘Name’ or on ‘Phone’ (called the primary index), but not on both
Basic Concepts
• Indexing mechanisms are used to speed up access to desired data, for example:
    . Name in a faculty table
    . author catalog in library
• Search key: an attribute or set of attributes used to look up records in a file
• An index file consists of records (called index entries) of the form (search-key, pointer)
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
  ◦ Ordered indices: search keys are stored in sorted order
  ◦ Hash indices: search keys are distributed uniformly across buckets using a hash function
Ordered Indices
• In an ordered index, index entries are stored sorted on the search key value. For example, author catalog in library
• Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file
  ◦ Also called clustering index
  ◦ The search key of a primary index is usually but not necessarily the primary key
• Secondary index: an index whose search key specifies an order different from the sequential order of the file
  ◦ Also called non-clustering index
• Index-sequential file: ordered sequential file with a primary index
• Dense index: an index record appears for every search-key value in the file.
• For example, index on ID attribute of instructor relation
• Dense index on dept_name, with instructor file sorted on dept_name
• Sparse Index: contains index records for only some search-key values.
  ◦ Applicable when records are sequentially ordered on search-key
• To locate a record with search-key value K we:
  ◦ Find index record with largest search-key value ≤ K
  ◦ Search file sequentially starting at the record to which the index record points
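The lookup procedure can be sketched as follows. This is a minimal in-memory illustration, assuming a file modelled as a list of (key, payload) records sorted on the key, one index entry per block of three records, and made-up sample data; BLOCK, sparse_index and lookup are names chosen here.

```python
from bisect import bisect_right

# File of records, sequentially ordered on the search key (illustrative data)
records = [(10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart"),
           (22222, "Einstein"), (32343, "El Said"), (33456, "Gold"),
           (45565, "Katz")]

BLOCK = 3  # assumed number of records per block
# Sparse index: one (search-key, pointer) entry for the first record of each block
sparse_index = [(records[i][0], i) for i in range(0, len(records), BLOCK)]
index_keys = [key for key, _ in sparse_index]


def lookup(K):
    """Return the record with search-key K, or None if it is not in the file."""
    # Step 1: index entry with the largest search-key value <= K
    slot = bisect_right(index_keys, K) - 1
    if slot < 0:
        return None  # K is smaller than every key in the file
    # Step 2: scan the file sequentially from the record the entry points to
    pos = sparse_index[slot][1]
    while pos < len(records) and records[pos][0] <= K:
        if records[pos][0] == K:
            return records[pos]
        pos += 1
    return None
```

For example, lookup(32343) starts from the index entry for 22222 and scans forward, while lookup(11111) stops as soon as it passes the place where the key would have been.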
Secondary index on salary field of instructor
• Index record points to a bucket that contains pointers to all the actual records with
that particular search-key value.
• Secondary indices have to be dense
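A secondary index with buckets might be sketched like this. The records, the in-memory list standing in for the file, and the name find_by_salary are assumptions for illustration only.

```python
from collections import defaultdict

# File stored sequentially by ID; salary order is unrelated (illustrative data)
records = [(10101, "Srinivasan", 65000), (12121, "Wu", 90000),
           (15151, "Mozart", 40000), (22222, "Einstein", 95000),
           (32343, "El Said", 60000), (33456, "Gold", 87000),
           (45565, "Katz", 75000), (58583, "Califieri", 62000),
           (76543, "Singh", 80000), (98345, "Kim", 80000)]

# Each search-key value maps to a bucket of pointers (record numbers)
buckets = defaultdict(list)
for rec_no, (_, _, salary) in enumerate(records):
    buckets[salary].append(rec_no)

# Dense: one index entry for every salary value present in the file
secondary_index = sorted(buckets.items())


def find_by_salary(salary):
    """Follow the bucket pointers for one search-key value."""
    return [records[rec_no] for rec_no in buckets.get(salary, [])]
```

Because the file is not sorted on salary, a sparse index would leave some salary values unreachable, which is why every value needs its own entry here.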
Primary and Secondary Indices
• Indices offer substantial benefits when searching for records
• BUT: Updating indices imposes overhead on database modification – when a file is modified, every index on the file must be updated
• Sequential scan using primary index is efficient, but a sequential scan using a secondary index is expensive
  ◦ Each record access may fetch a new block from disk
  ◦ Block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for memory access
• If primary index does not fit in memory, access becomes expensive
• Solution: treat primary index kept on disk as a sequential file and construct a sparse index on it
  ◦ outer index – a sparse index of primary index
  ◦ inner index – the primary index file
• If even outer index is too large to fit in main memory, yet another level of index can be created, and so on
• Indices at all levels must be updated on insertion or deletion from the file
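The layering can be sketched as below. It is a simplified model, assuming each index level is a sorted list of keys cut into blocks of fanout entries and that the top level must fit in a single block; build_levels and lookup are names chosen here.

```python
from bisect import bisect_right


def build_levels(keys, fanout):
    """levels[0] is the inner index; each higher level is a sparse index
    holding the first key of every `fanout`-entry block of the level below."""
    levels = [list(keys)]
    while len(levels[-1]) > fanout:
        levels.append(levels[-1][::fanout])
    return levels


def lookup(levels, fanout, K):
    """Return the position of K in the inner index, or None.
    Exactly one block is examined per level."""
    slot = bisect_right(levels[-1], K) - 1      # outermost level: a single block
    if slot < 0:
        return None
    for level in reversed(levels[:-1]):
        start = slot * fanout                   # block the upper entry points to
        block = level[start:start + fanout]
        slot = start + bisect_right(block, K) - 1
    return slot if levels[0][slot] == K else None


keys = list(range(0, 200, 2))        # 100 sorted search-key values (illustrative)
levels = build_levels(keys, fanout=4)
```

With 100 keys and a fanout of 4 this sketch produces four levels, so a lookup touches four small blocks instead of scanning the whole inner index.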
• If deleted record was the only record in the file with its particular search-key value, the search-key is deleted from the index also.
• Single-level index entry deletion:
  ◦ Dense indices – deletion of search-key is similar to file record deletion
  ◦ Sparse indices –
    . If an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order)
    . If the next search-key value already has an index entry, the entry is deleted instead of being replaced
    . If a new block is created, the first search-key value appearing in the new block is inserted into the index
• Multilevel insertion and deletion: algorithms are simple extensions of the single-level algorithms
• Frequently, one wants to find all the records whose values in a certain field (which is not the search-key of the primary index) satisfy some condition
  ◦ Example 1: In the instructor relation stored sequentially by ID, we may want to find all instructors in a particular department
  ◦ Example 2: as above, but where we want to find all instructors with a specified salary or with salary in a specified range of values
• We can have a secondary index with an index record for each search-key value
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 42: Indexing and Hashing/2: Indexing/2
Partha Pratim Das
ppd@[Link]
Outline: Objectives & Outline; Balanced BST; 2-3-4 Tree; Search; Insert; Split; Example; Delete; Observations
• To recap Balanced Binary Search Trees as options for optimal in-memory search data structures
• To understand the issues relating to external search data structures for persistent data
• To study 2-3-4 Tree as a precursor to B/B+-Tree for an efficient external data structure for database and index tables
• How to search a key in a list of n data items?
  ◦ Linear Search: O(n): Find 28 ⇒ 16 comparisons
    . Unordered items in an array – search sequentially
    . Unordered / Ordered items in a list – search sequentially
  ◦ Binary Search: O(lg n): Find 28 ⇒ 4 comparisons – 25, 36, 30, 28
    . Ordered items in an array – search by divide-and-conquer
    . Binary Search Tree – recursively on left / right
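The difference in comparison counts can be reproduced with a short sketch. The data set here (the integers 1 to 32) is an assumption and is not the one pictured on the slide, so the counts differ from the 16 and 4 quoted above; each probe of an element is counted as one comparison.

```python
def linear_search(items, key):
    """Scan left to right; return (index, number of elements examined)."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == key:
            return i, comparisons
    return -1, comparisons


def binary_search(items, key):
    """Halve a sorted array each step; return (index, number of probes)."""
    lo, hi, comparisons = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] == key:
            return mid, comparisons
        if items[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons


items = list(range(1, 33))            # 32 ordered items (illustrative)
linear_result = linear_search(items, 28)
binary_result = binary_search(items, 28)
```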
• Worst case time (n data items in the data structure):
• Between an array and a list, there is a trade-off between search and insert/delete complexity
• For a BST of n nodes, lg n ≤ h < n, where h is the height of the tree
• A BST is balanced if h ∼ O(lg n): this is what we desire
• In the worst case, searching a key in a BST is O(h), where h is the height of the tree
• Bad Tree: h ∼ O(n)
  ◦ The BST is a skewed binary search tree (all the nodes except the leaf would have only one child)
  ◦ This can happen if keys are inserted in sorted order
  ◦ Height (h) of the BST having n elements becomes n − 1
  ◦ Time complexity of search in BST becomes O(n)
• Good Tree: h ∼ O(lg n)
  ◦ The BST is a balanced binary search tree
  ◦ This is possible if
    . keys are inserted in purely randomized order, or
    . the tree is explicitly balanced after every insertion
  ◦ Height (h) of the binary search tree becomes lg n
  ◦ Time complexity of search in BST becomes O(lg n)
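The bad and good cases can be observed with a plain, unbalanced BST. This is a sketch under assumptions: 127 integer keys, height counted in edges, and a fixed random seed so the shuffled order is repeatable; no rebalancing is performed.

```python
import random


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    """Plain BST insertion without any rebalancing."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def height(node):
    """Number of edges on the longest root-to-leaf path (-1 for an empty tree)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


n = 127
sorted_height = height(build(range(n)))    # skewed tree: n - 1

shuffled = list(range(n))
random.Random(0).shuffle(shuffled)         # fixed seed, illustrative only
random_height = height(build(shuffled))    # typically a small multiple of lg n
```

Random insertion order only makes a shallow tree likely; a worst-case guarantee needs explicit balancing.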
• A BST is balanced if h ∼ O(lg n)
• Balancing guarantees may be of various types:
  ◦ Worst-case
    . AVL Tree: Self-balancing BST
      − Named after inventors Adelson-Velsky and Landis
      − Heights of the two child subtrees of any node differ by at most one: |hL − hR| ≤ 1
      − If they differ by more than one, rebalancing is done by rotation
  ◦ Randomized
    . Randomized BST
      − A BST on n keys is random if either it is empty (n = 0), or the probability that a given key is at the root is 1/n, and the left and right subtrees are random
    . Skip List
      − A skip list is built (probabilistically) in layers of ordered linked lists
  ◦ Amortized
    . Splay Tree
      − A BST where recently accessed elements are quick to access again
Balanced Binary Search Trees (2) PPD
• These data structures have optimal complexity for the required operations:
  ◦ Search: O(lg n)
  ◦ Insert: Search + O(1): O(lg n)
  ◦ Delete: Search + O(1): O(lg n)
• And they:
  ◦ Are good for in-memory operations
  ◦ Work well for small volumes of data
  ◦ Have complex rotation and / or similar operations
  ◦ Do not scale for external data structures
2-3-4 Tree
• All leaves are at the same depth (the bottom level).
  ◦ Height, h, of all leaf nodes is the same
    . h ∼ O(lg n)
    . Complexity of search, insert and delete: O(h) ∼ O(lg n)
• All data is kept in sorted order
• Every node (leaf or internal) is a 2-node, 3-node or a 4-node (based on the number of links or children), and holds one, two, or three data elements, respectively
• Generalizes easily to larger nodes
• Extends to external data structures
• Uses 3 kinds of nodes satisfying key relationships:
  ◦ A 2-node must contain a single data item (S) and two links
  ◦ A 3-node must contain two data items (S, L) and three links
  ◦ A 4-node must contain three data items (S, M, L) and four links
  ◦ A leaf may contain either one, two, or three data items
• Insert 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
• 10
• 10, 30
• 10, 30, 60
• Split for 20
• 10, 30, 60, 20, 50, 40, 70, 80, 15
• Split for 90
• 10, 30, 60, 20, 50, 40, 70, 80, 15, 90
• Split for 100
• 10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100
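The insertion sequence of this example can be replayed with a short sketch. It assumes the top-down strategy, in which every 4-node met on the way down is split before descending; that assumption is consistent with the "Split for 100" step, but the class and function names here are hypothetical.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # 1 to 3 sorted keys (0 only for an empty root)
        self.children = children or []  # empty for a leaf

    def is_leaf(self):
        return not self.children

    def is_full(self):
        return len(self.keys) == 3      # a 4-node


def split_child(parent, i):
    """Split the 4-node parent.children[i]; its middle key moves up."""
    full = parent.children[i]
    left = Node(full.keys[:1], full.children[:2])
    right = Node(full.keys[2:], full.children[2:])
    parent.keys.insert(i, full.keys[1])
    parent.children[i:i + 1] = [left, right]


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root.is_full():                  # splitting the root grows the height by 1
        root = Node([], [root])
        split_child(root, 0)
    node = root
    while not node.is_leaf():
        i = bisect.bisect_left(node.keys, key)
        if node.children[i].is_full():
            split_child(node, i)
            if key > node.keys[i]:      # the key belongs right of the promoted key
                i += 1
        node = node.children[i]
    bisect.insort(node.keys, key)
    return root


def inorder(node):
    """Return all keys in sorted order."""
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


root = Node()
for k in [10, 30, 60, 20, 50, 40, 70, 80, 15, 90, 100]:
    root = insert(root, k)
print(root.keys, inorder(root))
```

Because a full node is always split before it is entered, the parent of a node being split is never full, so a split never has to propagate back up.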
• Consider only one node type with space for 3 items and 4 links
◦ Internal node (non-root) has 2 to 4 children (links)
◦ Leaf node has 1 to 3 items
◦ Wastes some space, but has several advantages for external data structure
• Recapitulated the notions of Balanced Binary Search Trees as options for optimal in-memory search data structures
• Understood the issues relating to external data structures for persistent data
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data structure for database and index tables
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 43: Indexing and Hashing/3: Indexing/3
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Module Recap PPD
• Recapitulated the notions of Balanced Binary Search Trees as options for optimal in-memory search data structures
• Understood the issues relating to external data structures for persistent data
• Explored 2-3-4 Tree in depth as a precursor to B/B+-Tree for an efficient external data structure for database and index tables
Module Objectives PPD
• To understand the design of B+ Tree Index Files as a generalization of 2-3-4 Tree
• To understand the fundamentals of B-Tree Index Files
Module Outline PPD
• B+-Tree Index Files
◦ Simple B+ Tree
◦ Index Files: Nodes, Observations, Query, Duplicates
◦ Updates: Insertion, Deletion
◦ File Organization
◦ Non-Unique Keys
◦ Relocation and Secondary Indices
◦ Strings
• B-Tree Index Files
• Comparison
• Module Summary
B+ Tree Index Files PPD
B+ Tree
Source: B+ Tree
B+ Tree (2)
• Internal node contains
◦ At least $n/2$ child pointers, except the root node
◦ At most $n$ pointers
• Leaf node contains
◦ At least $n/2$ record pointers and $n/2$ key values
◦ At most $n$ record pointers and $n$ key values
◦ One block pointer P to point to next leaf node
Note: These are approximate values; we will discuss more precise values later in this lecture.
Source: B+ Tree
B+ Tree (3): Search
• Suppose we have to search for 55 in the B+ tree below
◦ First, we fetch the intermediary node, which directs us to the leaf node that can contain a record for 55
• In the intermediary node, we find the branch between the keys 50 and 75
◦ Following it, we are redirected to the third leaf node
◦ Here the DBMS performs a sequential search to find 55
Source: B+ Tree
B+ Tree (3): Insert
• Suppose we want to insert a record 60, which goes to the 3rd leaf node after 55
• This leaf node is already full, so we cannot insert 60 there
• So we have to split the leaf node, so that 60 can be inserted into the tree without affecting the fill factor, balance and order
• With 60 included, the 3rd leaf node has the values (50, 55, 60, 65, 70), and the key leading to it in the intermediate node is 50
• We split the leaf node in the middle so that the balance of the tree is not altered
Source: B+ Tree
B+ Tree (4): Insert
• So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes
• If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone
• It should have 60 added to it, and then we can have a pointer to the new leaf node
• This is how we insert an entry when there is overflow. In a normal scenario, it is very easy to find the leaf node where the entry fits and then place it there
Source: B+ Tree
B+ Tree (5): Delete
• To delete 60, we have to remove 60 from the intermediate node as well as from the 4th leaf node
• If we simply remove it from the intermediate node, the tree will not remain a B+ tree
• So, after deleting 60, we rearrange the nodes:
Source: B+ Tree
B+ Tree Index Files
◦ B+ trees are used extensively
B+ Tree Index Files (2): Example
B+ Tree Index Files (3): Structure
A B+ tree is a rooted tree satisfying the following properties:
• All paths from root to leaf are of the same length
• Each node that is not a root or a leaf has between $\lceil n/2 \rceil$ and $n$ children
• A leaf node has between $\lceil (n-1)/2 \rceil$ and $n - 1$ values
• Special cases:
◦ If the root is not a leaf, it has at least 2 children.
◦ If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and $(n - 1)$ values.
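As a quick check of these occupancy rules, the bounds can be computed for a given n. The helper name `bplus_bounds` and the dictionary keys are made up for this sketch.

```python
import math


def bplus_bounds(n: int) -> dict:
    """Occupancy limits of a B+ tree whose nodes hold up to n pointers."""
    return {
        # non-root internal node: between ceil(n/2) and n children
        "internal_children": (math.ceil(n / 2), n),
        # leaf node: between ceil((n-1)/2) and n-1 values
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # a root that is not a leaf needs at least 2 children
        "root_min_children": 2,
    }


print(bplus_bounds(4))
print(bplus_bounds(6))
```

For n = 4 an internal node has 2 to 4 children and a leaf has 2 to 3 values.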
B+ Tree Index Files (4): Node Structure
B+ Tree Index Files (5): Leaf Nodes
B+ Tree Index Files (6): Non-Leaf Nodes
• Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with $m$ pointers:
◦ All the search-keys in the subtree to which $P_1$ points are less than $K_1$
◦ For $2 \le i \le m - 1$, all the search-keys in the subtree to which $P_i$ points have values greater than or equal to $K_{i-1}$ and less than $K_i$
◦ All the search-keys in the subtree to which $P_m$ points have values greater than or equal to $K_{m-1}$
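The ordering rule for non-leaf nodes is what makes lookup a single root-to-leaf walk. The sketch below assumes hypothetical `Leaf` and `NonLeaf` classes and made-up keys (only the third leaf mirrors the earlier search example for 55).

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys          # sorted search-key values
        self.pointers = pointers  # pointers[i] is the record pointer for keys[i]


class NonLeaf:
    def __init__(self, keys, children):
        self.keys = keys          # K1 .. K(m-1)
        self.children = children  # P1 .. Pm


def find(root, v):
    """Return the record pointer stored for v, or None if v is absent."""
    node = root
    while isinstance(node, NonLeaf):
        # bisect_right counts the keys <= v; that count is the 0-based
        # index of the pointer whose subtree may contain v.
        node = node.children[bisect_right(node.keys, v)]
    i = bisect_left(node.keys, v)
    if i < len(node.keys) and node.keys[i] == v:
        return node.pointers[i]
    return None


def leaf(*keys):
    """Build a leaf whose record pointers are placeholder strings."""
    return Leaf(list(keys), [f"r{k}" for k in keys])


root = NonLeaf([25, 50, 75],
               [leaf(5, 15, 20), leaf(25, 30, 35, 45),
                leaf(50, 55, 65, 70), leaf(75, 80, 85, 90)])
print(find(root, 55), find(root, 60))
```

A search value equal to a separator key, such as 50, is sent to the right of that key, matching the "greater than or equal to" condition.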
B+ Tree Index Files (7): Example
B+ Tree Index Files: Observations
• Since the inter-node connections are done by pointers, logically close blocks need not be physically close
• The non-leaf levels of the B+ tree form a hierarchy of sparse indices
• The B+ tree contains a relatively small number of levels
◦ Level below root has at least $2 \cdot \lceil n/2 \rceil$ values
◦ Next level has at least $2 \cdot \lceil n/2 \rceil \cdot \lceil n/2 \rceil$ values
◦ ... etc.
◦ If there are $K$ search-key values in the file, the tree height is no more than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$
◦ thus searches can be conducted efficiently
• Insertions and deletions to the main file can be handled efficiently, as the index can be restructured in logarithmic time
B+ Tree Index Files: Queries
B+ Trees Index Files: Queries (2)
• If there are $K$ search-key values in the file, the height of the tree is no more than $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$
• A node is generally the same size as a disk block, typically 4 kilobytes
◦ and $n$ is typically around 100 (40 bytes per index entry)
• With 1 million search key values and $n = 100$
◦ at most $\lceil \log_{50}(1{,}000{,}000) \rceil = 4$ nodes are accessed in a lookup
• Contrast this with a balanced binary tree with 1 million search key values: around 20 nodes are accessed in a lookup
◦ The difference is significant since every node access may need a disk I/O, costing around 20 milliseconds
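The arithmetic above can be checked with a few lines. The function name `levels` is an assumption; it uses integer multiplication rather than a floating-point logarithm so that exact powers do not round the wrong way.

```python
import math


def levels(num_keys: int, fanout: int) -> int:
    """Smallest h with fanout**h >= num_keys, i.e. ceil(log_fanout(num_keys))."""
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height


K = 1_000_000
n = 100
bplus = levels(K, math.ceil(n / 2))   # minimum fan-out of 50
binary = levels(K, 2)                 # balanced binary tree
print(bplus, binary)
# With about 20 ms per disk I/O, as quoted on the slide:
print(bplus * 20, "ms versus", binary * 20, "ms")
```

Under these figures a lookup touches 4 nodes instead of 20, which is roughly 80 ms instead of 400 ms if every node access needs a disk I/O.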
B+ Tree Index Files: Handling Duplicates
B+ Tree Index Files: Handling Duplicates (2)
Updates on B+ Trees: Insertion
• Find the leaf node in which the search-key value would appear
• If the search-key value is already present in the leaf node
◦ Add record to the file
◦ If necessary add a pointer to the bucket
• If the search-key value is not present, then
◦ Add the record to the main file (and create a bucket if necessary)
◦ If there is room in the leaf node, insert (key-value, pointer) pair in the leaf node
◦ Otherwise, split the node (along with the new (key-value, pointer) entry) as discussed in the next slide
Updates on B+ Trees: Insertion (2)
◦ let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split
◦ If the parent is full, split it and propagate the split further up
• Splitting of nodes proceeds upwards till a node that is not full is found
◦ In the worst case the root node may be split increasing the height of the tree by 1
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri, pointer-to-new-node) into parent
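A leaf split can be sketched as follows. The sketch assumes the common textbook rule that, of the n sorted entries (the old ones plus the new one), the first ⌈n/2⌉ stay in the original leaf and the rest move to the new leaf; the function name and the placeholder record pointers are hypothetical.

```python
import math
from bisect import insort


def split_leaf(entries, new_entry, n):
    """entries: the n - 1 sorted (key, pointer) pairs of a full leaf.

    Returns (left, right, k), where k is the least key of the new
    right-hand leaf; (k, pointer-to-right) then goes into the parent.
    """
    all_entries = list(entries)
    insort(all_entries, new_entry)   # n entries in sorted order
    keep = math.ceil(n / 2)          # assumed split point
    left, right = all_entries[:keep], all_entries[keep:]
    return left, right, right[0][0]


full_leaf = [("Brandt", "r1"), ("Califieri", "r2"), ("Crick", "r3")]
left, right, k = split_leaf(full_leaf, ("Adams", "r4"), n=4)
print([key for key, _ in left], [key for key, _ in right], k)
```

With n = 4 this reproduces the caption above: Adams and Brandt stay, Califieri and Crick move, and Califieri is the key sent to the parent.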
Updates on B+ Trees: Insertion (3)
• Splitting a non-leaf node: when inserting $(k, p)$ into an already full internal node $N$
◦ Copy $N$ to an in-memory area $M$ with space for $n + 1$ pointers and $n$ keys
◦ Insert $(k, p)$ into $M$
◦ Copy $P_1, K_1, \ldots, K_{\lceil n/2 \rceil - 1}, P_{\lceil n/2 \rceil}$ from $M$ back into node $N$
◦ Copy $P_{\lceil n/2 \rceil + 1}, K_{\lceil n/2 \rceil + 1}, \ldots, K_n, P_{n+1}$ from $M$ into newly allocated node $N'$
◦ Insert $(K_{\lceil n/2 \rceil}, N')$ into the parent of $N$
• Read pseudocode in book!
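The copy steps above translate almost directly into list slicing. This is a sketch only: the function name, the tuple it returns, and the sample keys and pointer labels are assumptions for illustration.

```python
import math
from bisect import bisect_right


def split_internal(keys, pointers, k, p, n):
    """Split a full internal node N (n - 1 keys, n pointers) on inserting (k, p).

    Returns (n_keys, n_pointers, up_key, new_keys, new_pointers);
    (up_key, pointer-to-new-node) is what gets inserted into the parent of N.
    """
    # M: in-memory copy with room for n keys and n + 1 pointers.
    m_keys, m_ptrs = list(keys), list(pointers)
    pos = bisect_right(m_keys, k)
    m_keys.insert(pos, k)
    m_ptrs.insert(pos + 1, p)        # p covers keys >= k, so it sits right of k
    h = math.ceil(n / 2)
    n_keys, n_ptrs = m_keys[:h - 1], m_ptrs[:h]   # P1, K1, ..., K(h-1), Ph
    up_key = m_keys[h - 1]                        # K(h) moves up, stored in neither half
    new_keys, new_ptrs = m_keys[h:], m_ptrs[h:]   # P(h+1), K(h+1), ..., Kn, P(n+1)
    return n_keys, n_ptrs, up_key, new_keys, new_ptrs


result = split_internal([20, 40, 60], ["P1", "P2", "P3", "P4"], 50, "Pnew", n=4)
print(result)
```

Unlike a leaf split, the middle key is moved up rather than copied, so it appears in neither of the two resulting nodes.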
Updates on B+ Trees: Insertion Example
B+ Tree before and after insertion of “Adams”
Updates on B+ Trees: Insertion Example (2)
Updates on B+ Trees: Deletion
• Find the record to be deleted, and remove it from the main file and from the bucket (if present)
• Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty
• If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then merge siblings:
◦ Insert all the search-key values in the two nodes into a single node (the one on the left), and delete the other node.
◦ Delete the pair $(K_{i-1}, P_i)$, where $P_i$ is the pointer to the deleted node, from its parent, recursively using the above procedure.
Updates on B+ Trees: Deletion (2)
• Otherwise, if the node has too few entries due to the removal, but the entries in the node and a sibling do not fit into a single node, then redistribute pointers:
◦ Redistribute the pointers between the node and a sibling such that both have more than the minimum number of entries
◦ Update the corresponding search-key value in the parent of the node
• The node deletions may cascade upwards till a node which has $\lceil n/2 \rceil$ or more pointers is found
• If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root
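The choice between merging and redistributing after a removal comes down to two counts. The sketch below covers only that decision for leaf nodes; the function name, parameter names, and the counts in the examples are hypothetical.

```python
def rebalance_action(node_count: int, sibling_count: int,
                     capacity: int, minimum: int) -> str:
    """Decide what to do with a node after one of its entries was removed.

    capacity: most entries a node can hold (n - 1 values for a leaf)
    minimum:  fewest entries a non-root node may hold
    """
    if node_count >= minimum:
        return "none"            # node is still sufficiently full
    if node_count + sibling_count <= capacity:
        return "merge"           # everything fits into a single node
    return "redistribute"        # borrow entries from the sibling


# Leaves of a tree with n = 4: capacity 3, minimum 2.
print(rebalance_action(1, 2, capacity=3, minimum=2))
print(rebalance_action(1, 3, capacity=3, minimum=2))
print(rebalance_action(2, 3, capacity=3, minimum=2))
```

A merge removes an entry from the parent, so the same decision may have to be repeated one level up; a redistribution only changes a key in the parent and stops there.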
Updates on B+ Trees: Deletion Example
Before and after deleting “Srinivasan”
• Deleting “Srinivasan” causes merging of under-full leaves
Updates on B+ Trees: Deletion Example (2)
Deletion of “Singh” and “Wu” from result of previous example
• Leaf containing Singh and Wu became underfull, and borrowed a value Kim from its left sibling
• Search-key value in the parent changes as a result
Updates on B+ Trees: Deletion Example (3)
Before and after deletion of “Gold” from earlier example
• Node with “Gold” and “Katz” became underfull, and was merged with its sibling
• Parent node becomes underfull, and is merged with its sibling
◦ Value separating two nodes (at the parent) is pulled down when merging
• Root node then has only one child, and is deleted
B+ Tree File Organization
• Index file degradation problem is solved by using B+ Tree indices
• Data file degradation problem is solved by using B+ Tree File Organization
• The leaf nodes in a B+ tree file organization store records, instead of pointers
• Leaf nodes are still required to be half full
◦ Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is less than the number of pointers in a non-leaf node
• Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+ tree index
B+ Tree File Organization: Example
Example of B+ tree File Organization
• Good space utilization is important since records use more space than pointers.
• To improve space utilization, involve more sibling nodes in redistribution during splits and merges
◦ Involving 2 siblings in redistribution (to avoid split / merge where possible) results in each node having at least $\lfloor 2n/3 \rfloor$ entries
Non-Unique Search Keys
Record Relocation and Secondary Indices
• If a record moves, all secondary indices that store record pointers have to be updated
• Node splits in B+ tree file organizations become very expensive
• Solution: Use primary-index search key instead of record pointer in secondary index
◦ Extra traversal of primary index to locate record
. Higher cost for queries, but node splits are cheap
◦ Add record-id if primary-index search key is non-unique
Indexing Strings
B-Tree Index Files PPD
B-Tree Index Files
• Similar to B+ tree, but B-tree allows search-key values to appear only once; eliminates redundant storage of search keys
• Search keys in non-leaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a non-leaf node must be included
• Generalized B-tree leaf node
• Non-leaf node: pointers $B_i$ are the bucket or file record pointers
B-Tree Index File (2): Example
B-tree (above) and B+ tree (below) on same data
Comparison of B-Tree and B+ Tree Index Files
Module Summary
• Understood the design of B+ Tree Index Files in depth for database persistent store
• Familiarized with B-Tree Index Files
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 44: Indexing and Hashing/4: Hashing
Partha Pratim Das
ppd@[Link]
• Understood the design of B+ Tree Index Files in depth for database persistent store
• Familiarized with B-Tree Index Files
• To explore various hashing schemes: Static and Dynamic Hashing
• To compare Ordered Indexing and Hashing
• To understand the Bitmap Indices
Static Hashing
• A hash function h maps data of arbitrary size (from domain D) to fixed-size values (say, integers from 0 to N > 0): h : D → [0..N]
• Given key k, h(k) is called a hash value, hash code, digest, or simply a hash
• If for two keys k1 ≠ k2, we have h(k1) = h(k2), we say a collision has occurred
• A hash function should be Collision Free and Fast
• A bucket is a unit of storage containing one or more records (a bucket is typically a disk block)
• In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function
• Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B
• Hash function is used to locate records for access, insertion as well as deletion
• Records with different search-key values may be mapped to the same bucket; thus entire bucket has to be searched sequentially to locate a record
Hash file organization of instructor file, using dept_name as key
• There are 10 buckets
• The binary representation of the i-th character is assumed to be the integer i
• The hash function returns the sum of the binary representations of the characters modulo 10
◦ For example
h(Music) = 1
h(History) = 2
h(Physics) = 3
h(Elec. Eng.) = 3
• Worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file
• An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values
• Ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file
• Typical hash functions perform computation on the internal binary representation of the search-key
◦ For example, for a string search-key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned
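The string example above can be sketched in a few lines. The sketch takes the "internal binary representation" to be the UTF-8 byte values of the string, which is an assumption, so the bucket numbers it produces are not the ones quoted for the instructor file example; the names `bucket_of` and `file_buckets` are also made up.

```python
from collections import defaultdict


def bucket_of(search_key: str, num_buckets: int) -> int:
    """Add up the byte values of the key and reduce modulo the bucket count."""
    return sum(search_key.encode("utf-8")) % num_buckets


NUM_BUCKETS = 10
file_buckets = defaultdict(list)   # bucket address -> records stored there
for dept in ["Music", "History", "Physics", "Elec. Eng.", "Music"]:
    file_buckets[bucket_of(dept, NUM_BUCKETS)].append(dept)

for address in sorted(file_buckets):
    print(address, file_buckets[address])
```

Records with the same search key always land in the same bucket, and different keys may share one, which is why a bucket still has to be scanned sequentially.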
• Although the probability of bucket overflow can be reduced, it cannot be eliminated
◦ it is handled by using overflow buckets
• Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list
• Above scheme is called closed hashing
◦ An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications
• Hashing can be used not only for file organization, but also for index-structure creation
• A hash index organizes the search keys, with their associated record pointers, into a hash file structure
• Strictly speaking, hash indices are always secondary indices
◦ if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessary
◦ However, we use the term hash index to refer to both secondary index structures and hash organized files
• In static hashing, function h maps search-key values to a fixed set B of bucket addresses. Databases grow or shrink with time
◦ If initial number of buckets is too small, and file grows, performance will degrade due to too many overflows
◦ If space is allocated for anticipated growth, a significant amount of space will be wasted initially (and buckets will be underfull).
◦ If database shrinks, again space will be wasted
• One solution: periodic re-organization of the file with a new hash function
◦ Expensive, disrupts normal operations
• Better solution: allow the number of buckets to be modified dynamically
Dynamic Hashing
• Good for databases that grow and shrink in size
• Allows the hash function to be modified dynamically
• Extendable hashing – one form of dynamic hashing
◦ Hash function generates values over a large range, typically b-bit integers, with b = 32
◦ At any time use only a prefix of the hash value to index into a table of bucket addresses
◦ Let the length of the prefix be i bits, 0 ≤ i ≤ 32
. Bucket address table size = 2^i. Initially i = 0
. Value of i grows and shrinks as the size of the database grows and shrinks
◦ Multiple entries in the bucket address table may point to the same bucket (why?)
◦ Thus, the actual number of buckets is ≤ 2^i
. The number of buckets also changes dynamically due to coalescing and splitting of buckets
General Extendable Hash Structure PPD
follow the pointer to appropriate bucket
• To insert a record with search-key value Kj
◦ Follow the same procedure as look-up and locate the bucket, say j
◦ If there is room in bucket j, insert the record in the bucket
◦ Else the bucket must be split and insertion re-attempted (next slide)
. Overflow buckets are used instead in some cases (will see shortly)
To split a bucket j when inserting record with search-key value Kj
• If i > i_j (more than one pointer to bucket j)
◦ Allocate a new bucket z, and set i_j = i_z = (i_j + 1)
◦ Update the second half of the bucket address table entries originally pointing to j, to point to z
◦ Remove each record in bucket j and reinsert (in j or z)
◦ Recompute new bucket for Kj and insert record in the bucket (further splitting is required if the bucket is still full)
• If i = i_j (only one pointer to bucket j)
◦ If i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket
◦ Else
. Increment i and double the size of the bucket address table
. Replace each entry in the table by two entries that point to the same bucket
. Recompute new bucket address table entry for Kj. Now i > i_j, so use the first case above
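The look-up, insert, and split rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the class names (Bucket, ExtendableHash), the max_splits limit, and the choice to model an overflow bucket by letting a bucket exceed its nominal size are assumptions made for the example, not part of the slides.

```python
B = 32  # hash values are B-bit integers (b = 32 on the slide)


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # i_j: number of prefix bits shared by keys in this bucket
        self.keys = []


class ExtendableHash:
    def __init__(self, bucket_size=2, hash_fn=None, max_splits=8):
        self.bucket_size = bucket_size
        # Assumed default: Python's hash() truncated to B bits.
        self.hash_fn = hash_fn or (lambda k: hash(k) & (2**B - 1))
        self.max_splits = max_splits  # hypothetical "too many splits" limit
        self.i = 0                    # current prefix length
        self.table = [Bucket(0)]      # bucket address table of size 2**i

    def _slot(self, key):
        # Use the first i high-order bits of the B-bit hash value.
        return self.hash_fn(key) >> (B - self.i)

    def lookup(self, key):
        return key in self.table[self._slot(key)].keys

    def insert(self, key):
        splits = 0
        while True:
            bucket = self.table[self._slot(key)]
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(key)
                return
            at_limit = bucket.depth == self.i and self.i == B
            if at_limit or splits >= self.max_splits:
                bucket.keys.append(key)  # stand-in for chaining an overflow bucket
                return
            if bucket.depth == self.i:
                # Only one pointer to the bucket: double the table, so that
                # each entry is replaced by two entries to the same bucket.
                self.table = [b for b in self.table for _ in range(2)]
                self.i += 1
            self._split(bucket)  # now i > i_j
            splits += 1

    def _split(self, bucket):
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        slots = [s for s, b in enumerate(self.table) if b is bucket]
        for s in slots[len(slots) // 2:]:  # second half now points to the new bucket
            self.table[s] = new_bucket
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                 # reinsert each record (in j or z)
            self.table[self._slot(k)].keys.append(k)


eh = ExtendableHash(bucket_size=2, hash_fn=lambda k: k)  # identity hash for clarity
for key in (0x10000000, 0x90000000, 0x50000000, 0x30000000):
    eh.insert(key)
print(eh.i, len(eh.table), len({id(b) for b in eh.table}))
```

With the identity hash used here, the four inserts force two doublings, so the table ends with four entries but only three distinct buckets: the last two entries share one bucket, which is why the number of buckets can be smaller than the table size.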
Deletion in Extendable Hash Structure
. Note: decreasing bucket address table size is an expensive operation and should be done only if the number of buckets becomes much smaller than the size of the table
• Initial hash structure; bucket size = 2
• Insert “Mozart”, “Srinivasan”, and “Wu” records
• Hash structure after insertion of “Mozart”, “Srinivasan”, and “Wu” records
• Insert Einstein record
• Hash structure after insertion of “Einstein” record
• Insert “Gold” and “El Said” records
• Hash structure after insertion of “Gold” and “El Said” records
• Hash structure after insertion of “Kim” record
Comparison Schemes
◦ Bucket address table may itself become very big (larger than memory)
. Cannot allocate very large contiguous areas on disk either
. Solution: B+-tree structure to locate desired record in bucket address table
◦ Changing size of bucket address table is an expensive operation
• Linear hashing is an alternative mechanism
◦ Allows incremental growth of its directory (equivalent to bucket address table)
◦ At the cost of more bucket overflows
• In practice:
◦ PostgreSQL supports hash indices, but discourages use due to poor performance
◦ Oracle supports static hash organization, but not hash indices
◦ SQLServer supports only B+-trees
Bitmap Indices
• Bitmap indices are a special type of index designed for efficient querying on multiple keys
• Records in a relation are assumed to be numbered sequentially from, say, 0
◦ Given a number n it must be easy to retrieve record n
. Particularly easy if records are of fixed size
• Applicable on attributes that take on a relatively small number of distinct values
◦ For example: gender, country, state, . . .
◦ For example: income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity)
• A bitmap is simply an array of bits
• In its simplest form a bitmap index on an attribute has a bitmap for each value of the attribute
◦ Bitmap has as many bits as records
◦ In a bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and is 0 otherwise
• Bitmap indices are useful for queries on multiple attributes
◦ Not particularly useful for single attribute queries
• Queries are answered using bitmap operations
◦ Intersection (and)
◦ Union (or)
◦ Complementation (not)
• Intersection and union take two bitmaps of the same size and apply the operation on corresponding bits to get the result bitmap; complementation takes a single bitmap
◦ For example: 100110 AND 110011 = 100010
100110 OR 110011 = 110111
NOT 100110 = 011001
◦ Males with income level L1: 10010 AND 10100 = 10000
. Can then retrieve required tuples
. Counting number of matching tuples is even faster
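The bitmap operations above can be tried out with Python integers standing in for bitmaps. This is a minimal sketch: the helper names (build_bitmaps, to_bits) and the five sample records are assumptions chosen so that the bitmaps match the 10010 and 10100 values on the slide.

```python
def build_bitmaps(values):
    """Return {value: bitmap}; the leftmost bit corresponds to record 0."""
    n = len(values)
    bitmaps = {}
    for record_no, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (n - 1 - record_no))
    return bitmaps


def to_bits(bitmap, n):
    """Render a bitmap as an n-character string of 0s and 1s."""
    return format(bitmap, f"0{n}b")


# Hypothetical relation with five records (record numbers 0..4).
gender = ["m", "f", "f", "m", "f"]
income_level = ["L1", "L2", "L1", "L4", "L3"]
n = len(gender)

gender_bm = build_bitmaps(gender)        # "m" -> 10010
income_bm = build_bitmaps(income_level)  # "L1" -> 10100

males_l1 = gender_bm["m"] & income_bm["L1"]  # intersection (and)
matching = [r for r in range(n) if males_l1 & (1 << (n - 1 - r))]
count = bin(males_l1).count("1")             # counting needs no tuple access

mask = (1 << 6) - 1  # complement must be cut to the bitmap length
print(to_bits(males_l1, n), matching, count)
print(to_bits(0b100110 & 0b110011, 6),
      to_bits(0b100110 | 0b110011, 6),
      to_bits(~0b100110 & mask, 6))
```

Note that complementation needs the mask: Python integers are unbounded, so the result has to be truncated to the number of records.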
Bitmap Indices (4)
• Bitmap indices are generally very small compared with relation size
◦ For example, if record is 100 bytes, space for a single bitmap is 1/800 of space used by relation
. If number of distinct attribute values is 8, bitmap is only 1% of relation size
• Deletion needs to be handled properly
◦ Existence bitmap to note if there is a valid record at a record location
◦ Needed for complementation
. not(A=v): (NOT bitmap-A-v) AND ExistenceBitmap
• Should keep bitmaps for all values, even null value
◦ To correctly handle SQL null semantics for NOT(A=v):
. intersect above result with (NOT bitmap-A-Null)
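The complementation rule with the existence bitmap and the null bitmap can be checked on a small case. The function name not_equal and the six-record bitmaps below are made up for illustration; only the formula comes from the slide.

```python
def not_equal(bitmap_v, existence, bitmap_null, n):
    """Records with A <> v: (NOT bitmap-A-v) AND ExistenceBitmap AND (NOT bitmap-A-Null)."""
    mask = (1 << n) - 1                       # keep only the n record bits
    result = (~bitmap_v & mask) & existence   # drop deleted record slots
    return result & (~bitmap_null & mask)     # drop records where A is null


n = 6
bitmap_v = 0b100100     # records 0 and 3 have A = v (leftmost bit is record 0)
existence = 0b111101    # record 4 has been deleted
bitmap_null = 0b010000  # record 1 has a null value for A

result = not_equal(bitmap_v, existence, bitmap_null, n)
print(format(result, f"0{n}b"))
```

A plain NOT would give 011011 and wrongly include the deleted record 4 and the null record 1; the two extra intersections leave only records 2 and 5.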
• Bitmaps are packed into words; a single word and (a basic CPU instruction) computes the and of 32 or 64 bits at once
◦ For example, two 1-million-bit bitmaps can be and-ed with just 31,250 instructions (using 32-bit words)
• Counting number of 1s can be done fast by a trick:
◦ Use each byte to index into a precomputed array of 256 elements each storing the count of 1s in the binary representation
. Can use pairs of bytes to speed up further at a higher memory cost
◦ Add up the retrieved counts
• Bitmaps can be used instead of Tuple-ID lists at leaf levels of B+-trees, for values that have a large number of matching records
◦ Worthwhile if > 1/64 of the records have that value, assuming a tuple-id is 64 bits
◦ Above technique merges benefits of bitmap and B+-tree indices
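The byte-indexed counting trick can be sketched as follows, assuming the bitmap is stored as a Python bytes object. The names BYTE_COUNTS and count_ones are illustrative.

```python
# Precomputed array of 256 elements: BYTE_COUNTS[b] is the number of 1 bits in byte b.
BYTE_COUNTS = [bin(b).count("1") for b in range(256)]


def count_ones(bitmap: bytes) -> int:
    """Count the 1 bits by looking up each byte and adding up the retrieved counts."""
    return sum(BYTE_COUNTS[byte] for byte in bitmap)


sample = bytes([0b10011000, 0b11111111, 0b00000000])
total = count_ones(sample)

# Word-at-a-time arithmetic from the slide: 1 million bits in 32-bit words.
words_needed = 1_000_000 // 32
print(total, words_needed)
```

Indexing on pairs of bytes would need a 65,536-entry table in exchange for half as many look-ups, which is the memory trade-off the slide mentions.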
• Explored various hashing schemes – Static and Dynamic Hashing
• Compared Ordered Indexing and Hashing
• Studied the use of Bitmap Indices for fast access of columns with limited number of distinct values
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 45: Indexing and Hashing/5: Index Design
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• Explored various hashing schemes – Static and Dynamic Hashing
• Compared Ordered Indexing and Hashing
• Studied the use of Bitmap Indices for fast access of columns with limited number of distinct values
Index Definition in SQL
• Create an index
create index <index-name> on <relation-name> (<attribute-list>)
For example: create index b_index on branch (branch_name)
• Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key
◦ Not really required if the SQL unique integrity constraint is supported – the constraint is preferred
• To drop an index
drop index <index-name>
• Most database systems allow specification of type of index, and clustering
◦ You can also create an index for a cluster
◦ You can create a composite index on multiple columns up to a maximum of 32 columns
. A composite index key cannot exceed roughly one-half (minus some overhead) of the available space in the data block
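The create, unique, and drop statements can be exercised end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for the database system, the branch table columns and sample rows are made up, and limits such as the 32-column maximum are product-specific and are not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")

# create unique index: enforces that branch_name is a candidate key.
conn.execute("CREATE UNIQUE INDEX b_index ON branch (branch_name)")
conn.execute("INSERT INTO branch VALUES ('Downtown', 'Brooklyn', 9000000)")
try:
    conn.execute("INSERT INTO branch VALUES ('Downtown', 'Queens', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the unique index refused the duplicate key

# Ask the query planner whether an equality look-up uses the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM branch WHERE branch_name = 'Downtown'"
).fetchall()
uses_index = any("b_index" in row[-1] for row in plan)

# drop index: the table and its rows are unaffected.
conn.execute("DROP INDEX b_index")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]
print(duplicate_rejected, uses_index, remaining, row_count)
```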
• Create an index for a single column, to speed up queries that test that column:
◦ CREATE INDEX emp_ename ON emp_tab(ename);
• Specify several storage settings explicitly for the index:
◦ CREATE INDEX emp_ename ON emp_tab(ename)
TABLESPACE users -- Allocation of space in the database to contain schema objects
STORAGE ( -- Specify how the database should store a database object
INITIAL 20K -- Specify the size of the 1st extent of the object
NEXT 20K -- Specify in bytes the size of the 2nd extent to be allocated to the object
PCTINCREASE 75) -- Specify the percent by which later extents grow over the preceding extent
PCTFREE 0 -- 0% of each data block in this table’s data segment is kept free for updates
COMPUTE STATISTICS;
◦ Create index on two columns, to speed up queries that test either the first column or both columns:
. CREATE INDEX emp_ename ON emp_tab(ename, empno) COMPUTE STATISTICS;
◦ If a query is going to sort on the function UPPER(ENAME), an index on the ENAME column itself would not speed up this operation, and it might be slow to call the function for each result row
. A function-based index precomputes the result of the function for each column value, speeding up queries that use the function for searching or sorting:
CREATE INDEX emp_upper_ename ON emp_tab(UPPER(ename)) COMPUTE STATISTICS;
Source: Selecting an Index Strategy
Index in SQL: Bitmap PPD
• SELECT * FROM Student WHERE Gender = ‘F’ AND Semester = 4;
◦ AND 0 1 1 1 with 0 0 0 1 to get the result
• Possible strategies for processing query using indices on single attributes:
◦ Use index on dept_name to find instructors with department name Finance; test salary = 80000
◦ Use index on salary to find instructors with a salary of 80000; test dept_name = “Finance”
◦ Use dept_name index to find pointers to all records pertaining to the “Finance” department. Similarly use index on salary. Take intersection of both sets of pointers obtained
• Composite Search Keys are search keys containing more than one attribute
◦ For example, (dept_name, salary)
• Lexicographic ordering: (a1, a2) < (b1, b2) if either
◦ a1 < b1, or
◦ a1 = b1 and a2 < b2
• Hence, the order is important
Suppose we have an index on combined search-key: (dept_name, salary)
• With the where clause
where dept_name = “Finance” and salary = 80000
the index on (dept_name, salary) can be used to fetch only records that satisfy both conditions.
◦ Using separate indices is less efficient: we may fetch many records (or pointers) that satisfy only one of the conditions
◦ Can also efficiently handle
where dept_name = “Finance” and salary < 80000
◦ But cannot efficiently handle
where dept_name < “Finance” and salary = 80000
. May fetch many records that satisfy the first but not the second condition
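Why the first two where clauses are cheap and the third is not can be seen by modelling the composite index as a list of (dept_name, salary, ID) tuples kept in lexicographic order. The sample entries, the IDs, and the helper range_scan are hypothetical; Python's tuple comparison happens to implement the lexicographic ordering defined earlier.

```python
from bisect import bisect_left

# Hypothetical index entries, sorted lexicographically on (dept_name, salary).
index = sorted([
    ("Biology", 72000, "I1"), ("Comp. Sci.", 65000, "I2"),
    ("Comp. Sci.", 80000, "I3"), ("Elec. Eng.", 80000, "I4"),
    ("Finance", 65000, "I5"), ("Finance", 80000, "I6"),
    ("Finance", 90000, "I7"), ("History", 62000, "I8"),
])


def range_scan(lo_key, hi_key):
    """Return the contiguous run of entries with lo_key <= entry < hi_key."""
    return index[bisect_left(index, lo_key):bisect_left(index, hi_key)]


# dept_name = "Finance" and salary = 80000: one short contiguous run
# (integer salaries assumed, so 80001 is the next possible key).
both_equal = range_scan(("Finance", 80000), ("Finance", 80001))

# dept_name = "Finance" and salary < 80000: still one contiguous run.
finance_below = range_scan(("Finance",), ("Finance", 80000))

# dept_name < "Finance" and salary = 80000: every entry before "Finance"
# has to be examined, and most are discarded by the salary test.
examined = index[:bisect_left(index, ("Finance",))]
earlier_depts = [e for e in examined if e[1] == 80000]

print(both_equal, finance_below, len(examined), earlier_depts)
```

In the third query four entries are examined to return two, and the gap widens as the number of departments sorting before "Finance" grows.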
• When using indexes in an application, you might need to request that the DBA grant privileges or make changes to initialization parameters
• To create a new index
◦ You must own, or have the INDEX object privilege for, the corresponding table
◦ The schema that contains the index must also have a quota for the tablespace intended to contain the index, or the UNLIMITED TABLESPACE system privilege
◦ To create an index in another user’s schema, you must have the CREATE ANY INDEX system privilege
• Function-based indexes also require the QUERY REWRITE privilege, and the QUERY_REWRITE_ENABLED initialization parameter must be set to TRUE
Guidelines for Indexing
• In Modules 16 to 20 (Week 4), we have studied various issues for a proper design of a relational database system. This focused on:
◦ Normalization of Tables leading to
. Reduction of Redundancy to minimize possibilities of Anomaly
. Easier adherence to constraints (various dependencies)
. Efficiency of access and update – a better normalized design often gives better performance
• The performance of a database system, however, is also significantly impacted by the way the data is physically organized and managed. These are done through:
◦ Indexing and Hashing
• While normalization and design are startup time activities that are usually performed once at the beginning (and rarely changed later), the performance behavior continues to evolve as the database is used over time. Hence we need to continually:
◦ Collect Statistics about data (of various tables) to learn of the patterns, and
◦ Adjust the Indexes on the tables to optimize performance
• There is no sound theory that determines optimal performance. Rather, we take a quick look into a few common guidelines that can help you keep your database agile in its behavior
• Some guidelines - heuristic and common sense, but time-tested - are summarized here as a set of Ground Rules for Indexing
◦ Rule 0: Indexes lead to Access – Update Tradeoff
◦ Rule 1: Index the Correct Tables
◦ Rule 2: Index the Correct Columns
◦ Rule 3: Limit the Number of Indexes for Each Table
◦ Rule 4: Choose the Order of Columns in Composite Indexes
◦ Rule 5: Gather Statistics to Make Index Usage More Accurate
◦ Rule 6: Drop Indexes That Are No Longer Required
• These rules are explained in the following slides
. Having unnecessary indexes can cause significant degradation of performance of various operations
. Index files may also occupy significant space on your disk and / or
. Cause slow behavior due to memory limitations during index computations
◦ Use informed judgment to index!
− The faster the table scan, the lower the percentage
− The more clustered the row data, the higher the percentage
• Index columns used for joins to improve performance on joins of multiple tables
• Primary and unique keys automatically have indexes, but you might want to create an index on a foreign key
• Small tables do not require indexes
◦ If a query is taking too long, then the table might have grown from small to large
. There are many nulls in the column and you do not search on the non-null values
. LONG and LONG RAW columns cannot be indexed
◦ The size of a single index entry cannot exceed roughly one-half (minus some
overhead) of the available space in the data block
Guidelines for Indexing: Rule 3 PPD
• Rule 3: Limit the Number of Indexes for Each Table
◦ The more indexes, the more overhead is incurred as the table is altered
. When rows are inserted or deleted, all indexes on the table must be updated
. When a column is updated, all indexes on the column must be updated
◦ You must weigh the performance benefit of indexes for queries against the performance overhead of updates
. If a table is primarily read-only, you might use more indexes; but, if a table is heavily updated, you might use fewer indexes
• Composite indexes speed up queries that use the leading portion of the index:
◦ So queries with WHERE clauses using only the PART_NO column also run faster
◦ With only 5 distinct values, a separate index on VENDOR_ID does not help
Guidelines for Indexing: Rule 5 PPD
• Rule 5: Gather Statistics to Make Index Usage More Accurate
◦ The database can use indexes more effectively when it has statistical information about the tables involved in the queries
. Gather statistics when the indexes are created by including the keywords COMPUTE STATISTICS in the CREATE INDEX statement
. As data is updated and the distribution of values changes, periodically refresh the statistics by calling procedures like (in Oracle):
− DBMS [Link] TABLE STATISTICS and
− DBMS [Link] SCHEMA STATISTICS
◦ If you drop a table, then all associated indexes are dropped
◦ To drop an index, the index must be contained in your schema or you must have the DROP ANY INDEX system privilege
Module Summary
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 46: Transactions/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• Recap of Balanced BST for optimal in-memory search data structures
• Issues of external search data structures for persistent data
• Explored 2-3-4 Tree as a precursor to B/B+-Tree
• Understood the B+ Tree and B Tree for Index files and data files
• Explored Static and Dynamic Hashing
• Compared Ordered Indexing and Hashing
• Studied the use of Bitmap Indices
• Learnt to create indexes in SQL
• Learnt a set of Ground Rules for Indexing
• To understand the concept of transaction – ‘doing a task in a database’ and its state
• To explore issues in concurrent execution of transactions
• Concurrent Executions
Transaction Concept
• A transaction is a unit of program execution that accesses and possibly updates various data items
• For example, transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A − 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Two main issues to deal with:
◦ Failures of various kinds, such as hardware failures and system crashes
◦ Concurrent execution of multiple transactions
• Atomicity Requirement
◦ If the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent database state
. Failure could be due to software or hardware
◦ The system should ensure that updates of a partially executed transaction are not reflected in the database
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A − 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
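The atomicity requirement can be observed with Python's built-in sqlite3 module, which rolls back a transaction when the connection is used as a context manager and an exception escapes. This is a sketch under assumptions: the account table, the starting balances of 1000 and 2000, and the fail_midway flag that simulates a crash after write(A) are all made up for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure after write(A)")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

failed = transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # the partial update to A has been undone

succeeded = transfer(conn, "A", "B", 50)
after_success = balances(conn)
print(failed, after_failure, succeeded, after_success)
```

After the simulated failure the balances are unchanged, so no money is lost; only the completed transfer changes both accounts.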
− sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand
◦ A transaction, when starting to execute, must see a consistent database
◦ During transaction execution the database may be temporarily inconsistent
◦ When the transaction completes successfully the database must be consistent
. Erroneous transaction logic can lead to inconsistency
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A − 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Required Properties of a Transaction: ACID: Isolation
• Isolation Requirement
◦ If between steps 3 and 6 (of the fund transfer transaction), another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be)
T1                    T2
1. read(A)
2. A := A − 50
3. write(A)
                      read(A), read(B), print(A + B)
4. read(B)
5. B := B + 50
6. write(B)
◦ Isolation can be ensured trivially by running transactions serially
. That is, one after the other
◦ However, executing multiple transactions concurrently has significant benefits
Required Properties of a Transaction: ACID: Durability
• Durability Requirement
◦ Once the user has been notified that the transaction has completed (that is, the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A − 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
A transaction is a unit of program execution that accesses and possibly updates various data items:
• Atomicity: Atomicity guarantees that each transaction is treated as a single unit, which either succeeds completely, or fails completely
◦ If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged
◦ Atomicity must be guaranteed in every situation, including power failures, errors and crashes
• Consistency: Consistency ensures that a transaction can only bring the database from one valid state to another, maintaining database invariants
◦ Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof
• Isolation: Transactions are often executed concurrently (multiple transactions reading and writing to a table at the same time)
◦ Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
• Durability: Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (like power outage or crash)
◦ This usually means that completed transactions (or their effects) are recorded in non-volatile memory
ACID Properties: Quick Reckoner PPD
Transaction States
• Every transaction can be in one of the following states (like Process States in OS)
◦ Active
. The initial state; the transaction stays in this state while it is executing
◦ Partially committed
. After the final statement has been executed
◦ Failed
. After the discovery that normal execution can no longer proceed
◦ Aborted
. After the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:
− Restart the transaction: Can be done only if no internal logical error
− Kill the transaction
◦ Committed
. After successful completion
◦ Terminated
. After it has been committed or aborted (killed)
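The states listed above can be read as a small state machine. The sketch below is one plausible encoding: the TRANSITIONS table follows the usual textbook diagram, and modelling a restart as a move from aborted back to active is an assumption of this example (a restarted transaction is often treated as a new transaction instead).

```python
# Allowed moves between states; "aborted" -> "active" models the restart option.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": {"active", "terminated"},  # restart, or kill
    "committed": {"terminated"},
    "terminated": set(),
}


class Transaction:
    def __init__(self):
        self.state = "active"  # the initial state
        self.history = ["active"]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


ok = Transaction()
for state in ("partially committed", "committed", "terminated"):
    ok.move_to(state)

bad = Transaction()
for state in ("failed", "aborted", "terminated"):
    bad.move_to(state)

try:
    Transaction().move_to("committed")  # cannot commit without partially committing
    skipped_allowed = True
except ValueError:
    skipped_allowed = False

print(ok.history, bad.history, skipped_allowed)
```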
Transitions for Transaction States PPD
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the system. Advantages are:
◦ Increased processor and disk utilization, leading to better transaction throughput
. For example, one transaction can be using the CPU while another is reading from or writing to the disk
◦ Reduced average response time for transactions: short transactions need not wait behind long ones
• Concurrency Control Schemes: Mechanisms to achieve isolation
◦ To control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database
• Schedule: A sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed
◦ A schedule for a set of transactions must consist of all instructions of those transactions
◦ Must preserve the order in which the instructions appear in each individual transaction
• A transaction that successfully completes its execution will have a commit instruction as the last statement
◦ By default, a transaction is assumed to execute a commit instruction as its last step
• A transaction that fails to successfully complete its execution will have an abort instruction as the last statement
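The two conditions on a schedule can be checked mechanically. The sketch below is illustrative only: the function name `is_valid_schedule` and the representation of a schedule as `(transaction, instruction)` pairs are assumptions, not notation from the slides. It projects the schedule onto each transaction and compares the result with that transaction's own instruction list, which covers both "all instructions present" and "per-transaction order preserved".

```python
def is_valid_schedule(schedule, transactions):
    """Check that `schedule` contains every instruction of every transaction,
    in the order each transaction issues them.

    schedule:     list of (txn_name, instruction) pairs in chronological order
    transactions: dict mapping txn_name -> list of its instructions
    """
    projected = {name: [] for name in transactions}
    for name, instruction in schedule:
        if name not in projected:
            return False  # instruction from an unknown transaction
        projected[name].append(instruction)
    # Equal projections mean nothing is missing and no order was changed.
    return projected == transactions


# Hypothetical transactions used only for illustration.
transactions = {
    "T1": ["read(A)", "write(A)", "commit"],
    "T2": ["read(A)", "write(A)", "commit"],
}
interleaved = [
    ("T1", "read(A)"), ("T2", "read(A)"),
    ("T1", "write(A)"), ("T2", "write(A)"),
    ("T1", "commit"), ("T2", "commit"),
]
print(is_valid_schedule(interleaved, transactions))
```

Note that a valid schedule in this sense need not be a correct one: the interleaving above is a well-formed schedule even though T2 overwrites T1's update.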
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B
• An example of a serial schedule in which T1 is followed by T2:
[Figure: Schedule 1]
• A serial schedule in which T2 is followed by T1:
[Figure: Schedule 2]
Values of A & B are different from Schedule 1, yet consistent
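To see why two serial orders can give different yet consistent results, the two transactions can be simulated directly. This is a minimal sketch: the starting balances (A = 1000, B = 2000), the integer arithmetic, and the helper names are assumptions made for illustration.

```python
def t1(db):
    """T1: transfer $50 from A to B."""
    a = db["A"]           # read(A)
    db["A"] = a - 50      # write(A)
    b = db["B"]           # read(B)
    db["B"] = b + 50      # write(B)


def t2(db):
    """T2: transfer 10% of the balance of A to B."""
    a = db["A"]           # read(A)
    temp = a // 10        # 10% of A, kept in a local variable
    db["A"] = a - temp    # write(A)
    b = db["B"]           # read(B)
    db["B"] = b + temp    # write(B)


def run_serial(order, start):
    """Run whole transactions one after another on a copy of the database."""
    db = dict(start)
    for txn in order:
        txn(db)
    return db


start = {"A": 1000, "B": 2000}          # assumed initial balances
schedule_1 = run_serial([t1, t2], start)  # T1 followed by T2
schedule_2 = run_serial([t2, t1], start)  # T2 followed by T1
print(schedule_1, schedule_2)
```

Under these assumptions the final balances differ between the two orders, but A + B is 3000 in both, which is the consistency requirement for a transfer.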
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1
[Figure: Schedule 3 shown next to Schedule 1]
• The following concurrent schedule does not preserve the sum of "A + B"
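The schedule on the slide is not reproduced in this text, so the sketch below uses one assumed interleaving of T1 (transfer $50) and T2 (transfer 10% of A) in which each transaction works on a stale local copy. The starting balances and the exact interleaving are assumptions for illustration.

```python
def run_bad_interleaving(start):
    """One assumed interleaving of T1 and T2 in which T1 overwrites T2's update of A."""
    db = dict(start)

    # T1: read(A); A := A - 50 (only in T1's local buffer so far)
    a1 = db["A"]
    a1 -= 50

    # T2: read(A); temp := 10% of A; A := A - temp; write(A); read(B)
    a2 = db["A"]
    temp = a2 // 10
    a2 -= temp
    db["A"] = a2
    b2 = db["B"]

    # T1: write(A); read(B); B := B + 50; write(B)
    db["A"] = a1          # overwrites T2's write of A
    b1 = db["B"]
    b1 += 50
    db["B"] = b1

    # T2: B := B + temp; write(B), using the value of B read before T1's update
    b2 += temp
    db["B"] = b2          # overwrites T1's write of B
    return db


start = {"A": 1000, "B": 2000}   # assumed initial balances
result = run_bad_interleaving(start)
print(result, result["A"] + result["B"])
```

With these assumed values the total changes from 3000 to 3050, so the interleaving is inconsistent even though every individual transaction is correct in isolation.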
• A task in a database is done as a transaction that passes through several states
• Transactions are executed in concurrent fashion for better throughput
• Concurrent execution of transactions raises serializability issues that need to be addressed
• Not all schedules satisfy ACID properties
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 47: Transactions/2: Serializability
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• A task in a database is done as a transaction that passes through several states
• Transactions are executed in concurrent fashion for better throughput
• Concurrent execution of transactions raises serializability issues that need to be addressed
• Not all schedules satisfy ACID properties
• To understand the issues that arise when two or more transactions work concurrently
• To introduce the notions of serializability, which ensure that schedules may run transactions concurrently and still guarantee serial behavior
• To analyze the conditions, called conflicts, that need to be honored to attain serializable schedules
Serializability
a) Conflict Serializability
b) View Serializability
• Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1
[Figure: Schedule 3 shown next to Schedule 1]
• The following concurrent schedule does not preserve the sum of "A + B"
• We ignore operations other than read and write instructions
◦ Other operations happen in memory (are temporary in nature) and (mostly) do not affect the state of the database
◦ This is a simplifying assumption for analysis
• We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes
• Our simplified schedules consist of only read and write instructions
• Let li and lj be two instructions from transactions Ti and Tj respectively
• Instructions li and lj conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions writes to Q
a) li = read(Q), lj = read(Q). li and lj don't conflict
b) li = read(Q), lj = write(Q). They conflict
c) li = write(Q), lj = read(Q). They conflict
d) li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical) temporal order between them
◦ If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule
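The four cases reduce to a single predicate. The sketch below is a minimal illustration; the `Op` type and the function name `conflicts` are assumptions introduced here, not part of the slides.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One instruction of a simplified schedule."""
    kind: str   # "read" or "write"
    txn: int    # index i of the transaction Ti issuing the instruction
    item: str   # data item accessed, e.g. "Q"


def conflicts(li: Op, lj: Op) -> bool:
    """True when li and lj belong to different transactions, access the
    same item, and at least one of them is a write."""
    return (
        li.txn != lj.txn
        and li.item == lj.item
        and "write" in (li.kind, lj.kind)
    )


# The four cases a) to d) for transactions T1 and T2 on item Q.
for kind_i in ("read", "write"):
    for kind_j in ("read", "write"):
        print(kind_i, kind_j, conflicts(Op(kind_i, 1, "Q"), Op(kind_j, 2, "Q")))
```

Only the read/read pair is reported as non-conflicting, so only that pair may be swapped freely when the two instructions are adjacent in a schedule.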
Conflict Serializability
• Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions:
[Figure: Schedule 3 and Schedule 6]
[Figure: schedule of T3 and T4]
• We are unable to swap instructions in the above schedule to obtain either the serial schedule <T3, T4> or the serial schedule <T4, T3>
Consider two transactions:
Transaction 1                    Transaction 2
UPDATE accounts                  UPDATE accounts
SET balance = balance - 100      SET balance = balance * 1.005
WHERE acct_id = 31414
• In terms of read / write we can write these as:
Transaction 1: r1(A), w1(A) // A is the balance for acct_id = 31414
Transaction 2: r2(A), w2(A), r2(B), w2(B) // B is balance of other accounts
• Consider schedule S:
◦ Schedule S: r1(A), r2(A), w1(A), w2(A), r2(B), w2(B)
◦ Suppose: A starts with $200, and account B starts with $100
• Schedule S is very bad! (At least, it's bad if you're the bank!) We withdrew $100 from account A, but somehow the database has recorded that our account now holds $201!
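The $201 outcome can be reproduced by replaying schedule S step by step. This is a sketch under stated assumptions: each transaction is modelled with a single private buffer holding the last value it read, exact fractions stand in for currency to avoid floating-point rounding, and the names `run`, `S`, `serial_1`, and `serial_2` are illustrative.

```python
from fractions import Fraction

# What each transaction computes between its read and its write.
UPDATE = {
    1: lambda v: v - 100,                    # Transaction 1: balance - 100
    2: lambda v: v * Fraction(1005, 1000),   # Transaction 2: balance * 1.005
}


def run(schedule, start):
    """Replay (action, txn, item) steps; 'r' copies the item into the
    transaction's private buffer, 'w' writes the updated buffer back."""
    db = dict(start)
    local = {}
    for action, txn, item in schedule:
        if action == "r":
            local[txn] = db[item]
        else:
            db[item] = UPDATE[txn](local[txn])
    return db


start = {"A": Fraction(200), "B": Fraction(100)}
S = [("r", 1, "A"), ("r", 2, "A"), ("w", 1, "A"),
     ("w", 2, "A"), ("r", 2, "B"), ("w", 2, "B")]
serial_1 = [("r", 1, "A"), ("w", 1, "A"), ("r", 2, "A"),
            ("w", 2, "A"), ("r", 2, "B"), ("w", 2, "B")]
serial_2 = [("r", 2, "A"), ("w", 2, "A"), ("r", 2, "B"),
            ("w", 2, "B"), ("r", 1, "A"), ("w", 1, "A")]

for name, schedule in [("S", S), ("serial 1", serial_1), ("serial 2", serial_2)]:
    final = run(schedule, start)
    print(name, float(final["A"]), float(final["B"]))
```

In S, the withdrawal written by w1(A) is overwritten by w2(A), which was computed from the balance read before the withdrawal, so the final value of A matches neither serial order.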
Example: Bad Schedule (2)
• Ideal schedule is serial:
Serial schedule 1: r1(A), w1(A), r2(A), w2(A), r2(B), w2(B)
Serial schedule 2: r2(A), w2(A), r2(B), w2(B), r1(A), w1(A)
• We call a schedule serializable if it has the same effect as some serial schedule regardless of the specific information in the database.
• As an example, consider Schedule T, which has swapped the third and fourth operations from S:
◦ Schedule S: r1(A), r2(A), w1(A), w2(A), r2(B), w2(B)
◦ Schedule T: r1(A), r2(A), w2(A), w1(A), r2(B), w2(B)
[Figure: Schedule T, two worked examples]
• By the first example, the outcome is the same as Serial schedule 1. But that's just a peculiarity of the data, as revealed by the second example, where the final value of A can't be the consequence of either of the possible serial schedules.
• So neither S nor T is serializable
Example: Good Schedule PPD
• Are all serializable schedules conflict-serializable? No.
• Consider the following schedule for a set of three transactions.
◦ w1(A), w2(A), w2(B), w1(B), w3(B)
• We can perform no swaps to this:
◦ The first two operations are both on A and at least one is a write;
◦ The second and third operations are by the same transaction;
◦ The third and fourth are both on B and at least one is a write; and
◦ So are the fourth and fifth.
◦ So this schedule is not conflict-equivalent to any other schedule, and certainly not to any serial schedule.
• However, since nobody ever reads the values written by the w1(A), w2(B), and w1(B) operations, the schedule has the same outcome as the serial schedule:
◦ w1(A), w1(B), w2(A), w2(B), w3(B)
Source: Serializability
• A schedule is conflict serializable if and only if its precedence graph is acyclic
• Cycle-detection algorithms exist which take order n² time, where n is the number of vertices in the graph
◦ (Better algorithms take order n + e where e is the number of edges)
• If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph
◦ That is, a linear order consistent with the partial order of the graph.
◦ For example, a serializability order for the schedule (a) would be one of either (b) or (c)
• Build a directed graph, with a vertex for each transaction.
• Go through each operation of the schedule.
◦ If the operation is of the form wi(X), find each subsequent operation in the schedule also operating on the same data element X by a different transaction: that is, anything of the form rj(X) or wj(X). For each such subsequent operation, add a directed edge in the graph from Ti to Tj.
◦ If the operation is of the form ri(X), find each subsequent write to the same data element X by a different transaction: that is, anything of the form wj(X). For each such subsequent write, add a directed edge in the graph from Ti to Tj.
• The schedule is conflict-serializable if and only if the resulting directed graph is acyclic.
• Moreover, we can perform a topological sort on the graph to discover the serial schedule to which the schedule is conflict-equivalent.
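The procedure above can be sketched in a few lines. The helper names `parse`, `precedence_edges`, and `serial_order` are assumptions introduced for illustration; the topological sort uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) rather than any algorithm named in the slides.

```python
from graphlib import CycleError, TopologicalSorter


def parse(schedule):
    """Turn 'w1(A), r2(A)' into [('w', 1, 'A'), ('r', 2, 'A')]."""
    ops = []
    for token in schedule.split(","):
        token = token.strip()
        txn, item = token[1:].rstrip(")").split("(")
        ops.append((token[0], int(txn), item))
    return ops


def precedence_edges(ops):
    """Add Ti -> Tj for every later operation of a different transaction on
    the same item where at least one of the two operations is a write."""
    edges = set()
    for pos, (kind_i, ti, x) in enumerate(ops):
        for kind_j, tj, y in ops[pos + 1:]:
            if x == y and ti != tj and "w" in (kind_i, kind_j):
                edges.add((ti, tj))
    return edges


def serial_order(ops):
    """Return one conflict-equivalent serial order of transactions,
    or None when the precedence graph has a cycle."""
    predecessors = {txn: set() for _, txn, _ in ops}
    for ti, tj in precedence_edges(ops):
        predecessors[tj].add(ti)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


ops = parse("w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)")
print(sorted(precedence_edges(ops)))
print(serial_order(ops))
print(serial_order(parse("w1(A), w2(A), w2(B), w1(B), w3(B)")))
```

A topological order is generally not unique, so the order printed for the first schedule is only one of the valid serial orders. The second schedule has edges in both directions between T1 and T2, so the function returns None.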
• Consider the following schedule:
◦ w1(A), r2(A), w1(B), w3(C), r2(C), r4(B), w2(D), w4(E), r5(D), w5(E)
• We start with an empty graph with five vertices labeled T1, T2, T3, T4, T5.
• We go through each operation in the schedule:
w1(A): A is subsequently read by T2, so add edge T1 → T2
r2(A): no subsequent writes to A, so no new edges
w1(B): B is subsequently read by T4, so add edge T1 → T4
w3(C): C is subsequently read by T2, so add edge T3 → T2
r2(C): no subsequent writes to C, so no new edges
r4(B): no subsequent writes to B, so no new edges
w2(D): D is subsequently read by T5, so add edge T2 → T5
w4(E): E is subsequently written by T5, so add edge T4 → T5
r5(D): no subsequent writes to D, so no new edges
w5(E): no subsequent operations on E, so no new edges
• We end up with a precedence graph with edges T1 → T2, T1 → T4, T3 → T2, T2 → T5, T4 → T5
• This graph has no cycles, so the original schedule must be conflict-serializable. Moreover, since one way to topologically sort the graph is T3 − T1 − T4 − T2 − T5, one serial schedule that is conflict-equivalent is
◦ w3(C), w1(A), w1(B), r4(B), w4(E), r2(A), r2(C), w2(D), r5(D), w5(E)
Module Summary
• Understood the issues that arise when two or more transactions work concurrently
• Learnt the forms of serializability in terms of conflict and view serializability
• Acyclic precedence graph can ensure conflict serializability
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 48: Transactions/3: Recoverability
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• Understood the issues that arise when two or more transactions work concurrently
• Learnt the forms of serializability in terms of conflict and view serializability
• Acyclic precedence graph can ensure conflict serializability
• What happens if the system fails while a transaction is in execution? Can a consistent state be reached for the database? Recoverability attempts to answer issues in state and transaction recovery in the face of system failures
• Conflict serializability is a crisp concept for concurrent execution that guarantees ACID properties and has a simple detection algorithm. Yet only a few schedules are conflict serializable in practice. There is a need to explore View Serializability, a weaker notion that allows better concurrency
Recovery
◦ Leads to inconsistent state
◦ Need to rollback update of A
• This is known as Recovery
Recoverable Schedules
• If a transaction Tj reads a data item previously written by a transaction Ti, then the commit operation of Ti must appear before the commit operation of Tj.
• The following schedule is not recoverable if T9 commits immediately after the read(A) operation
[Figure: schedule of T8 and T9]
• If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence, database must ensure that schedules are recoverable
• Cascading rollback: A single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)
[Figure: schedule of T10, T11, and T12]
• If T10 fails, T11 and T12 must also be rolled back
• Can lead to the undoing of a significant amount of work
Cascadeless Schedules
• Cascadeless schedules: For each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj
• Every cascadeless schedule is also recoverable
• It is desirable to restrict the schedules to those that are cascadeless
• Example of a schedule that is NOT cascadeless
[Figure: schedule that is not cascadeless]
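The definitions of recoverable and cascadeless schedules can be turned into two small checks. This is a simplified sketch: aborts are ignored, a read is attributed to the most recent write of the item by another transaction, and the function names and the three sample schedules are assumptions written for illustration (the schedules in the slide figures are not reproduced in this text).

```python
def reads_from(ops):
    """Yield (reader, writer, read_position) for each read of an item whose
    most recent write came from a different transaction."""
    last_writer = {}
    for pos, (kind, txn, item) in enumerate(ops):
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                yield txn, writer, pos


def commit_positions(ops):
    """Map each committed transaction to the position of its commit."""
    return {txn: pos for pos, (kind, txn, _) in enumerate(ops) if kind == "c"}


def is_recoverable(ops):
    """Every committed reader must commit after the writer it read from."""
    commits = commit_positions(ops)
    for reader, writer, _ in reads_from(ops):
        if reader in commits:
            if writer not in commits or commits[writer] > commits[reader]:
                return False
    return True


def is_cascadeless(ops):
    """Every read of another transaction's write must follow that writer's commit."""
    commits = commit_positions(ops)
    for _, writer, read_pos in reads_from(ops):
        if writer not in commits or commits[writer] > read_pos:
            return False
    return True


# Hypothetical schedules: ("c", txn, None) marks a commit.
dirty_read_then_commit = [("r", 8, "A"), ("w", 8, "A"), ("r", 9, "A"),
                          ("c", 9, None), ("r", 8, "B")]
uncommitted_chain = [("r", 10, "A"), ("r", 10, "B"), ("w", 10, "A"),
                     ("r", 11, "A"), ("w", 11, "A"), ("r", 12, "A")]
commit_before_read = [("w", 1, "A"), ("c", 1, None), ("r", 2, "A"), ("c", 2, None)]

for name, s in [("dirty_read_then_commit", dirty_read_then_commit),
                ("uncommitted_chain", uncommitted_chain),
                ("commit_before_read", commit_before_read)]:
    print(name, is_recoverable(s), is_cascadeless(s))
```

The middle schedule is recoverable only because nothing has committed yet, but it is not cascadeless: a failure of the first writer would force every transaction that read from it to roll back as well.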
Rollback is possible only till the end (commit) of T2. So the computation of A (4000) and write in T1 is lost.
Example: Recoverable Schedule with Cascading Rollback PPD
Rollback is possible as T2 has not committed yet. But T2 also needs to be rolled back for rolling back T1.
Example: Recoverable Schedule without Cascading Rollback PPD
Transaction Definition in SQL
• Data manipulation language must include a construct for specifying the set of actions that comprise a transaction
◦ In SQL, a transaction begins implicitly
◦ A transaction in SQL ends by:
. Commit work
− Commits the current transaction and begins a new one
. Rollback work
− Causes current transaction to abort
◦ In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully
. Implicit commit can be turned off by a database directive
− For example in JDBC, [Link](false);
• The following commands are used to control transactions
◦ COMMIT
. To save the changes
◦ ROLLBACK
. To roll back the changes
◦ SAVEPOINT
. Creates points within a transaction to which a later ROLLBACK can return
◦ SET TRANSACTION
. Places a name on a transaction
• Transactional control commands are only used with the DML commands INSERT, UPDATE and DELETE
◦ They cannot be used while creating tables or dropping them because these operations are automatically committed in the database
Source: SQL - Transactions
• COMMIT is the transactional command used to save changes invoked by a transaction to the database
• COMMIT saves all changes made to the database since the last COMMIT or ROLLBACK command
• An example of the COMMIT command is as follows:
◦ SQL> DELETE FROM Customers WHERE AGE = 25;
◦ SQL> COMMIT;
[Tables: two results of SQL> SELECT * FROM Customers;]
• ROLLBACK is the command used to undo changes that have not already been saved to the database
• This can only be used to undo changes made since the last COMMIT or ROLLBACK command was issued
• An example of the ROLLBACK command is as follows:
◦ SQL> DELETE FROM Customers WHERE AGE = 25;
◦ SQL> ROLLBACK;
[Tables: two results of SQL> SELECT * FROM Customers;]
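The effect of COMMIT and ROLLBACK can be tried out with Python's standard `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the database behind the `SQL>` prompt on the slides, the `Customers` rows are made up, and `isolation_level=None` is used so that transactions are controlled only by the explicit BEGIN, COMMIT, and ROLLBACK statements.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN; we control transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, AGE INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, 32), (2, 25), (3, 23), (4, 25)],   # hypothetical sample rows
)


def row_count():
    return conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]


# The delete is undone because the transaction is rolled back.
conn.execute("BEGIN")
conn.execute("DELETE FROM Customers WHERE AGE = 25")
conn.execute("ROLLBACK")
after_rollback = row_count()

# The same delete becomes permanent once it is committed.
conn.execute("BEGIN")
conn.execute("DELETE FROM Customers WHERE AGE = 25")
conn.execute("COMMIT")
after_commit = row_count()

print(after_rollback, after_commit)
```

With the sample rows above, all four rows survive the rolled-back delete, and two rows remain after the committed one.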
• A SAVEPOINT is a point in a transaction when you can roll the transaction back to a certain point without rolling back the entire transaction
• The syntax for a SAVEPOINT command is:
◦ SAVEPOINT SAVEPOINT_NAME;
• This command serves only in the creation of a SAVEPOINT among all the transactional statements.
• The ROLLBACK command is used to undo a group of transactions
• The syntax for rolling back to a SAVEPOINT is:
◦ ROLLBACK TO SAVEPOINT_NAME;
Example:
• SQL> SAVEPOINT SP1;
◦ Savepoint created.
• SQL> DELETE FROM Customers WHERE ID=1;
◦ 1 row deleted.
• SQL> SAVEPOINT SP2;
◦ Savepoint created.
• SQL> DELETE FROM Customers WHERE ID=2;
◦ 1 row deleted.
• SQL> SAVEPOINT SP3;
◦ Savepoint created.
• SQL> DELETE FROM Customers WHERE ID=3;
◦ 1 row deleted.
Source: SQL - Transactions
[Tables: two results of SQL> SELECT * FROM Customers;]
Source: SQL - Transactions
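The savepoint sequence from the example can be replayed with Python's standard `sqlite3` module. This is a sketch under assumptions: SQLite stands in for the database used on the slides, the table contents are made up, and the choice to roll back to SP2 is illustrative.

```python
import sqlite3

# isolation_level=None: transactions are controlled only by the SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Customers (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],   # hypothetical sample rows
)

conn.execute("BEGIN")
conn.execute("SAVEPOINT SP1")
conn.execute("DELETE FROM Customers WHERE ID = 1")
conn.execute("SAVEPOINT SP2")
conn.execute("DELETE FROM Customers WHERE ID = 2")
conn.execute("SAVEPOINT SP3")
conn.execute("DELETE FROM Customers WHERE ID = 3")

# Undo everything done after SP2 was created; the delete of ID = 1 is kept.
conn.execute("ROLLBACK TO SP2")
# Releasing SP1 removes it (and the savepoints nested inside it) without undoing anything.
conn.execute("RELEASE SAVEPOINT SP1")
conn.execute("COMMIT")

remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM Customers ORDER BY ID")]
print(remaining_ids)
```

Only the delete issued before SP2 takes effect: rolling back to SP2 discards the deletes of IDs 2 and 3 while leaving the enclosing transaction open for the final COMMIT.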
• The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that you have created
• The syntax for a RELEASE SAVEPOINT command is as follows
◦ RELEASE SAVEPOINT SAVEPOINT_NAME;
• Once a SAVEPOINT has been released, you can no longer use the ROLLBACK command to return to it and undo the changes made since that SAVEPOINT
Source: SQL - Transactions
• The SET TRANSACTION command can be used to initiate a database transaction
• This command is used to specify characteristics for the transaction that follows
◦ For example, you can specify a transaction to be read only or read write
• The syntax for a SET TRANSACTION command is as follows:
◦ SET TRANSACTION [ READ WRITE | READ ONLY ];
Source: SQL - Transactions
View Serializability
• Let S and S′ be two schedules with the same set of transactions. S and S′ are view equivalent if the following three conditions are met, for each data item Q,
◦ Initial Read: If in schedule S, transaction Ti reads the initial value of Q, then in schedule S′ also transaction Ti must read the initial value of Q
◦ Write-Read Pair: If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S′ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj
◦ Final Write: The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S′
• As can be seen, view equivalence is also based purely on reads and writes alone
[Figure: schedule of T27, T28, and T29]
• What serial schedule is above equivalent to?
◦ T27 − T28 − T29
◦ The one read(Q) instruction reads the initial value of Q in both schedules and
◦ T29 performs the final write of Q in both schedules
• T28 and T29 perform write(Q) operations called blind writes, without having performed a read(Q) operation
• Every view serializable schedule that is not conflict serializable has blind writes
Test for View Serializability
• The precedence graph test for conflict serializability cannot be used directly to test for view serializability
◦ Extension to test for view serializability has cost exponential in the size of the precedence graph
• The problem of checking if a schedule is view serializable falls in the class of NP-complete problems
◦ Thus, existence of an efficient algorithm is extremely unlikely
• However, practical algorithms that just check some sufficient conditions for view serializability can still be used
• Check whether the schedule is view serializable or not?
◦ S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);
• Solution:
◦ With 3 transactions, total number of serial schedules possible = 3! = 6
. <T1 T2 T3>
. <T1 T3 T2>
. <T2 T3 T1>
. <T2 T1 T3>
. <T3 T1 T2>
. <T3 T2 T1>
Source: http://www.edugrabs.com/how-to-check-for-view-serializable-schedule/ (Accessed 12-Feb-18)
• Check whether the schedule is view serializable or not?
◦ S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);
• Solution:
◦ Final update on data items:
. A: − (No write on A)
. B: T1, T2, T3 (All 3 transactions write B)
. As the final update on B is made by T3, (T1, T2) → T3. Now, removing those schedules in which T3 is not executing last:
− <T1 T2 T3>
− <T2 T1 T3>
Source: http://www.edugrabs.com/how-to-check-for-view-serializable-schedule/ (Accessed 12-Feb-18)
• Check whether the schedule is view serializable or not?
◦ S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B);
• Solution:
◦ Initial Read + Which transaction updates after read?
. A: T2, T1, T3 (initial read)
. B: T2 (initial read); T1 (update after read)
. The transaction T2 reads B initially which is updated by T1. So T2 must execute before T1. Hence, T2 → T1. So only one schedule survives:
. <T2 T1 T3>
◦ Write Read Sequence (WR)
. No need to check here, since no transaction reads a value written by another transaction
◦ Hence, view equivalent serial schedule is:
. T2 → T1 → T3
Source: http://www.edugrabs.com/how-to-check-for-view-serializable-schedule/ (Accessed 12-Feb-18)
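The elimination steps above amount to comparing the schedule with each serial order under the three view-equivalence conditions. The sketch below does this by brute force; the function names and the tuple encoding of operations are assumptions for illustration, and the factorial number of orders it tries is in line with the NP-completeness of the general problem.

```python
from itertools import permutations


def view_profile(ops):
    """Summarise a schedule as (what every read sees, who writes each item last).

    A read source of None means the read sees the initial value; otherwise it
    is (writer, k), the k-th write of that item by `writer`.
    """
    last_write = {}    # item -> (txn, k)
    write_count = {}   # (txn, item) -> number of writes so far
    read_count = {}    # (txn, item) -> number of reads so far
    reads = {}         # (txn, item, n) -> source of the n-th read
    for kind, txn, item in ops:
        if kind == "w":
            k = write_count.get((txn, item), 0)
            write_count[(txn, item)] = k + 1
            last_write[item] = (txn, k)
        else:
            n = read_count.get((txn, item), 0)
            read_count[(txn, item)] = n + 1
            reads[(txn, item, n)] = last_write.get(item)
    return reads, last_write


def view_equivalent(s1, s2):
    """Same initial reads, same write-read pairs, same final writes."""
    return view_profile(s1) == view_profile(s2)


def view_serial_orders(ops):
    """Return every serial order of transactions that is view equivalent to ops."""
    txns = sorted({txn for _, txn, _ in ops})
    orders = []
    for order in permutations(txns):
        serial = [op for txn in order for op in ops if op[1] == txn]
        if view_equivalent(ops, serial):
            orders.append(order)
    return orders


# S: R2(B); R2(A); R1(A); R3(A); W1(B); W2(B); W3(B)
S = [("r", 2, "B"), ("r", 2, "A"), ("r", 1, "A"), ("r", 3, "A"),
     ("w", 1, "B"), ("w", 2, "B"), ("w", 3, "B")]
print(view_serial_orders(S))
```

A schedule is view serializable exactly when the returned list is non-empty. For S the only surviving order is the one derived by hand above.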
View Serializability: Example 2 PPD
• Check whether S is conflict serializable and / or view serializable or not?
◦ S: R1(A); R2(A); R3(A); R4(A); W1(B); W2(B); W3(B); W4(B)
• Solution is given in the next slide (hidden). First try to solve this and then check the solution.
Source: Given in solution slides
Complex Notions of Serializability
• The schedule below produces the same outcome as the serial schedule <T1, T5>, yet is not conflict equivalent or view equivalent to it
[Figure: schedule of T1 and T5]
• If we start with A = 1000 and B = 2000, the final result is 960 and 2040
• Determining such equivalence requires analysis of operations other than read and write
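The schedule itself is not reproduced in this text, so the sketch below assumes one interleaving that is consistent with the stated result: T1 moves 50 from A to B and T5 moves 10 from B to A, with T5's update of B falling between T1's two updates. Both function names and the interleaving are assumptions for illustration.

```python
def interleaved(start):
    """Assumed interleaving: T1 updates A, T5 updates B, T1 updates B, T5 updates A."""
    db = dict(start)
    a = db["A"]            # T1: read(A)
    db["A"] = a - 50       # T1: write(A)
    b = db["B"]            # T5: read(B)
    db["B"] = b - 10       # T5: write(B)
    b = db["B"]            # T1: read(B), sees T5's write
    db["B"] = b + 50       # T1: write(B)
    a = db["A"]            # T5: read(A), sees T1's write
    db["A"] = a + 10       # T5: write(A)
    return db


def serial_t1_t5(start):
    """Serial schedule <T1, T5>."""
    db = dict(start)
    db["A"] = db["A"] - 50   # T1
    db["B"] = db["B"] + 50   # T1
    db["B"] = db["B"] - 10   # T5
    db["A"] = db["A"] + 10   # T5
    return db


start = {"A": 1000, "B": 2000}
print(interleaved(start), serial_t1_t5(start))
```

The outcomes agree only because addition and subtraction commute. A test that looks at reads and writes alone cannot see this: in the interleaving T1 reads the value of B written by T5, whereas in <T1, T5> it reads the initial value.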
Module Summary
• With proper planning, a database can be recovered back to a consistent state from inconsistent state in the face of system failures. Such a recovery is done via cascaded or cascadeless rollback
• View Serializability is a weaker serializability system for better concurrency. However, testing for view serializability is NP complete
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 49: Concurrency Control/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Module Recap PPD
• With proper planning, a database can be recovered back to a consistent state from inconsistent state in the face of system failures. Such a recovery is done via cascaded or cascadeless rollback
• View Serializability is a weaker serializability system for better concurrency. However, testing for view serializability is NP complete
Module Objectives PPD
• Concurrency control through design of serializable schedules is difficult in general. Hence we take a look into the locking mechanism and Lock-Based Protocols
• We need to understand how locks may be implemented
Module Outline PPD
• Concurrency Control
• Lock-Based Protocols
  ◦ Example
  ◦ Two-Phase Locking Protocol
  ◦ Lock Conversions
  ◦ Automatic Acquisition of Locks
  ◦ Deadlocks
  ◦ Starvation
  ◦ Cascading
  ◦ More Protocols
• Implementation of Locking
  ◦ Lock Table
• Module Summary
Concurrency Control
• A database must provide a mechanism that will ensure that all possible schedules are both:
  ◦ Conflict serializable
  ◦ Recoverable and, preferably, cascadeless
• A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency
• Concurrency-control schemes trade off the amount of concurrency they allow against the amount of overhead that they incur
• Testing a schedule for serializability after it has executed is a little too late!
  ◦ Tests for serializability help us understand why a concurrency control protocol is correct
• Goal: To develop concurrency control protocols that will assure serializability
Concurrency Control (2) PPD
• One way to ensure isolation is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify that data item
  ◦ Should a transaction hold a lock on the whole database?
    ▷ This would lead to strictly serial schedules and very poor performance
• The most common method used to implement the locking requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item
Lock-Based Protocols
Lock-Based Protocols (2): Lock Compatibility Matrix
• Lock-Compatibility Matrix: A lock compatibility matrix states whether a data item can be locked by two transactions at the same time
• Full compatibility matrix
• Abbreviated compatibility matrix
Lock-Based Protocols (3)
  ◦ Any number of transactions can hold shared locks on an item
  ◦ But if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item
• Waiting for a Lock
  ◦ If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released
• Holding a Lock
  ◦ A transaction must hold a lock on a data item as long as it accesses that item
• Unlocking / Releasing a Lock
  ◦ Transaction Ti may unlock a data item that it had locked at some earlier point
  ◦ It is not necessarily desirable for a transaction to unlock a data item immediately after its final access of that data item, since serializability may not be ensured
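The compatibility rule for shared and exclusive locks can be written down as a small table. The sketch below is only an illustration: the names COMPATIBLE and can_grant are hypothetical, and the modes are abbreviated to "S" (shared) and "X" (exclusive).

```python
# Lock modes: "S" = shared, "X" = exclusive.
# COMPATIBLE[(held, requested)] says whether a lock in mode `requested`
# may be granted while another transaction holds a lock in mode `held`.
COMPATIBLE = {
    ("S", "S"): True,   # any number of transactions may share an item
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every lock mode
    currently held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)
```

For example, can_grant("S", ["S", "S"]) is true, while can_grant("X", ["S"]) is false, so the requester would have to wait.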
Lock-Based Protocols: Example: Serial Schedule
Lock-Based Protocols: Example (2): Concurrent Schedule: Bad
• If, however, these transactions are executed concurrently, then schedule 1 is possible
• In this case, transaction T2 displays $250, which is incorrect. The reason for this mistake is that
  ◦ the transaction T1 unlocked data item B too early, as a result of which T2 saw an inconsistent state
• Suppose we delay unlocking till the end
Figure: Schedule 1
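The effect of unlocking too early can be replayed with plain arithmetic. The sketch below assumes the usual textbook setup, which is not spelled out in the text here: A = $100, B = $200, T1 transfers $50 from B to A, and T2 displays A + B. The function names are hypothetical, and locking is represented only by the order in which the steps run.

```python
def t1_debit(db):
    db["B"] -= 50              # T1: read(B); B := B - 50; write(B)


def t1_credit(db):
    db["A"] += 50              # T1: read(A); A := A + 50; write(A)


def t2_display(db):
    return db["A"] + db["B"]   # T2: read(A); read(B); display(A + B)


def early_unlock():
    """T1 releases B right after writing it, so T2 can run in the gap."""
    db = {"A": 100, "B": 200}  # assumed starting balances
    t1_debit(db)
    shown = t2_display(db)     # sees the transfer half done
    t1_credit(db)
    return shown


def unlock_at_end():
    """T1 keeps its locks until it has finished, so T2 has to wait."""
    db = {"A": 100, "B": 200}
    t1_debit(db)
    t1_credit(db)
    return t2_display(db)
```

Under these assumptions early_unlock() returns 250 and unlock_at_end() returns 300, matching the two amounts quoted for the bad and good schedules.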
Lock-Based Protocols: Example (3): Concurrent Schedule: Good
• Delaying unlocking till the end, T1 becomes T3 and T2 becomes T4
• Hence, the sequence of reads and writes as in Schedule 1 is no longer possible
• T4 will correctly display $300
Lock-Based Protocols: Example (4): Concurrent Schedule: Deadlock
• Given T3 and T4, consider Schedule 2 (partial)
• Since T3 is holding an exclusive-mode lock on B and T4 is requesting a shared-mode lock on B, T4 is waiting for T3 to unlock B
• Similarly, since T4 is holding a shared-mode lock on A and T3 is requesting an exclusive-mode lock on A, T3 is waiting for T4 to unlock A
• Thus, we have arrived at a state where neither of these transactions can ever proceed with its normal execution
• This situation is called deadlock
• When deadlock occurs, the system must roll back one of the two transactions.
• Once a transaction has been rolled back, the data items that were locked by that transaction are unlocked.
• These data items are then available to the other transaction, which can continue with its execution.
Figure: Schedule 2
Lock-Based Protocols
• If we do not use locking, or if we unlock data items too soon after reading or writing them, we may get inconsistent states
• On the other hand, if we do not unlock a data item before requesting a lock on another data item, deadlocks may occur
• Deadlocks are a necessary evil associated with locking, if we want to avoid inconsistent states
• Deadlocks are definitely preferable to inconsistent states, since they can be handled by rolling back transactions, whereas inconsistent states may lead to real-world problems that cannot be handled by the database system
• A locking protocol is a set of rules followed by all transactions while requesting and releasing locks
• Locking protocols restrict the set of possible schedules
• The set of all such schedules is a proper subset of all possible serializable schedules
• We present locking protocols that allow only conflict-serializable schedules, and thereby ensure isolation
Two-Phase Locking Protocol
Two-Phase Locking Protocol (2)
• There can be conflict-serializable schedules that cannot be obtained if two-phase locking is used
• However, in the absence of extra information (for example, the ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:
  ◦ Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable
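Whether a single transaction follows two-phase locking can be checked mechanically. The sketch below uses the standard definition (a growing phase in which locks are only acquired, followed by a shrinking phase in which locks are only released); the function name and the encoding of operations as (action, item) pairs are assumptions made for illustration.

```python
def is_two_phase(ops):
    """ops: sequence of (action, item) pairs, where action is one of
    "lock-S", "lock-X" or "unlock".
    Returns True if no lock is acquired after the first unlock."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True           # the shrinking phase has begun
        elif action in ("lock-S", "lock-X") and shrinking:
            return False               # a lock request after an unlock
    return True
```

A transaction that unlocks B before locking A, such as the early-unlock version of the transfer, fails this check; the version that delays all unlocks to the end passes it.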
Lock Conversions
Automatic Acquisition of Locks: Read
• A transaction Ti issues the standard read/write instruction, without explicit locking calls
• The operation read(D) is processed as:
    if Ti has a lock on D
    then
        read(D)
    else begin
        if necessary, wait until no other transaction has a lock-X on D
        grant Ti a lock-S on D;
        read(D)
    end
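The read(D) procedure above can be sketched for a single data item as follows. This is a simplified illustration rather than a real lock manager: the class name ItemLocks is hypothetical, and instead of blocking the caller it returns "wait" when the transaction would have to wait.

```python
class ItemLocks:
    """Locks held on one data item D (hypothetical helper)."""

    def __init__(self):
        self.holders = {}   # transaction id -> "S" or "X"

    def read(self, txn, value):
        """Return ("read", value) if the read may proceed now,
        or ("wait", None) if another transaction holds lock-X on D."""
        if txn in self.holders:                       # Ti already has a lock on D
            return ("read", value)
        if any(mode == "X" for mode in self.holders.values()):
            return ("wait", None)                     # must wait for the lock-X holder
        self.holders[txn] = "S"                       # grant Ti a lock-S on D
        return ("read", value)
```

A production system would queue the waiting request and wake the transaction when the exclusive lock is released; returning a status keeps the sketch short.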
Automatic Acquisition of Locks: Write
Deadlocks
• Two-phase locking does not ensure freedom from deadlocks
• Observe that transactions T3 and T4 are both two-phase, yet they are in deadlock
Starvation
Cascading Rollback
• The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil
• When a deadlock occurs there is a possibility of cascading roll-backs
• Cascading roll-back is possible under two-phase locking
• In the schedule here, each transaction observes the two-phase locking protocol, but the failure of T5 after the read(A) step of T7 leads to cascading rollback of T6 and T7.
More Two Phase Locking Protocols
• To avoid cascading roll-back, follow a modified protocol called strict two-phase locking
  ◦ A transaction must hold all its exclusive locks till it commits/aborts
• Rigorous two-phase locking is even stricter
  ◦ All locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit
• Note that concurrency goes down as we move to stricter locking protocols
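The difference between the variants comes down to which locks are released before commit. The sketch below classifies one transaction's operation sequence; the function name, the labels it returns, and the encoding of operations as (action, item) pairs are all illustrative assumptions.

```python
def classify_2pl(ops):
    """ops: list of (action, item) pairs; action is "lock-S", "lock-X",
    "unlock" or "commit" (item is None for commit).
    Returns "rigorous", "strict", "basic" or "not two-phase"."""
    held = {}                    # item -> mode currently held
    unlocked_any = False
    early_x = early_s = False    # released an X / S lock before commit?
    committed = False
    for action, item in ops:
        if action == "commit":
            committed = True
        elif action == "unlock":
            mode = held.pop(item)
            unlocked_any = True
            if not committed:
                if mode == "X":
                    early_x = True
                else:
                    early_s = True
        else:                    # "lock-S" or "lock-X"
            if unlocked_any:
                return "not two-phase"
            held[item] = action[-1]
    if early_x:
        return "basic"           # two-phase, but an exclusive lock was released early
    if early_s:
        return "strict"          # only shared locks were released before commit
    return "rigorous"            # every lock was held until commit
```

Reading the result from "basic" to "rigorous" moves towards less concurrency and stronger guarantees.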
Implementation of Locking
• A lock manager can be implemented as a separate process to which transactions send lock and unlock requests
• The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock)
Lock Table
• Dark blue rectangles indicate granted locks; light blue indicate waiting requests
• Lock table also records the type of lock granted or requested
• New request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks
• Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted
• If transaction aborts, all waiting or granted requests of the transaction are deleted
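The queueing behaviour described for the lock table can be sketched with one FIFO queue per data item. This is a minimal in-memory illustration, not the structure of any particular DBMS; the class and method names are hypothetical, and only shared ("S") and exclusive ("X") modes are modelled.

```python
from collections import defaultdict, deque


class LockTable:
    def __init__(self):
        # data item -> FIFO queue of [txn, mode, granted] requests
        self.queues = defaultdict(deque)

    @staticmethod
    def _compatible(earlier_mode, mode):
        return earlier_mode == "S" and mode == "S"

    def request(self, txn, item, mode):
        """Append the request; grant it only if it is compatible with
        every earlier request in the queue for this item."""
        queue = self.queues[item]
        granted = all(self._compatible(r[1], mode) for r in queue)
        queue.append([txn, mode, granted])
        return granted

    def _regrant(self, item):
        """Check waiting requests, in order, to see if they can now be granted."""
        queue = self.queues[item]
        for i, req in enumerate(queue):
            if not req[2]:
                req[2] = all(self._compatible(queue[j][1], req[1]) for j in range(i))

    def unlock(self, txn, item):
        """Delete the transaction's request, then re-check later requests."""
        self.queues[item] = deque(r for r in self.queues[item] if r[0] != txn)
        self._regrant(item)

    def abort(self, txn):
        """Delete all waiting or granted requests of the transaction."""
        for item in list(self.queues):
            self.unlock(txn, item)

    def holders(self, item):
        return [r[0] for r in self.queues[item] if r[2]]
```

Because a new request is compared with all earlier requests, including waiting ones, a shared request that arrives behind a waiting exclusive request also waits. That is one simple way to keep the exclusive requester from being overtaken indefinitely.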
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 50: Concurrency Control/2
Partha Pratim Das
ppd@[Link]
Module Objectives
• Deadlocks are perils of locking. We need to understand how to detect, prevent and recover from deadlock
• Introduce a simple time-based protocol that avoids deadlocks
Deadlock Handling
• System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set
• Deadlock Prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies:
  ◦ Require that each transaction locks all its data items before it begins execution (pre-declaration)
  ◦ Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order
Deadlock Prevention
• Transaction Timestamp: Timestamp is a unique identifier created by the DBMS to identify the relative starting time of a transaction. Timestamping is a method of concurrency control in which each transaction is assigned a transaction timestamp
• The following schemes use transaction timestamps for the sake of deadlock prevention alone
  ◦ wait-die scheme: non-preemptive
    ▷ Older transaction may wait for younger one to release data item (older means smaller timestamp)
      − Younger transactions never wait for older ones; they are rolled back instead
    ▷ A transaction may die several times before acquiring needed data item
  ◦ wound-wait scheme: preemptive
    ▷ Older transaction wounds (forces rollback of) younger transaction instead of waiting for it
      − Younger transactions may wait for older ones
    ▷ May be fewer rollbacks than wait-die scheme
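Both schemes reduce to a comparison of two timestamps at the moment a lock request conflicts with a lock already held. The sketch below is illustrative only; the function names and the returned strings are assumptions, and a smaller timestamp means an older transaction.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger requester dies."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester rolls back"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder; a younger one waits."""
    if requester_ts < holder_ts:
        return "holder rolls back"
    return "requester waits"
```

In both schemes the transaction that is rolled back is always the younger of the two; they differ in whether that is the requester or the holder.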
Deadlock Prevention (2): Wait-Die Scheme
• Both in wait-die and in wound-wait schemes, a rolled back transaction is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided
• Timeout-Based Schemes
  ◦ A transaction waits for a lock only for a specified amount of time. If the lock has not been granted within that time, the transaction is rolled back and restarted
  ◦ Thus, deadlocks are not possible
  ◦ Simple to implement; but starvation is possible. Also difficult to determine good value of the timeout interval
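A timeout-based scheme can be imitated with an ordinary mutex that supports a bounded wait. The sketch below uses Python's threading.Lock as a stand-in for a database lock; the function name and the "granted"/"rollback" results are assumptions, and the restart of the rolled-back transaction is left out.

```python
import threading


def lock_with_timeout(lock, timeout_s):
    """Wait for `lock` for at most `timeout_s` seconds.
    Returns "granted" if the lock was acquired, "rollback" otherwise."""
    if lock.acquire(timeout=timeout_s):
        return "granted"
    return "rollback"   # the caller would roll back and restart the transaction
```

Choosing timeout_s is the hard part: too short and transactions are rolled back needlessly, too long and a real deadlock goes unnoticed for that long.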
Deadlock Detection
• Deadlocks can be described as a wait-for graph, which consists of a pair G = (V, E),
  ◦ V is a set of vertices (all the transactions in the system)
  ◦ E is a set of edges; each element is an ordered pair Ti → Tj.
• If Ti → Tj is in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item
• When Ti requests a data item currently being held by Tj, then the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti
• The system is in a deadlock state if and only if the wait-for graph has a cycle
• Must invoke a deadlock-detection algorithm periodically to look for cycles
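Looking for a cycle in the wait-for graph is a standard depth-first search. The sketch below assumes the graph is given as a dictionary mapping each transaction to the set of transactions it is waiting for; the function name and this representation are illustrative choices.

```python
def has_deadlock(wait_for):
    """wait_for: dict mapping a transaction to the set of transactions
    it is waiting for. Returns True if the directed graph has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    colour = {}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            state = colour.get(u, WHITE)
            if state == GREY:          # back edge: u is on the current path
                return True
            if state == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))
```

For the two-transaction deadlock discussed earlier, has_deadlock({"T3": {"T4"}, "T4": {"T3"}}) reports a cycle, while a simple chain of waits does not.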
Figure: Wait-for graph with a cycle
Figure: Wait-for graph without a cycle
Timestamp-Based Protocols
• Each transaction is issued a timestamp when it enters the system. If an old transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the time-stamps determine the serializability order
• In order to assure such behavior, the protocol maintains for each data item Q two timestamp values:
  ◦ W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully
  ◦ R-timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully
• The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order
• Suppose a transaction Ti issues a read(Q)
  a) If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten
    ◦ Hence, the read operation is rejected, and Ti is rolled back.
  b) If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
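The read rule can be sketched in a few lines, rejecting a read only when TS(Ti) is strictly smaller than W-timestamp(Q) so that the two cases do not overlap. The write rule shown alongside it is the usual textbook rule and is an assumption here, since the text above gives only the read case; the class and function names are hypothetical.

```python
class Item:
    """A data item Q with its two timestamps (initially 0)."""

    def __init__(self):
        self.r_ts = 0   # R-timestamp(Q)
        self.w_ts = 0   # W-timestamp(Q)


def to_read(ts, q):
    """Transaction with timestamp `ts` issues read(Q)."""
    if ts < q.w_ts:
        return "rollback"             # the value it needs was already overwritten
    q.r_ts = max(q.r_ts, ts)
    return "ok"


def to_write(ts, q):
    """Transaction with timestamp `ts` issues write(Q).
    Assumed textbook rule: reject if a younger transaction has already
    read or written Q."""
    if ts < q.r_ts or ts < q.w_ts:
        return "rollback"
    q.w_ts = ts
    return "ok"
```

Note that neither function ever waits: a conflicting operation is either executed or its transaction is rolled back.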
Figure: A partial schedule for several data items for transactions with timestamps 1, 2, 3, 4, 5
• The timestamp-ordering protocol guarantees serializability since all the arcs in the precedence graph are of the form:
  (transaction with smaller timestamp) → (transaction with larger timestamp)
  Thus, there will be no cycles in the precedence graph
• Timestamp protocol ensures freedom from deadlock as no transaction ever waits
• But the schedule may not be cascade-free, and may not even be recoverable
Module Summary
• Explained how to detect, prevent and recover from deadlock
• Introduced a time-based protocol that avoids deadlocks
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 51: Backup & Recovery/1: Backup/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Week Recap PPD
• Concurrent transactions, serializability issues, and ACID properties were discussed
• Learnt the forms of serializability: conflict and view
• Conflict serializability can be ensured by an acyclic precedence graph
• View serializability is a weaker notion of serializability that provides better concurrency. However, testing for view serializability is NP-complete
• With proper planning, a database can be recovered back to a consistent state from an inconsistent state in the face of system failures. Such a recovery is done via cascaded or cascadeless rollback
• Understood the locking mechanism and protocols
• Realized that deadlock is a peril of locking and needs to be handled through rollback
• Explained how to detect, prevent and recover from deadlock
Module Objectives PPD
Module Outline PPD
• What is Backup and Recovery?
• Why Backup?
• Backup Data: Types
• Backup Strategies
  ◦ Full Backup
  ◦ Incremental Backup
  ◦ Differential Backup
  ◦ Example
• Case: Monthly Schedule
• Hot Backup
• Transactional Logging
• Module Summary
References:
• Enterprise Systems Backup and Recovery: A Corporate Insurance Policy by Preston De Guise (Accessed 21-Aug-2021)
• Data Backup Recovery: The Essential Guide for Businesses (Accessed 19-Aug-2021)
What is Backup and Recovery? PPD
• A Backup of a database is a representative copy of data containing all necessary contents of a database such as data files and control files
  ◦ Unexpected database failures, especially those due to factors beyond our control, are unavoidable. Hence, it is important to keep a backup of the entire database
  ◦ There are two major types of backup:
    ▷ Physical Backup: A copy of physical database files such as data, control files, log files, and archived redo logs.
    ▷ Logical Backup: A copy of logical data that is extracted from a database consisting of tables, procedures, views, functions, etc.
• Recovery is the process of restoring the database to its latest known consistent state after a system failure occurs.
  ◦ A Database Log records all transactions in a sequence. Recovery using logs is quite popular in databases
  ◦ A typical log file contains information about transactions to execute, transaction states, and modified values
Why is backup necessary? PPD
Types of Backup Data PPD
• Business Data includes personal information of clients, employees, contractors etc. along with details about places, things, events and rules related to the business.
• System Data includes specific environment/configuration of the system used for specialised development purposes, log files, software dependency data, disk images.
• Media files like photographs, videos, sounds, graphics etc. need backing up. Media files are typically much larger in size.
Types of Backup Strategies: Full Backup PPD
• Full Backup backs up everything. This is a complete copy, which stores all the objects of the database: tables, procedures, functions, views, indexes etc. A full backup can restore all components of the database system as they were at the time the backup was taken.
• A full backup must be done at least once before any of the other types of backup
• The frequency of a full backup depends on the type of application. For instance, a full backup is done on a daily basis for applications in which one or more of the following is true:
  ◦ Either 24/7 availability is not a requirement, or system availability is not affected as a consequence of backups.
  ◦ A complete backup takes a minimum amount of media, i.e. the backup data is not too large.
  ◦ Backup/system administrators may not be available on a daily basis, and therefore a primary goal is to reduce to a bare minimum the amount of media required to complete a restore.
Types of Backup Strategies: Full Backup (2) PPD
• Full Backup: Advantages
  ◦ It is relatively easy to set up, configure and maintain
• Full Backup: Disadvantages
  ◦ The backup takes the largest amount of time among all types of backups
  ◦ This results in the longest system downtime during the backup process
  ◦ It uses the largest amount of storage media per backup
Types of Backup Strategies: Incremental Backup PPD
• Incremental backup targets only those files or items that have changed since the last backup. This often results in smaller backups and needs shorter duration to complete the backup process.
• For instance, a 2 TB database may only have a 5% change during the day. With incremental database backups, the amount backed up is typically only a little more than the actual amount of changed data in the database.
• For most organizations, a full backup is done once a week, and incremental backups are done for the rest of the time. This might mean a backup schedule as shown below
• This ensures a minimum backup window during peak activity times, with a longer backup window during non-peak activity times.
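The saving from incremental backups can be estimated with simple arithmetic. The sketch below takes the 2 TB size and 5% daily change from the example above and adds some assumptions of its own: 1 TB is counted as 1000 GB, the change rate is the same every day, a week has one full backup and six incrementals, and the "little more" overhead of each incremental is ignored. The variable names are hypothetical.

```python
DB_SIZE_GB = 2000            # 2 TB database, counting 1 TB as 1000 GB
DAILY_CHANGE_PERCENT = 5     # share of the data that changes each day

# Size of one incremental backup: only the data changed since the last backup.
incremental_gb = DB_SIZE_GB * DAILY_CHANGE_PERCENT // 100

# One full backup plus six incrementals, versus a full backup every day.
weekly_with_incrementals_gb = DB_SIZE_GB + 6 * incremental_gb
weekly_full_only_gb = 7 * DB_SIZE_GB
```

Under these assumptions each incremental is about 100 GB, so a week of backups takes roughly 2600 GB instead of 14000 GB.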
Types of Backup Strategies: Incremental Backup (2) PPD
Types of Backup Strategies: Differential Backup PPD
• Differential backup backs up all the changes that have occurred since the most recent full backup regardless of what backups have occurred in between
• This “rolls up” multiple changes into a single backup job which sets the basis for the next incremental backup
  ◦ As a differential backup does not back up everything, this backup process usually runs quicker than a full backup
  ◦ The longer the time since the last full backup, the larger the differential backup and the longer its backup window
Types of Backup Strategies: Differential Backup (2) PPD
Module 51
• To evaluate how differential backups might work within an environment, consider the sample backup schedule shown in the figure below.
a) The incremental backup on Saturday backs up all files that have changed since the full backup on Friday. Likewise, all changes since Saturday and since Sunday are backed up by Sunday’s and Monday’s incremental backups respectively.
b) On Tuesday, a differential backup is performed. This backs up all files that have changed since the full backup on Friday. A recovery on Wednesday should only require data from the full and differential backups, skipping the Saturday/Sunday/Monday incremental backups.
Recovery on any given day only needs the data from the full backup, the most recent differential backup, and any incremental backups taken after that differential
Database Management Systems Partha Pratim Das 51.17
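The rule for choosing which backup sets a recovery needs can be sketched in a few lines. This is a minimal illustration, assuming the schedule starts with a full backup and that backups are listed in chronological order; the function name and data layout are hypothetical.

```python
def restore_chain(backups):
    """Return the backup sets needed to recover to the last backup taken.

    `backups` is a chronological list of (day, kind) pairs, where kind is
    "full", "differential" or "incremental". The schedule is assumed to
    start with a full backup.
    """
    chain = []
    for day, kind in backups:
        if kind == "full":
            chain = [(day, kind)]              # a full backup restarts the chain
        elif kind == "differential":
            # Covers every change since the last full backup, so anything
            # taken between that full backup and now is no longer needed.
            chain = chain[:1] + [(day, kind)]
        else:
            chain.append((day, kind))          # incrementals stack on top
    return chain


week = [("Fri", "full"), ("Sat", "incremental"), ("Sun", "incremental"),
        ("Mon", "incremental"), ("Tue", "differential")]
print(restore_chain(week[:4]))   # up to Monday: full + three incrementals
print(restore_chain(week))       # up to Tuesday: full + differential only
```

An incremental taken after the Tuesday differential would simply be appended to the two-element chain.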
Types of Backup strategies: Differential Backup (3) PPD
Module 51
Case: Monthly
Schedule
Hot Backup
Transactional
Logging
Module Summary
Database Management Systems Partha Pratim Das 51.18
Types of Backup Strategies: Illustrative Example PPD
Module 51
• The figure below depicts which of the updated files of the database will be backed up in each respective type of backup throughout a span of 5 days as indicated.
Figure: Backup Types
Database Management Systems Partha Pratim Das 51.19
Case: Monthly Schedule PPD
Database Management Systems Partha Pratim Das 51.20
Case: Monthly Data Backup Schedule PPD
• Inference
◦ Here full backups are performed once per month, but with differentials being performed weekly, the maximum number of backups required for a complete system recovery at any point will be one full backup, one differential backup, and six incremental backups
◦ A full system recovery will never need more than the full backup from the start of the month, the differential backup at the start of the relevant week, and the incremental backups performed during the week
◦ If a policy were used whereby full backups were done on the first of the month, and incrementals for the rest of the month, a complete system recovery on the last day of the month would need as many as 31 backup sets
◦ Thus differential backups can improve the efficiency of recovery when planned properly
Database Management Systems Partha Pratim Das 51.21
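The counts in the inference can be checked with a short calculation. The sketch assumes a 31-day month, a full backup on day 1, exactly one backup per day, and (for the mixed policy) a differential every 7 days after the full backup; the function and parameter names are illustrative.

```python
def sets_needed(day_of_month, differential_every=None):
    """Backup sets needed to recover on `day_of_month` (1-based)."""
    days_since_full = day_of_month - 1
    if differential_every is None:
        # Full backup plus one incremental for every later day.
        return 1 + days_since_full
    if days_since_full < differential_every:
        # No differential has been taken yet this month.
        return 1 + days_since_full
    # Full + most recent differential + incrementals taken after it.
    incrementals = days_since_full % differential_every
    return 1 + 1 + incrementals


worst_mixed = max(sets_needed(d, differential_every=7) for d in range(1, 32))
worst_incremental_only = max(sets_needed(d) for d in range(1, 32))
print(worst_mixed, worst_incremental_only)  # 8 31
```

The worst case of 8 sets under the mixed policy corresponds to one full, one differential and six incremental backups.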
Hot Backup PPD
Database Management Systems Partha Pratim Das 51.22
Hot Backup PPD
Module 51
• So far we have looked at backup strategies that cannot run simultaneously with a running application
• In systems where high availability is a requirement, hot backup is preferable wherever possible
• Hot backup refers to keeping a database up and running while the backup is performed concurrently
◦ Such a system usually has a module or plug-in that allows the database to be backed up while staying available to end users
◦ Databases that store transactions of asset management companies, hedge funds, high-frequency trading companies, etc. try to implement hot backups, as these data are highly dynamic and the operations run 24x7
◦ Real-time systems, such as those handling sensor and actuator data in embedded devices, satellite transmissions, etc., also use hot backup
Database Management Systems Partha Pratim Das 51.23
Hot Backup (2) PPD
Module 51
• Disadvantages of hot backup:
◦ May not be feasible when the data set is huge and monolithic.
◦ Fault tolerance is low. Any error that occurs on the fly can terminate the whole backup process.
◦ Maintenance and setup cost is high.
Database Management Systems Partha Pratim Das 51.24
Transactional Logging as Hot Backup PPD
Module 51
• In regular database systems, hot backup is mainly used for Transaction Log Backup.
• Cold backup strategies such as Differential and Incremental are preferred for Data backup. The reason is evident from the disadvantages of hot backup.
• Transactional Logging is used in circumstances where a possibly inconsistent backup is taken, but another file generated and backed up (after the database file has been fully backed up) can be used to restore consistency.
• During recovery to a given point, the information about which data backup versions to use can be inferred from the Transactional Log backup set.
• Thus transaction logs play a vital role in database recovery.
Database Management Systems Partha Pratim Das 51.25
Module Summary PPD
Module 51
• Learnt how hot backup of the transaction log helps in recovering a consistent database
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems Partha Pratim Das 51.26
Database Management Systems
Module 52: Backup & Recovery/2: Recovery/1
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
Module 52
• We need to understand the possible sources of failure for transactions in a database
• Various types of storage are used for recovery from failures to ensure Atomicity, Consistency and Durability; these models need to be explored
Failure Classification
Module 52
• All database reads/writes are within a transaction
• Transactions have the “ACID” properties
◦ Atomicity - all or nothing
◦ Consistency - preserves database integrity
◦ Isolation - execute as if they were run alone
◦ Durability - results are not lost by a failure
• Concurrency Control guarantees I, contributes to C
• Application program guarantees C
• Recovery subsystem guarantees A & D, contributes to C
Module 52
• Transaction failure:
◦ Logical errors: transaction cannot complete due to some internal error condition
◦ System errors: the database system must terminate an active transaction due to an error condition (for example, deadlock)
• System crash: a power failure or other hardware or software failure causes the system to crash
◦ Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted as a result of a system crash
▷ Database systems have numerous integrity checks to prevent corruption of disk data
• Disk failure: a head crash or similar disk failure destroys all or part of disk storage
◦ Destruction is assumed to be detectable
▷ Disk drives use checksums to detect failures
Module 52
• Consider transaction Ti that transfers $50 from account A to account B
◦ Two updates: subtract 50 from A and add 50 to B
• Transaction Ti requires updates to A and B to be output to the database
◦ A failure may occur after one of these modifications has been made but before both of them are made
◦ Modifying the database without ensuring that the transaction will commit may leave the database in an inconsistent state
◦ Not modifying the database may result in lost updates if failure occurs just after transaction commits
• Recovery algorithms have two parts
a) Actions taken during normal transaction processing to ensure enough information exists to recover from failures
b) Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
Database Management Systems Partha Pratim Das 52.8
Storage Structure PPD
Module 52
Protecting storage media from failure during data transfer (cont.):
• Copies of a block may differ due to failure during output operation
• To recover from failure:
◦ First find inconsistent blocks:
▷ Expensive solution: Compare the two copies of every disk block
▷ Better solution:
− Record in-progress disk writes on non-volatile storage (non-volatile RAM or special area of disk)
− Use this information during recovery to find blocks that may be inconsistent, and only compare copies of these
− Used in hardware RAID systems
◦ If either copy of an inconsistent block is detected to have an error (bad checksum), overwrite it by the other copy
◦ If both have no error, but are different, overwrite the second block by the first block
Database Management Systems Partha Pratim Das 52.12
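The repair rule for a mirrored block can be sketched as below. This is an illustration under stated assumptions, not how any particular RAID controller works: each copy is modelled as a (data, stored checksum) pair, CRC-32 from Python's `zlib` stands in for the disk checksum, and all function names are hypothetical.

```python
import zlib


def make_block(data):
    """Build a (data, checksum) pair the way a correct write would."""
    return (data, zlib.crc32(data))


def checksum_ok(block):
    data, stored_crc = block
    return zlib.crc32(data) == stored_crc


def recover_block(first, second):
    """Return the repaired (first, second) copies of one mirrored block."""
    ok1, ok2 = checksum_ok(first), checksum_ok(second)
    if ok1 and not ok2:
        return first, first        # second copy is damaged: overwrite it
    if ok2 and not ok1:
        return second, second      # first copy is damaged: overwrite it
    if ok1 and ok2 and first != second:
        return first, first        # both valid but different: first copy wins
    return first, second           # identical, or both damaged (not handled)


def recover(copies1, copies2, in_progress):
    """Compare only the blocks recorded as in-progress writes."""
    for block_no in in_progress:
        copies1[block_no], copies2[block_no] = recover_block(
            copies1[block_no], copies2[block_no])


new, old = make_block(b"new"), make_block(b"old")
torn = (b"ne", zlib.crc32(b"new"))      # partial write: checksum will not match
disk1, disk2 = {7: new, 8: new}, {7: torn, 8: old}
recover(disk1, disk2, in_progress=[7, 8])
print(disk2[7] == new, disk2[8] == new)
```

Keeping the list of in-progress writes on non-volatile storage is what lets `recover` skip every block that was not being written at the time of the failure.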
Data Access
Module 52
• Physical Blocks are those blocks residing on the disk
• System Buffer Blocks are the blocks residing temporarily in main memory
• Block movements between disk and main memory are initiated through the following two operations:
◦ input(B) transfers the physical block B to main memory
◦ output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there
• We assume, for simplicity, that each data item fits in, and is stored inside, a single block
Module 52
• Each transaction Ti has its private work-area in which local copies of all data items accessed and updated by it are kept
◦ Ti’s local copy of a data item X is denoted by xi
◦ BX denotes the block containing X
• Transferring data items between system buffer blocks and the private work-area is done by:
◦ read(X) assigns the value of data item X to the local variable xi
◦ write(X) assigns the value of local variable xi to data item X in the buffer block
• Transactions
◦ Must perform read(X) before accessing X for the first time (subsequent reads can be from local copy)
◦ The write(X) can be executed at any time before the transaction commits
• Note that output(BX) need not immediately follow write(X). The system can perform the output operation when it deems fit
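The four operations can be mimicked with a small in-memory model. This is a teaching sketch only: the class names, the dictionary layout and the block name "BA" are assumptions made for the example, and real buffer managers are far more involved.

```python
class Storage:
    def __init__(self, disk):
        self.disk = disk      # block name -> {item: value}: physical blocks
        self.buffer = {}      # block name -> {item: value}: buffer blocks

    def input(self, block):
        """input(B): copy physical block B into main memory."""
        self.buffer[block] = dict(self.disk[block])

    def output(self, block):
        """output(B): copy buffer block B over the physical block."""
        self.disk[block] = dict(self.buffer[block])


class Transaction:
    def __init__(self, storage, block_of):
        self.storage = storage
        self.block_of = block_of   # item -> name of the block holding it
        self.local = {}            # private work-area with the local copies

    def read(self, item):
        block = self.block_of[item]
        if block not in self.storage.buffer:
            self.storage.input(block)          # bring the block in first
        self.local[item] = self.storage.buffer[block][item]
        return self.local[item]

    def write(self, item):
        block = self.block_of[item]
        if block not in self.storage.buffer:
            self.storage.input(block)
        self.storage.buffer[block][item] = self.local[item]


storage = Storage({"BA": {"A": 1000}})
t = Transaction(storage, {"A": "BA"})
t.local["A"] = t.read("A") - 50
t.write("A")
print(storage.buffer["BA"]["A"], storage.disk["BA"]["A"])  # 950 1000
storage.output("BA")
print(storage.disk["BA"]["A"])  # 950
```

The first print shows the point of the final bullet: after write(A) the new value exists only in the buffer, and the disk changes only when output is eventually called.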
Module 52
• To ensure atomicity despite failures, we first output information describing the modifications to stable storage without modifying the database itself
• We study Log-based Recovery Mechanisms
◦ We first present key concepts
◦ And then present the actual recovery algorithm
• Less used alternative: Shadow Paging
• In this Module we assume serial execution of transactions
• In the next Module, we consider the case of concurrent transaction execution
Log-Based Recovery
Module 52
• A log is kept on stable storage
◦ The log is a sequence of log records, which maintains information about update activities on the database
• When transaction Ti starts, it registers itself by writing a record <Ti start> to the log
• Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write (old value), and V2 is the value to be written to X (new value)
• When Ti finishes its last statement, the log record <Ti commit> is written
Module 52
• The immediate-modification scheme allows updates of an uncommitted transaction to be made to the buffer, or the disk itself, before the transaction commits
◦ Update log record must be written before a database item is written
▷ We assume that the log record is output directly to stable storage
◦ Output of updated blocks to disk storage can take place at any time before or after transaction commit
◦ Order in which blocks are output can be different from the order in which they are written
• The deferred-modification scheme performs updates to buffer/disk only at the time of transaction commit
◦ Simplifies some aspects of recovery
◦ But has overhead of storing local copy
• We cover here only the immediate-modification scheme
Module 52
• A transaction is said to have committed when its commit log record is output to stable storage
◦ All previous log records of the transaction must have been output already
• Writes performed by a transaction may still be in the buffer when the transaction commits, and may be output later
Module 52
• Undo of a log record <Ti, X, V1, V2> writes the old value V1 to X
• Redo of a log record <Ti, X, V1, V2> writes the new value V2 to X
• Undo and Redo of Transactions
◦ undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti
▷ Each time a data item X is restored to its old value V, a special log record (called redo-only) <Ti, X, V> is written out
▷ When undo of a transaction is complete, a log record <Ti abort> is written out (to indicate that the undo was completed)
◦ redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti
▷ No logging is done in this case
Module 52
• The undo and redo operations are used in several different circumstances:
◦ The undo is used for transaction rollback during normal operation
▷ in case a transaction cannot complete its execution due to some logical error
◦ The undo and redo operations are used during recovery from failure
• We need to deal with the case where during recovery from failure another failure occurs prior to the system having fully recovered
Module 52
• Let Ti be the transaction to be rolled back
• Scan log backwards from the end, and for each log record of Ti of the form <Ti, Xj, V1, V2>
◦ Perform the undo by writing V1 to Xj,
◦ Write a log record <Ti, Xj, V1>
▷ such log records are called Compensation Log Records
• Once the record <Ti start> is found, stop the scan and write the log record <Ti abort>
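The rollback procedure can be sketched directly from these steps. The representation is an assumption made for illustration: log records are Python tuples, with 2 elements for start/commit/abort, 4 for an update <Ti, Xj, V1, V2> and 3 for a compensation record <Ti, Xj, V1>; the database is a plain dictionary, and the values follow the $50 transfer example.

```python
def rollback(log, db, txn):
    """Undo `txn` by scanning `log` backwards; appends the new records."""
    compensation = []
    for record in reversed(log):
        if record[0] != txn:
            continue                                # another transaction's record
        if len(record) == 2 and record[1] == "start":
            break                                   # reached <Ti start>: stop
        if len(record) == 4:                        # update <Ti, Xj, V1, V2>
            _, item, old, _new = record
            db[item] = old                          # undo: write V1 back to Xj
            compensation.append((txn, item, old))   # redo-only <Ti, Xj, V1>
    log.extend(compensation)
    log.append((txn, "abort"))


db = {"A": 950, "B": 2050}
log = [("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050)]
rollback(log, db, "T0")
print(db)        # {'A': 1000, 'B': 2000}
print(log[-3:])  # [('T0', 'B', 2000), ('T0', 'A', 1000), ('T0', 'abort')]
```

The compensation records are collected in a separate list and appended afterwards so that the backward scan never visits records it has just written.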
Module 52
• When recovering after failure:
◦ Transaction Ti needs to be undone if the log
▷ contains the record <Ti start>,
▷ but does not contain either the record <Ti commit> or <Ti abort>
◦ Transaction Ti needs to be redone if the log
▷ contains the record <Ti start>
▷ and contains the record <Ti commit> or <Ti abort>
◦ It may seem strange to redo transaction Ti if the record <Ti abort> is in the log
▷ To see why this works, note that if <Ti abort> is in the log, so are the redo-only records written by the undo operation. Thus, the end result will be to undo Ti’s modifications in this case. This slight redundancy simplifies the recovery algorithm and enables faster overall recovery time
▷ Such a redo redoes all the original actions, including the steps that restored old values; this is known as Repeating History
Database Management Systems Partha Pratim Das 52.25
Immediate Modification Recovery Example
Recovery actions in each case above are:
(a) undo(T0): B is restored to 2000 and A to 1000, and log records <T0, B, 2000>, <T0, A, 1000>, <T0 abort> are written out
(b) redo(T0) and undo(T1): A and B are set to 950 and 2050 and C is restored to 700. Log records <T1, C, 700>, <T1 abort> are written out
(c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively. Then C is set to 600.
Database Management Systems Partha Pratim Das 52.26
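The three cases can be replayed with a small simulation. The log below is reconstructed from the values quoted in the recovery actions (the original figure is not reproduced here), so the crash points, the tuple layout of the records and the disk contents at each crash are assumptions of this sketch.

```python
LOG = [
    ("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
    ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 700, 600), ("T1", "commit"),
]


def recover(log, db):
    """Apply redo then undo to `db` in place; return the new log records."""
    finished = {rec[0] for rec in log
                if len(rec) == 2 and rec[1] in ("commit", "abort")}
    started = [rec[0] for rec in log if len(rec) == 2 and rec[1] == "start"]
    incomplete = [t for t in started if t not in finished]

    # Redo: going forward, reapply the new value of every record belonging
    # to a transaction that has a commit or abort record in the log.
    for rec in log:
        if rec[0] in finished and len(rec) == 4:
            db[rec[1]] = rec[3]
        elif rec[0] in finished and len(rec) == 3:   # redo-only record
            db[rec[1]] = rec[2]

    # Undo: going backward, restore old values of incomplete transactions.
    new_records = []
    for txn in reversed(incomplete):
        for rec in reversed(log):
            if rec[0] == txn and len(rec) == 4:
                db[rec[1]] = rec[2]
                new_records.append((txn, rec[1], rec[2]))
        new_records.append((txn, "abort"))
    return new_records


cases = {
    "a": (LOG[:3], {"A": 950, "B": 2050, "C": 700}),   # crash before T0 commits
    "b": (LOG[:6], {"A": 950, "B": 2050, "C": 600}),   # crash before T1 commits
    "c": (LOG,     {"A": 950, "B": 2000, "C": 700}),   # crash after T1 commits
}
for name, (log, db) in cases.items():
    written = recover(log, db)
    print(name, db, written)
```

In case (c) the starting disk state deliberately lags behind the log, which shows why committed transactions still have to be redone.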
Checkpoints
Module 52
• Redoing/undoing all transactions recorded in the log can be very slow
◦ Processing the entire log is time-consuming if the system has run for a long time
◦ We might unnecessarily redo transactions which have already output their updates to the database
• Streamline recovery procedure by periodically performing checkpointing
• All updates are stopped while doing checkpointing
a) Output all log records currently residing in main memory onto stable storage
b) Output all modified buffer blocks to the disk
c) Write a log record <checkpoint L> onto stable storage where L is a list of all transactions active at the time of checkpoint
Module 52
• During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti
◦ Scan backwards from end of log to find the most recent <checkpoint L> record
◦ Only transactions that are in L or started after the checkpoint need to be redone or undone
◦ Transactions that committed or aborted before the checkpoint already have all their updates output to stable storage
• Some earlier part of the log may be needed for undo operations
◦ Continue scanning backwards till a record <Ti start> is found for every transaction Ti in L
◦ Parts of log prior to earliest <Ti start> record above are not needed for recovery, and can be erased whenever desired
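The backward scan described above can be sketched as follows. The tuple encoding of log records, the ("checkpoint", L) form of the checkpoint record and the sample log are all assumptions made for this illustration.

```python
def transactions_to_recover(log):
    """Return (relevant transactions, index of the earliest record needed)."""
    # Scan backwards from the end for the most recent <checkpoint L> record.
    cp_index = None
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == "checkpoint":
            cp_index = i
            break
    if cp_index is None:
        # No checkpoint: every transaction in the log must be considered.
        started = {r[0] for r in log if len(r) == 2 and r[1] == "start"}
        return started, 0

    active = set(log[cp_index][1])          # the list L
    relevant = set(active)
    for rec in log[cp_index + 1:]:
        if len(rec) == 2 and rec[1] == "start":
            relevant.add(rec[0])            # started after the checkpoint

    # Keep scanning backwards until <Ti start> is found for every Ti in L.
    first_needed, pending, i = cp_index, set(active), cp_index
    while pending and i > 0:
        i -= 1
        rec = log[i]
        if len(rec) == 2 and rec[1] == "start" and rec[0] in pending:
            pending.discard(rec[0])
            first_needed = i
    return relevant, first_needed


log = [
    ("T1", "start"), ("T1", "A", 10, 20), ("T2", "start"), ("T1", "commit"),
    ("T2", "B", 5, 6), ("checkpoint", ["T2"]),
    ("T3", "start"), ("T2", "commit"), ("T3", "C", 1, 2), ("T3", "commit"),
    ("T4", "start"), ("T4", "D", 7, 8),
]
print(transactions_to_recover(log))
```

For this sample log the function reports T2, T3 and T4 as relevant and index 2 (the <T2 start> record) as the earliest record still needed, so the two records before it could be erased.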
Module 52
• Any transactions that committed before the last checkpoint should be ignored
◦ T1 can be ignored (updates already output to disk due to checkpoint)
• Any transactions that committed since the last checkpoint need to be redone
◦ T2 and T3 redone
• Any transaction that was running at the time of failure needs to be undone and restarted
◦ T4 undone
Database Management Systems Partha Pratim Das 52.29
Module Summary
Module 52
• Failures may be due to a variety of sources; each needs a strategy for handling
• A proper mix and management of volatile, non-volatile and stable storage can guarantee recovery from failures and ensure Atomicity, Consistency and Durability
• Log-based recovery is efficient and effective
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 53: Backup & Recovery/3: Recovery/2
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Module 53
• Failures may be due to a variety of sources; each needs a strategy for handling
• A proper mix and management of volatile, non-volatile and stable storage can guarantee recovery from failures and ensure Atomicity, Consistency and Durability
• Log-based recovery is efficient and effective
Transactional Logging
Module 53
• In systems where high availability is a requirement, hot backup is preferable wherever possible
• Hot backup refers to keeping a database up and running while the backup is performed concurrently
◦ Such a system usually has a module or plug-in that allows the database to be backed up while staying available to end users
◦ Databases that store transactions of asset management companies, hedge funds, high-frequency trading companies, etc. try to implement hot backups, as these data are highly dynamic and the operations run 24x7
◦ Real-time systems, such as those handling sensor and actuator data in embedded devices, satellite transmissions, etc., also use hot backup
Module 53
• In regular database systems, hot backup is mainly used for Transaction Log Backup
• Cold backup strategies such as Differential and Incremental are preferred for Data backup. The reason is evident from the disadvantages of hot backup
• Transactional Logging is used in circumstances where a possibly inconsistent backup is taken, but another file generated and backed up (after the database file has been fully backed up) can be used to restore consistency
• During recovery to a given point, the information about which data backup versions to use can be inferred from the Transactional Log backup set
• Thus transaction logs play a vital role in database recovery
Module 53
To understand how Transactional Logging works, consider Figure 1, which represents a chunk of a database just before a backup has been started
Figure 1: Database content
• While the backup is in progress, modifications may continue to occur to the database. For example, a request to modify the data at location “4325” to ‘0’ arrives.
• When a request comes through to modify a part of the DB, the modification must be written in the following order:
1 Transaction Log
2 Database (itself)
This is depicted in Figure 2
• If a crash occurs before writing to the database, then the inconsistent backed up file is recovered first, and then the pending modifications in the transaction log (backed up*) are applied to re-establish consistency
*Note: The Transactional Log itself is backed up using Hot Backup; the Data is backed up incrementally
Module 53
Consider that, in the previous scenario, before the crash occurs another request modifies the content of location “4321” to ‘0’. Incidentally, this change gets written to the database itself (recall: Immediate Modification). This is indicated in Figure 3
Figure 3: Applying Tr. logs during recovery
• Figure 3 is the state of the database after which the system crashes. Note that this part has already been backed up, and hence, the backup is inconsistent with the database.
• Recovery Phase:
◦ Data recovery is done from the last data backup set (Figure 1)
◦ Log recovery is done from the Transaction Log backup set. It will be the same as the current transaction log because of Hot backup
◦ Figure 4 shows the recovered database and log
• The recovered database is inconsistent. To re-establish consistency, all transaction logs generated between the start of the backup and the end of the backup must be replayed
Module 53
• When using transactional logging we distinguish between recover and restore:
◦ Recover: retrieve from the backup media the database files and transaction logs, and
◦ Restore: reapply database consistency based on the transaction logs
• For our restore process, we recover inconsistent database files and completed transaction logs. The recovered files will resemble the configuration shown in Figure 4
• The final database state after replaying the log on the recovered database is displayed in Figure 5
• The state of the database is consistent
• Note that an unnecessary log replay is shown occurring for block 4325. Whether such replays occur depends on the database being used. For instance, a database vendor might choose to replay all logs because it would be faster than first determining whether a particular logged activity needs to be replayed
• Once all transaction logs have been replayed, the database is said to have been restored, that is, it is at a point where it can now be opened for user access
Figure 5: Database restore process via log replay
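The recover-then-restore sequence can be illustrated with a toy replay. The starting block values are invented for the example (the figures are not reproduced here), and the dictionary and tuple layout is an assumption; the sketch only aims to show that replaying every logged write is harmless even when one of them is already present in the recovered copy.

```python
def restore(recovered_db, recovered_log):
    """Replay every logged write on the recovered copy; return a new dict."""
    db = dict(recovered_db)
    for location, new_value in recovered_log:
        # Replaying is idempotent: if the recovered copy already holds the
        # new value, the assignment changes nothing.
        db[location] = new_value
    return db


# Hypothetical recovered files: the copy of block 4325 already reflects its
# logged write, while the copy of block 4321 still holds an older value.
recovered_db = {"4321": 1, "4325": 0}
recovered_log = [("4325", 0), ("4321", 0)]

restored = restore(recovered_db, recovered_log)
print(restored)  # {'4321': 0, '4325': 0}
```

Replaying unconditionally, as here, trades a redundant write for not having to check whether each logged change is already present.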
Recovery Algorithm
Module 53
• With concurrent transactions, all transactions share a single disk buffer and a single log
◦ A buffer block can have data items updated by one or more transactions
• We assume that if a transaction Ti has modified an item, no other transaction can modify the same item until Ti has committed or aborted
◦ That is, the updates of uncommitted transactions should not be visible to other transactions
▷ Otherwise how do we perform undo if T1 updates A, then T2 updates A and commits, and finally T1 has to abort?
◦ Can be ensured by obtaining exclusive locks on updated items and holding the locks till end of transaction (strict two-phase locking)
• Log records of different transactions may be interspersed in the log
Module 53
• Let the time of checkpointing be tcheck and the time of system crash be tfail
• Let there be four transactions Ta, Tb, Tc and Td such that:
◦ Ta commits before checkpoint
◦ Tb starts before checkpoint and commits after the checkpoint but before system crash
◦ Tc starts after checkpoint and commits before system crash
◦ Td starts after checkpoint and was active at the time of system crash
• The actions that are taken by the recovery manager are:
◦ Nothing is done with Ta
◦ Transaction redo is performed for Tb and Tc
◦ Transaction undo is performed for Td
Source: Distributed DBMS - Database Recovery
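The classification above can be sketched in a few lines of Python. This is only an illustration: the `Txn` record, the `recovery_action` helper, and the numeric timestamps are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    name: str
    start: float
    commit: Optional[float]  # None means still active at the crash


def recovery_action(txn: Txn, t_check: float, t_fail: float) -> str:
    """Return 'nothing', 'redo' or 'undo' for one transaction."""
    if txn.commit is not None and txn.commit < t_check:
        return "nothing"   # committed before the checkpoint (Ta)
    if txn.commit is not None and txn.commit < t_fail:
        return "redo"      # committed between checkpoint and crash (Tb, Tc)
    return "undo"          # still active at the crash (Td)


# Hypothetical timestamps: checkpoint at t=10, crash at t=20.
t_check, t_fail = 10, 20
txns = [Txn("Ta", 1, 5), Txn("Tb", 8, 15), Txn("Tc", 12, 18), Txn("Td", 14, None)]
actions = {t.name: recovery_action(t, t_check, t_fail) for t in txns}
print(actions)
```

Note that the start time plays no role in the decision; only the commit time relative to tcheck and tfail matters.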
Recovery Algorithm (4): Checkpoints Recap
• Any transactions that committed before the last checkpoint should be ignored
◦ T1 can be ignored (updates already output to disk due to checkpoint)
• Any transactions that committed since the last checkpoint need to be redone
◦ T2 and T3 redone
• Any transaction that was running at the time of failure needs to be undone and restarted
◦ T4 undone
Recovery Algorithm (5): Redo-Undo Phases
• Recovery from failure: Two phases
◦ Redo phase: Replay updates of all transactions, whether they committed, aborted, or are incomplete
◦ Undo phase: Undo all incomplete transactions
Requirement:
• Transactions of type T1 need no recovery
• Transactions of type T2 or T4 need to be redone
• Transactions of type T3 or T5 need to be undone and restarted
Strategy:
• Ignore T1
• Redo T2, T3, T4 and T5
• Undo T3 and T5
Recovery Algorithm (6): Redo Phase
• Find last <checkpoint L> record, and set undo-list to L
• Scan forward from above <checkpoint L> record
◦ Whenever a record <Ti, Xj, V1, V2> is found, redo it by writing V2 to Xj
◦ Whenever a log record <Ti start> is found, add Ti to undo-list
◦ Whenever a log record <Ti commit> or <Ti abort> is found, remove Ti from undo-list
• Steps for the REDO operation are:
◦ If the transaction has done INSERT, the recovery manager generates an insert from the log
◦ If the transaction has done DELETE, the recovery manager generates a delete from the log
◦ If the transaction has done UPDATE, the recovery manager generates an update from the log
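The forward scan of the redo phase might look roughly like the sketch below. The tuple layout of the log records (`("update", Ti, Xj, V1, V2)` and so on), the in-memory `db` dictionary, and the function name `redo_phase` are assumptions chosen for illustration; a real recovery manager works on disk pages and log sequence numbers.

```python
def redo_phase(log, db):
    """Replay updates after the last checkpoint; return the undo-list."""
    # Locate the last <checkpoint L> record (index -1 if there is none).
    cp = max((i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=-1)
    undo_list = set(log[cp][1]) if cp >= 0 else set()

    for rec in log[cp + 1:]:
        kind = rec[0]
        if kind == "update":
            _, txn, item, old_value, new_value = rec
            db[item] = new_value          # redo: write V2 to Xj
        elif kind == "start":
            undo_list.add(rec[1])         # Ti may turn out to be incomplete
        elif kind in ("commit", "abort"):
            undo_list.discard(rec[1])     # Ti finished, nothing to undo
    return undo_list


# Hypothetical log: T1 spans the checkpoint and commits, T2 is incomplete.
log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 90),
    ("checkpoint", ["T1"]),
    ("update", "T1", "B", 50, 60),
    ("commit", "T1"),
    ("start", "T2"),
    ("update", "T2", "C", 7, 8),
]
db = {"A": 90, "B": 50, "C": 7}   # assumed disk state at the time of the crash
undo_list = redo_phase(log, db)
print(db, undo_list)
```

The update of T2 is redone even though T2 never committed; it is the later undo phase that removes its effects.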
Module Summary
• Learnt how hot backup of the transaction log helps in recovering a consistent database
• Studied the recovery algorithms for concurrent transactions
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 54: Backup & Recovery/4: Recovery/3
Partha Pratim Das
ppd@[Link]
• Learnt how hot backup of the transaction log helps in recovering a consistent database
• Studied the recovery algorithms for concurrent transactions
References:
• Enterprise Systems Backup and Recovery: A Corporate Insurance Policy by Preston De Guise
• [Link] (Accessed 19-Aug-2021)
• [Link] (Accessed 19-Aug-2021)
Recovery with Early Lock Release
• Any index used in processing a transaction, such as a B+-tree, can be treated as normal data
• To increase concurrency, B+-tree concurrency control algorithms often allow locks to be released early, in a non-two-phase manner
• As a result of early lock release, it is possible that
◦ a value in a B+-tree node is updated by one transaction T1, which inserts an entry (V1, R1), and subsequently
◦ by another transaction T2, which inserts an entry (V2, R2) in the same node, moving the entry (V1, R1) even before T1 completes execution
• At this point, we cannot undo transaction T1 by replacing the contents of the node with the old value prior to T1 performing its insert, since that would also undo the insert performed by T2; transaction T2 may still commit (or may have already committed)
• Hence, the only way to undo the effect of insertion of (V1, R1) is to execute a corresponding delete operation
• Support for high-concurrency locking techniques, such as those used for B+-tree concurrency control, which release locks early
◦ Supports “logical undo”
• Recovery based on “repeating history”, whereby recovery executes exactly the same actions as normal processing
◦ including redo of log records of incomplete transactions, followed by subsequent undo
◦ Key benefits
▷ supports logical undo
▷ easier to understand/show correctness
• Early lock release is important not only for indices, but also for operations on other system data structures that are accessed and updated very frequently, such as:
◦ data structures that track the blocks containing records of a relation
◦ the free space in a block
◦ the free blocks
Logical Undo Logging
• Operations like B+-tree insertions and deletions release locks early
◦ They cannot be undone by restoring old values (physical undo), since once a lock is released, other transactions may have updated the B+-tree
◦ Instead, insertions (deletions) are undone by executing a deletion (insertion) operation (known as logical undo)
• For such operations, undo log records should contain the undo operation to be executed
◦ Such logging is called logical undo logging, in contrast to physical undo logging
• Redo information is logged physically (that is, new value for each write) even for operations with logical undo
◦ Logical redo is very complicated since database state on disk may not be “operation consistent” when recovery starts
◦ Physical redo logging does not conflict with early lock release
• When an operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique identifier of the operation instance
• While the system is executing the operation, it creates update log records in the normal fashion for all updates performed by the operation
◦ the usual old-value (physical undo information) and new-value (physical redo information) are written out as usual for each update performed by the operation;
◦ the old-value information is required in case the transaction needs to be rolled back before the operation completes
• When the operation completes, <Ti, Oj, operation-end, U> is logged, where U contains the information needed to perform a logical undo
◦ For example, if the operation inserted an entry in a B+-tree, the undo information U would indicate that a deletion operation is to be performed, and would identify the B+-tree and what entry to delete from the tree. This is called logical logging
◦ In contrast, logging of old-value and new-value information is called physical logging, and the corresponding log records are called physical log records
Operation Logging (2): Example
• Insert of (key, record-id) pair (K5, RID7) into index I9
• If crash/rollback occurs before operation completes:
◦ the operation-end log record is not found, and
◦ the physical undo information is used to undo operation
• If crash/rollback occurs after the operation completes:
◦ the operation-end log record is found, and in this case
◦ logical undo is performed using U; the physical undo information for the operation is ignored
• Example with a complete and an incomplete operation
<T1, start>
<T1, O1, operation-begin>
...
<T1, X, 10, K5>
<T1, Y, 45, RID7>
<T1, O1, operation-end, (delete I9, K5, RID7)>
<T1, O2, operation-begin>
<T1, Z, 45, 70>
← T1 Rollback begins here
<T1, Z, 45> ← Redo-only log record during physical undo (of incomplete O2)
<T1, Y, ..., ...> ← Normal redo records for logical undo of O1
...
<T1, O1, operation-abort> ← What if crash occurred immediately after this?
<T1, abort>
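The rollback in this example can be traced with a small Python sketch. It is a simplification under stated assumptions: log records are plain tuples, the index I9 is modelled as a Python set kept apart from the items X, Y and Z, and the physical update records that a real logical undo would itself write (the `<T1, Y, ..., ...>` line above) are not generated. The names `rollback`, `op_begin`, `op_end`, `redo_only` and `op_abort` are illustrative only.

```python
def rollback(log, txn, db, indexes):
    """Roll back txn by scanning the log backwards and appending undo records."""
    skipping = None                      # id of a completed operation being skipped
    for rec in reversed(list(log)):      # snapshot, since we append while scanning
        kind = rec[0]
        if rec[1] != txn:
            continue
        if skipping is not None:
            # Physical records of a logically undone operation are ignored.
            if kind == "op_begin" and rec[2] == skipping:
                skipping = None
            continue
        if kind == "op_end":
            # Operation completed: perform logical undo using U.
            _, _, op, (action, index, key, rid) = rec
            if action == "delete":
                indexes[index].discard((key, rid))
            elif action == "insert":
                indexes[index].add((key, rid))
            log.append(("op_abort", txn, op))
            skipping = op
        elif kind == "update":
            # Operation incomplete: physical undo, logged as a redo-only record.
            _, _, item, old_value, _new_value = rec
            db[item] = old_value
            log.append(("redo_only", txn, item, old_value))
        elif kind == "start":
            log.append(("abort", txn))
            break


db = {"X": "K5", "Y": "RID7", "Z": 70}
indexes = {"I9": {("K5", "RID7")}}
log = [
    ("start", "T1"),
    ("op_begin", "T1", "O1"),
    ("update", "T1", "X", 10, "K5"),
    ("update", "T1", "Y", 45, "RID7"),
    ("op_end", "T1", "O1", ("delete", "I9", "K5", "RID7")),
    ("op_begin", "T1", "O2"),
    ("update", "T1", "Z", 45, 70),
]
rollback(log, "T1", db, indexes)
print(log[-3:])
```

The incomplete operation O2 is undone physically (Z is restored to 45), while the completed operation O1 is undone logically by deleting (K5, RID7) from I9 without touching the old values recorded for X and Y.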
Recovery Algorithm with Logical Undo
Basically same as earlier algorithm, except for changes described earlier for transaction rollback
• (Redo phase): Scan log forward from last <checkpoint L> record till end of log
◦ Repeat history by physically redoing all updates of all transactions
◦ Create an undo-list during the scan as follows
▷ undo-list is set to L initially
▷ Whenever <Ti start> is found, Ti is added to undo-list
▷ Whenever <Ti commit> or <Ti abort> is found, Ti is deleted from undo-list
◦ This brings database to state as of crash, with committed as well as uncommitted transactions having been redone
◦ Now undo-list contains transactions that are incomplete, that is, have neither committed nor been fully rolled back
Recovery from system crash (cont.)
• (Undo phase): Scan log backwards, performing undo on log records of transactions found in undo-list
◦ Log records of transactions being rolled back are processed as described earlier, as they are found
▷ Single shared scan for all transactions being undone
◦ When <Ti start> is found for a transaction Ti in undo-list, write a <Ti abort> log record
◦ Stop scan when <Ti start> records have been found for all Ti in undo-list
• This undoes the effects of incomplete transactions (those with neither commit nor abort log records). Recovery is now complete.
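A minimal sketch of the single shared backward scan follows. It handles physical undo only; logical undo of completed operations is left out to keep the example short. The tuple record layout, the `redo_only` record name, and the function name `undo_phase` are assumptions for illustration.

```python
def undo_phase(log, db, undo_list):
    """Undo all transactions in undo_list with one backward scan of the log."""
    pending = set(undo_list)
    for rec in reversed(list(log)):       # snapshot, since we append while scanning
        if not pending:
            break                          # <Ti start> found for every Ti
        kind = rec[0]
        if kind == "checkpoint" or rec[1] not in pending:
            continue
        txn = rec[1]
        if kind == "update":
            _, _, item, old_value, _new_value = rec
            db[item] = old_value                         # restore old value
            log.append(("redo_only", txn, item, old_value))
        elif kind == "start":
            log.append(("abort", txn))                   # Ti is fully rolled back
            pending.discard(txn)


# Hypothetical state after the redo phase: T1 committed, T2 is incomplete.
log = [
    ("start", "T1"),
    ("update", "T1", "A", 100, 90),
    ("start", "T2"),
    ("update", "T2", "B", 5, 6),
    ("commit", "T1"),
    ("update", "T2", "C", 1, 2),
]
db = {"A": 90, "B": 6, "C": 2}
undo_phase(log, db, {"T2"})
print(db, log[-3:])
```

Because redo-only records are written during the undo, a crash in the middle of recovery is handled by simply running recovery again: the redo phase replays them and the backward scan ignores them.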
Plan for Backup and Recovery
Deciding factors for having a Backup & Recovery setup:
• Data Importance
◦ How important is the information in your database for your company? For business-critical data you will create a plan that involves making extra copies of your database over the same period and ensuring that the copies can be easily restored when required
• Frequency of Change
◦ How often does your database get updated? For instance, if critical data is modified daily then you should make a daily backup schedule
• Speed
◦ How much time do you need to back up or recover your files? Recovery speed is an important factor that determines the maximum possible time period that could be spent on database backup and recovery
Source: [Link]
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 55: Backup & Recovery/5: Backup/2: RAID
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
• Understanding RAID: Array of redundant disks in parallel to enhance speed and reliability
RAID: Redundant Array of Independent Disks
• Disk organization techniques that manage a large number of disks, providing a view of a single disk of
◦ high capacity and high speed by using multiple disks in parallel,
◦ high reliability by storing data redundantly, so that data can be recovered even if a disk fails
• The chance that some disk out of a set of n disks will fail is much higher than the chance that a specific single disk will fail
◦ For example, a system with 100 disks, each with MTTF of 100,000 hours (approx. 11 years), will have a system MTTF of 1000 hours (approx. 41 days)
◦ Techniques for using redundancy to avoid data loss are critical with large numbers of disks
• Originally a cost-effective alternative to large, expensive disks
◦ “I” in RAID originally stood for inexpensive
◦ Today RAIDs are used for their higher reliability and bandwidth
▷ The “I” is interpreted as independent
Improvement of Reliability via Redundancy: Mirroring
• Redundancy: Store extra information that can be used to rebuild information lost in a disk failure
• Mean time to data loss depends on mean time to failure, and mean time to repair
◦ For example, MTTF of 100,000 hours, mean time to repair of 10 hours gives mean time to data loss of 500 × 10^6 hours (or 57,000 years) for a mirrored pair of disks (ignoring dependent failure modes)
• Mirroring (or shadowing)
◦ Duplicate every disk. Logical disk consists of two physical disks.
◦ Every write is carried out on both disks
▷ Reads can take place from either disk
◦ If one disk in a pair fails, data is still available in the other
▷ Data loss would occur only if a disk fails, and its mirror disk also fails before the system is repaired
− Probability of combined event is very small
− Except for dependent failure modes such as fire or building collapse or electrical power surges
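The two reliability figures quoted in this module (a system MTTF of 1000 hours for 100 disks, and a mean time to data loss of 500 × 10^6 hours for a mirrored pair) can be reproduced with a short calculation. The formulas used here, MTTF/n for an array and MTTF²/(2 · MTTR) for a mirrored pair, are the usual approximations under the assumption of independent failures; the slides state only the results, so the formulas and function names are assumptions of this sketch.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def system_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    """Expected time until the first of n independent disks fails."""
    return disk_mttf_hours / n_disks


def mirrored_pair_mttdl(disk_mttf_hours: float, mttr_hours: float) -> float:
    """Mean time to data loss for a mirrored pair (independent failures)."""
    return disk_mttf_hours ** 2 / (2 * mttr_hours)


array_mttf = system_mttf(100_000, 100)
pair_mttdl = mirrored_pair_mttdl(100_000, 10)
print(array_mttf, array_mttf / HOURS_PER_DAY)     # hours, days
print(pair_mttdl, pair_mttdl / HOURS_PER_YEAR)    # hours, years
```

With these inputs the array MTTF comes to 1000 hours (a little under 42 days) and the mirrored-pair figure to 5 × 10^8 hours, roughly 57,000 years, matching the slide values.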
Improvement of Reliability via Redundancy (2): Striping
• Bit-level Striping: Split the bits of each byte across multiple disks
◦ In an array of eight disks, write bit i of each byte to disk i
◦ Each access can read data at eight times the rate of a single disk
◦ But seek/access time worse than for a single disk
▷ Bit-level striping is not used much any more
• Byte-level Striping: Each file is split up into parts one byte in size. Using an n = 4 disk array as an example
◦ the 1st byte would be written to the 1st drive
◦ the 2nd byte to the 2nd drive and so on, until
◦ the 5th byte is then written to the 1st drive again and the whole process starts over
◦ in general, the i-th byte is written to the (((i − 1) mod n) + 1)-th drive
• Block-level Striping: With n disks, block i of a file goes to disk (i mod n) + 1
◦ Requests for different blocks can run in parallel if the blocks reside on different disks
◦ A request for a long sequence of blocks can utilize all disks in parallel
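The two placement formulas can be checked directly. In this sketch bytes are numbered from 1 and blocks from 0, which is the numbering under which both formulas start at drive 1; that numbering, and the function names, are assumptions for the example.

```python
def byte_drive(i: int, n: int) -> int:
    """Drive (1-based) holding the i-th byte (1-based) under byte-level striping."""
    return ((i - 1) % n) + 1


def block_disk(i: int, n: int) -> int:
    """Disk (1-based) holding block i (0-based) under block-level striping."""
    return (i % n) + 1


n = 4
byte_layout = [byte_drive(i, n) for i in range(1, 10)]
block_layout = [block_disk(i, n) for i in range(0, 9)]
print(byte_layout)
print(block_layout)
```

Both layouts cycle through drives 1 to 4 in round-robin order, so the 5th byte (and block 4) lands on drive 1 again.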
Improvement of Reliability via Redundancy (3): Parity
• Bit-Interleaved Parity: A single parity bit is enough for error correction, not just detection, since we know which disk has failed
◦ When writing data, corresponding parity bits must also be computed and written to a parity bit disk
◦ To recover data in a damaged disk, compute XOR of bits from other disks (including parity bit disk)
• Block-Interleaved Parity: Uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from n other disks
◦ When writing data block, corresponding block of parity bits must also be computed and written to parity disk
◦ To find value of a damaged block, compute XOR of bits from corresponding blocks (including parity block) from other disks
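Block-interleaved parity and XOR reconstruction can be illustrated as follows. The block contents and the helper names `parity_block` and `reconstruct` are made up for the example; blocks are assumed to be of equal length.

```python
from functools import reduce


def parity_block(blocks):
    """XOR corresponding bytes of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"DBMS", b"RAID", b"LOGS"]          # blocks on three data disks
parity = parity_block(data)                  # stored on the parity disk

# Suppose the disk holding data[1] fails: XOR everything that is left.
rebuilt = reconstruct([data[0], data[2]], parity)
print(rebuilt)
```

This works because XOR is its own inverse: XORing the parity with all surviving blocks cancels their contribution and leaves the missing block. It recovers one lost block per stripe, and only because the failed disk is known.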
• A basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose HDDs
• The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5 (distributed parity), and RAID 6 (dual parity)
• Multiple RAID levels can also be combined or nested, for instance RAID 10 (striping of mirrors) or RAID 01 (mirroring stripe sets)
• RAID levels are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard
• The numerical values only serve as identifiers and do not signify any metric
• While most RAID levels can provide good protection against and recovery from hardware defects or defective sectors/read errors (hard errors), they do not provide any protection against data loss due to catastrophic failures (fire, water) or soft errors such as user error, software malfunction, or malware infection
• For valuable data, RAID is only one building block of a larger data loss prevention and recovery scheme; it cannot replace a backup plan
Source: Standard RAID levels (Accessed 24-Aug-2021)
RAID 0: Striping
• RAID level-0 only uses data striping, no redundant information is maintained
• If one disk fails, then all data in the disk array is lost
• Independent of the number of data disks, the effective space utilization for a RAID Level-0 system is always 100 percent
• RAID Level-0 has the best write performance of all RAID levels because the absence of redundant information implies that no redundant information needs to be updated
• This solution is the least costly
• Reliability is very poor
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
• RAID 1 employs mirroring, maintaining two identical copies of the data on two different disks
• It is the most expensive solution
• It provides excellent fault tolerance
• Every write of a disk block involves a write on both disks
• Since two copies of each block exist on different disks, we can distribute reads between the two disks and allow parallel reads
• RAID Level-1 does not stripe the data over different disks. Thus the transfer rate for a single request is comparable to the transfer rate of a single disk
• The effective space utilization is 50 percent, independent of the number of data disks
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
• RAID 2 uses designated drives for parity
• In RAID 2, the striping unit is a single bit
• Hamming code is used for parity
◦ Hamming codes can detect up to two-bit errors or correct one-bit errors
◦ For 4 bits of data, 3 check bits are added
◦ Simple parity code cannot correct errors, and can detect only an odd number of bits in error
• In a disk array with D data disks, the smallest unit of transfer for a read is a set of D blocks, because consecutive bits of the data are stored on different disks (bit-level striping)
• Writing a block involves reading D blocks into main memory, modifying D + C blocks, and writing D + C blocks to disk, where C is the number of check disks. This sequence of steps is called a read-modify-write cycle
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
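To make the "4 data bits plus 3 check bits" case concrete, here is a sketch of the textbook Hamming(7,4) code with single-bit error correction. It illustrates the code itself, not how any particular RAID 2 controller lays bits out on disks; the bit ordering (check bits at positions 1, 2 and 4) and the function names are assumptions of this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits: positions 1, 2, 4 hold check bits."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code_bits):
    """Return (data_bits, error_position); position 0 means no error found."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3     # syndrome gives the 1-based bad position
    if position:
        c[position - 1] ^= 1            # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], position


data = [1, 0, 1, 1]
code = hamming74_encode(data)

corrupted = list(code)
corrupted[2] ^= 1                        # flip the bit at position 3
recovered, position = hamming74_correct(corrupted)
print(code, recovered, position)
```

The three check bits form a syndrome that names the position of a single flipped bit, which is what lets the code correct an error rather than merely detect it.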
• RAID 3 has a single check disk with parity information. Thus, the reliability overhead for RAID 3 is a single disk, the lowest overhead possible
• RAID 3 consists of byte-level striping with dedicated parity. Therefore the data transfer rate of this level is high because data can be accessed in parallel
• RAID 3 cannot service multiple requests simultaneously: any single block of data is spread across all members of the set and resides in the same physical location on each disk, so every single I/O request has to be addressed by working on every disk in the array
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
• RAID 4 has a striping unit of a disk block instead of a single bit, as in RAID 3
• Read requests of the size of a disk block can be served entirely by the disk where the requested block resides; therefore RAID 4 provides good performance for data reads
• Provides recovery of corrupted or lost data using XOR recovery mechanism
• If a disk experiences a failure, recovery can be made by simply XORing all the remaining data bits and the parity bit
• Facilitates recovery from at most one disk failure. At this level, if more than one disk fails, then there is no way to recover the data
• Write performance is low due to the need to write all parity data to a single disk
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
RAID 5: Distributed Parity
• RAID 5 improves upon RAID 4 by distributing the parity blocks uniformly over all disks instead of storing them on a single check disk
• Several write requests can potentially be processed in parallel since the bottleneck of a unique check disk has been eliminated
• Read requests have a higher level of parallelism. Since the data is distributed over all disks, read requests involve all disks, whereas, in systems with a dedicated check disk, the check disk never participates in reads
• Like level 4, this level allows recovery from only one disk failure
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke
• RAID 6 extends RAID 5 by adding another parity block, thus it uses block-level striping with two parity blocks distributed across all member disks
• Write performance of RAID 6 is poorer than RAID 5 because of the increased complexity of parity calculation
• RAID 6 uses Reed-Solomon codes to recover from up to two simultaneous disk failures. Therefore it can handle a disk failure during recovery of a failed disk
Image source: Standard RAID levels (Accessed 19-Aug-2021)
Source: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke, Standard RAID levels
• Nested RAID levels (Hybrid RAID) combine two or more of the standard RAID levels to gain performance, additional redundancy or both, as a result of combining properties of different standard RAID layouts
• Nested RAID levels are usually numbered using a series of numbers
◦ The first number in the numeric designation denotes the lowest RAID level in the “stack”, while
◦ the rightmost one denotes the highest layered RAID level
• For example, RAID 50 layers the data striping of RAID 0 on top of the distributed parity of RAID 5
• Nested RAID levels include RAID 01, RAID 10, RAID 100, RAID 50 and RAID 60, which all combine data striping with other RAID techniques
• As a result of the layering scheme, RAID 01 and RAID 10 represent significantly different nested RAID levels
• RAID 01 is a mirror of stripes
• It achieves both replication and sharing of data between disks
• The usable capacity of a RAID 01 array is the same as in a RAID 1 array made of the same drives, in which one half of the drives is used to mirror the other half: (N/2) · Smin, where N is the total number of drives and Smin is the capacity of the smallest drive in the array
• At least four disks are required in a standard RAID 01
Image source: Nested RAID levels (Accessed 23-Aug-2021)
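The capacity formula (N/2) · Smin can be turned into a small calculator. The mirrored cases follow the formula given for RAID 01, and RAID 0 follows the 100 percent utilization stated earlier. The RAID 5 and RAID 6 figures of (N − 1) · Smin and (N − 2) · Smin are the standard values for one and two parity blocks per stripe; they are not stated in these slides and are included as an assumption. The function name and the drive sizes are illustrative.

```python
def usable_capacity(level: str, drive_sizes) -> float:
    """Usable capacity in the same unit as drive_sizes (for example TB)."""
    n, s_min = len(drive_sizes), min(drive_sizes)
    if level == "0":
        return n * s_min                 # striping only, no redundancy
    if level in ("1", "01", "10"):
        if n % 2:
            raise ValueError("this sketch assumes an even number of drives")
        return (n // 2) * s_min          # half of the drives mirror the other half
    if level == "5":
        return (n - 1) * s_min           # assumption: one parity block per stripe
    if level == "6":
        return (n - 2) * s_min           # assumption: two parity blocks per stripe
    raise ValueError(f"unsupported level: {level}")


drives = [4, 4, 4, 4]                    # four hypothetical 4 TB drives
capacities = {lvl: usable_capacity(lvl, drives) for lvl in ("0", "01", "10", "5", "6")}
print(capacities)
```

With four equal drives, the mirrored layouts and RAID 6 all give half of the raw capacity, while RAID 5 gives three quarters.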
• RAID 10 is a stripe of mirrors
• RAID 10 is a RAID 0 array of mirrors, which may be two- or three-way mirrors, and requires a minimum of four drives
• RAID 10 provides better throughput and latency than all other RAID levels except RAID 0 (which wins in throughput)
• Thus, it is the preferable RAID level for I/O-intensive applications such as database, email, and web servers, as well as for any other use requiring high disk performance
Image source: Nested RAID levels (Accessed 23-Aug-2021)
Source: Nested RAID levels (Accessed 23-Aug-2021)
• Different RAID levels have different speed and fault tolerance properties
• RAID level 0 is not fault tolerant
• Levels 1, 1E, 5, 50, 6, 60, and 1+0 are fault tolerant to different degrees: should one of the hard drives in the array fail, the data is still reconstructed on the fly and no access interruption occurs
• RAID levels 2, 3, and 4 are theoretically defined but not used in practice
• There are some more complex layouts like RAID 5E/5EE (integrating some spare space) and RAID DP
◦ “E” often stands for “Enhanced” or “Extended”
◦ Some of them use hot spare drives
Image source: RAID Calculator (Accessed 23-Aug-2021)
• Level 1 provides much better write performance than level 5
◦ Level 5 requires at least 2 block reads and 2 block writes to write a single block, whereas Level 1 only requires 2 block writes
◦ Level 1 preferred for high update environments such as log disks
• Level 1 has higher storage cost than level 5
◦ disk drive capacities are increasing rapidly (50%/year) whereas disk access times have decreased much less (× 3 in 10 years)
◦ I/O requirements have increased greatly, e.g. for Web servers
◦ When enough disks have been bought to satisfy required rate of I/O, they often have spare storage capacity
▷ so there is often no extra monetary cost for Level 1!
• Level 5 is preferred for applications with low update rate, and large amounts of data
• Level 1 is preferred for all other applications
Module Summary
Module Summary
• RAID does not equate to 100% uptime: Nothing can. RAID is another tool in the toolbox meant to help minimize downtime and availability issues. There is still a risk of a RAID card failure, though that is significantly lower than the risk of an HDD failure
• RAID does not replace backups: Nothing can replace a well planned and frequently tested backup implementation!
• RAID does not protect against data corruption, human error, or security issues: While it can protect you against a drive failure, there are innumerable other reasons for keeping backups. So RAID is not a replacement for backups
• RAID does not necessarily allow you to dynamically increase the size of the array: If you need more disk space, you cannot simply add another drive to the array. You are likely going to have to start from scratch, rebuilding/reformatting the array
• RAID isn’t always the best option for virtualization and high-availability failover: You will want to look at SAN solutions
Source: (Almost) Everything You Need to Know About RAID
Module Summary
Module 55
• Understood RAID - array of redundant disks in parallel to enhance speed and reliability
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Parity
RAID 0
RAID 1
RAID 2
RAID 3
RAID 4
RAID 5
RAID 6
Hybrid RAID
RAID 01
RAID 10
Choice of RAID
Comparison
Module Summary
Database Management Systems
Module 56: Query Processing and Optimization/1: Processing
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Module Summary
Module 56
• Learnt the importance of backup and analysed different backup strategies
• Failures may be due to a variety of sources; each needs a strategy for handling
• A proper mix and management of volatile, non-volatile and stable storage can guarantee recovery from failures and ensure Atomicity, Consistency and Durability
• Log-based recovery is efficient and effective
• Learnt how hot backup of the transaction log helps in recovering a consistent database
• Studied the recovery algorithms for concurrent transactions
• Recovery based on operation logging supplements log-based recovery
• Planning for Backup
• Understood RAID - array of redundant disks in parallel to enhance speed and reliability
Module 56
• To understand the algorithms for processing Selection Operations, Sorting, Join Operations, and a few Other Operations
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
• Selection Operation
• Sorting
• Join Operation
• Other Operations
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Overview of Query Processing
Other Operations
Module Summary
Module 56
a) Parsing and translation
b) Optimization
c) Evaluation
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
• Parsing and translation
  ◦ Translate the query into its internal form
    . This is then translated into relational algebra
  ◦ Parser checks syntax, verifies relations
• Evaluation
  ◦ The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
• Consider the query
    select salary
    from instructor
    where salary < 75000;
  which can be translated into either of the following relational-algebra expressions:
  ◦ $\sigma_{salary < 75000}(\Pi_{salary}(instructor))$
  ◦ $\Pi_{salary}(\sigma_{salary < 75000}(instructor))$
• Each relational algebra operation can be evaluated using one of several different algorithms
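The equivalence of the two expressions can be checked on a small instance. The following is a minimal sketch, assuming a toy instructor relation held as a list of dictionaries; the rows and the helper names select and project are illustrative and not part of the slides.

```python
# Toy instructor relation (illustrative data, not from the slides).
instructor = [
    {"ID": 1, "name": "Srinivasan", "salary": 65000},
    {"ID": 2, "name": "Wu", "salary": 90000},
    {"ID": 3, "name": "Mozart", "salary": 40000},
    {"ID": 4, "name": "Katz", "salary": 75000},
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Relational projection with set semantics (duplicates removed)."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


def cheap(row):
    return row["salary"] < 75000


# sigma_{salary<75000}(Pi_{salary}(instructor))
plan_a = select(project(instructor, ["salary"]), cheap)
# Pi_{salary}(sigma_{salary<75000}(instructor))
plan_b = project(select(instructor, cheap), ["salary"])

print(plan_a == plan_b)
```

Both plans return the same tuples, but they touch different amounts of data along the way, which is why the optimizer has a choice to make.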
External Sort-Merge
Module 56
• Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost
  ◦ Cost is estimated using statistical information from the database catalog
    . For example, number of tuples in each relation, size of tuples, etc.
• In this module we study
  ◦ Measures of query cost
  ◦ Algorithms for processing Selection Operations, Sorting, Join Operations, and a few Other Operations
• In the next module
  ◦ We study how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Measures of Query Cost
Other Operations
Module Summary
Module 56
• Cost is generally measured as total elapsed time for answering query
  ◦ Many factors contribute to time cost
    . disk accesses, CPU, or even network communication
• Typically disk access is the predominant cost, and is also relatively easy to estimate
    . Cost to write a block is greater than cost to read a block
      − data is read back after being written to ensure that the write was successful
Module Summary
Module 56
• For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures
  ◦ $t_T$: time to transfer one block
  ◦ $t_S$: time for one seek
  ◦ Cost for $b$ block transfers plus $S$ seeks:
    $b \times t_T + S \times t_S$
• We do not include the cost of writing output to disk in our cost formulae
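The cost measure above is a one-line formula, sketched here as a function. The timing values (100 microseconds per block transfer, 4000 microseconds per seek) are assumed for illustration only and do not come from the slides.

```python
def io_cost(block_transfers, seeks, t_T, t_S):
    """Estimated I/O cost: b * t_T + S * t_S, in the same time unit as t_T and t_S."""
    return block_transfers * t_T + seeks * t_S


# Assumed device parameters, in microseconds (illustrative only).
T_TRANSFER_US = 100
T_SEEK_US = 4000

# Reading 100 contiguous blocks after a single seek.
sequential = io_cost(block_transfers=100, seeks=1, t_T=T_TRANSFER_US, t_S=T_SEEK_US)
# Reading the same 100 blocks with one seek per block.
scattered = io_cost(block_transfers=100, seeks=100, t_T=T_TRANSFER_US, t_S=T_SEEK_US)

print(sequential, scattered)
```

With these assumed parameters the seek term dominates as soon as accesses are scattered, which is why the later formulas count seeks separately from transfers.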
Module Summary
Module 56
• Several algorithms can reduce disk I/O by using extra buffer space
  ◦ Amount of real memory available to buffer depends on other concurrent queries and OS processes, known only during execution
    . We often use worst case estimates, assuming only the minimum amount of memory needed for the operation is available
• Required data may be buffer resident already, avoiding disk I/O
  ◦ But hard to take into account for cost estimation
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Selection Operation
Other Operations
Module Summary
Module 56
A# | Algorithm | Cost | Reason
A1 | Linear Search | $t_S + b_r \times t_T$ | One initial seek plus $b_r$ block transfers
A1 | Linear Search, Equality on Key | Average case $t_S + (b_r/2) \times t_T$ | Since at most one record satisfies the condition, the scan can be terminated as soon as the required record is found. $b_r$ block transfers in the worst case
A2 | Primary Index, Equality on Key | $(h_i + 1) \times (t_T + t_S)$ | Index lookup traverses the height of the tree plus one I/O to fetch the record; each of these I/O operations requires a seek and a block transfer
A3 | Primary Index, Equality on Nonkey | $h_i \times (t_T + t_S) + b \times t_T$ | One seek for each level of the tree, one seek for the first block. Here all of $b$ are read. These blocks are leaf blocks assumed to be stored sequentially (for a primary index) and don’t require additional seeks
A4 | Secondary Index, Equality on Key | $(h_i + 1) \times (t_T + t_S)$ | This case is similar to primary index
A4 | Secondary Index, Equality on Nonkey | $(h_i + n) \times (t_T + t_S)$ | Here, cost of index traversal is the same as for A3, but each record may be on a different block, requiring a seek per record. Cost is potentially very high if $n$ is large
A5 | Primary Index, Comparison | $h_i \times (t_T + t_S) + b \times t_T$ | Identical to the case of A3, equality on nonkey
A6 | Secondary Index, Comparison | $(h_i + n) \times (t_T + t_S)$ | Identical to the case of A4, equality on nonkey

$t_T$ is the time to transfer one block. $t_S$ is the time for one seek
$b_r$ denotes the number of blocks in the file
$b$ denotes the number of blocks containing records with the specified search key
$h_i$ denotes the height of the index. $n$ is the number of records fetched
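The cost formulas for the selection algorithms can be compared numerically. The sketch below encodes them as functions; the function names and the sample parameters (timings in microseconds, file size, index height) are assumptions chosen for illustration.

```python
def cost_linear_search(b_r, t_T, t_S):
    """A1: one initial seek plus b_r block transfers."""
    return t_S + b_r * t_T


def cost_linear_search_key_avg(b_r, t_T, t_S):
    """A1, equality on key: on average half the file is scanned."""
    return t_S + (b_r / 2) * t_T


def cost_index_eq_key(h_i, t_T, t_S):
    """A2 and A4 (equality on key): h_i index levels plus one record fetch."""
    return (h_i + 1) * (t_T + t_S)


def cost_primary_index_nonkey(h_i, b, t_T, t_S):
    """A3 and A5: index traversal, then b sequential block transfers."""
    return h_i * (t_T + t_S) + b * t_T


def cost_secondary_index_nonkey(h_i, n, t_T, t_S):
    """A4 (nonkey) and A6: index traversal, then one seek and transfer per record."""
    return (h_i + n) * (t_T + t_S)


# Assumed parameters: 100 us per transfer, 4000 us per seek,
# a 400-block file, an index of height 4, 3 matching blocks, 50 matching records.
t_T, t_S = 100, 4000
print(cost_linear_search(400, t_T, t_S))
print(cost_index_eq_key(4, t_T, t_S))
print(cost_primary_index_nonkey(4, 3, t_T, t_S))
print(cost_secondary_index_nonkey(4, 50, t_T, t_S))
```

Under these assumed numbers the secondary index on a nonkey attribute is the most expensive option, more costly than a full linear scan, which illustrates the warning that its cost is potentially very high when n is large.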
Complex Selections: Conjunction
Module 56
• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$
• A7 (conjunctive selection using one index)
  ◦ Select a combination of $\theta_i$ and algorithms A1 through A6 that results in the least cost for $\sigma_{\theta_i}(r)$
  ◦ Test other conditions on tuple after fetching it into memory buffer
• A8 (conjunctive selection using composite index)
  ◦ Use appropriate composite (multiple-key) index if available
• A9 (conjunctive selection by intersection of identifiers)
  ◦ Requires indices with record pointers
  ◦ Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers
  ◦ Then fetch records from file
  ◦ If some conditions do not have appropriate indices, apply test in memory
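Algorithm A9 can be illustrated with in-memory structures. This is a minimal sketch, assuming that a dictionary keyed by record identifier stands in for the file and that each index maps an attribute value to a set of record identifiers; the data and the names build_index and conjunctive_select are hypothetical.

```python
from collections import defaultdict

# rid -> record; stands in for the file on disk (illustrative data).
records = {
    0: {"dept": "Music", "salary": 40000, "year": 2009},
    1: {"dept": "Physics", "salary": 95000, "year": 2009},
    2: {"dept": "Music", "salary": 80000, "year": 2010},
    3: {"dept": "Music", "salary": 40000, "year": 2010},
}


def build_index(records, attr):
    """Index with record pointers: attribute value -> set of record ids."""
    index = defaultdict(set)
    for rid, rec in records.items():
        index[rec[attr]].add(rid)
    return index


def conjunctive_select(records, indices, equalities, residual=None):
    """A9 sketch: intersect the rid sets of the indexed conditions, fetch the
    records, then test any condition without an index in memory.
    `equalities` must contain at least one (attribute, value) pair."""
    rid_sets = [indices[attr].get(value, set()) for attr, value in equalities]
    rids = set.intersection(*rid_sets)
    result = []
    for rid in sorted(rids):            # fetch records from the "file"
        rec = records[rid]
        if residual is None or residual(rec):
            result.append(rec)
    return result


indices = {"dept": build_index(records, "dept"),
           "salary": build_index(records, "salary")}

# dept = 'Music' AND salary = 40000 AND year = 2010 (year has no index).
matches = conjunctive_select(
    records, indices,
    equalities=[("dept", "Music"), ("salary", 40000)],
    residual=lambda rec: rec["year"] == 2010,
)
print(matches)
```

Only the records whose identifiers survive the intersection are fetched, so the in-memory test on year runs on two records rather than on the whole file.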
Module 56
• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$
• A10 (disjunctive selection by union of identifiers)
  ◦ Applicable if all conditions have available indices
    . Otherwise use linear scan
  ◦ Use corresponding index for each condition, and take union of all the obtained sets of record pointers
  ◦ Then fetch records from file
• Negation: $\sigma_{\neg\theta}(r)$
  ◦ Use linear scan on file
  ◦ If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
    . Find satisfying records using index and fetch from file
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Sorting
Other Operations
Module Summary
Module 56
• We may build an index on the relation, and then use the index to read the relation in sorted order
  ◦ May lead to one disk block access for each tuple
• For relations that fit in memory, techniques like quicksort can be used
• For relations that do not fit in memory, external sort-merge is a good choice
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
a) Create sorted runs. Let M denote the number of blocks in the main-memory buffer available for sorting. First, a number of sorted runs are created; each run is sorted, but contains only some of the records of the relation.
    i = 0;
    repeat
        read M blocks of the relation, or the rest of the relation, whichever is smaller;
        sort the in-memory part of the relation;
        write the sorted data to run file Ri;
        i = i + 1;
    until the end of the relation
b) Merge the runs (N-way merge). Assume for now that the total number of runs, N, is less than M, so that we can allocate one block to each run and have space left to hold one block of output. The merge stage operates as follows:
    read one block of each of the N files Ri into a buffer block in memory;
    repeat
        choose the first tuple (in sort order) among all buffer blocks;
        write the tuple to the output, and delete it from the buffer block;
        if the buffer block of any run Ri is empty and not end-of-file(Ri)
            then read the next block of Ri into the buffer block;
    until all input buffer blocks are empty
c) If N ≥ M, several merge passes are required
  • In each pass, contiguous groups of M−1 runs are merged
  • A pass reduces the number of runs by a factor of M−1, and creates runs longer by the same factor
    ◦ For M=11 and 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs
  • Repeated passes are performed till all runs have been merged into one
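The run-creation and multi-pass merge steps can be sketched as follows. This is a simplified in-memory illustration, not a disk-based implementation: Python lists stand in for run files, one record per block is assumed by default, and heapq.merge plays the role of the (M−1)-way merge. The function names are hypothetical.

```python
import heapq


def create_runs(records, run_size):
    """Step a): cut the input into chunks of run_size records and sort each one."""
    return [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]


def merge_pass(runs, fan_in):
    """One merge pass: merge contiguous groups of fan_in runs into longer runs."""
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]


def external_sort_merge(records, M, records_per_block=1):
    """Sort with M buffer blocks: runs of M blocks, then (M-1)-way merge passes.
    Returns the sorted records and the number of merge passes performed."""
    if M < 3:
        raise ValueError("need at least 3 buffer blocks to merge 2 runs at a time")
    runs = create_runs(records, M * records_per_block)
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, M - 1)   # one block is reserved for output
        passes += 1
    return (runs[0] if runs else []), passes


# 990 records with M = 11 give 90 initial runs, as in the example above.
data = list(range(990, 0, -1))
initial_runs = create_runs(data, 11)
after_one_pass = merge_pass(initial_runs, 10)
sorted_data, passes = external_sort_merge(data, M=11)
print(len(initial_runs), len(after_one_pass), passes)
```

A real implementation would stream blocks from run files instead of holding whole runs in lists; the sketch only mirrors how the number of runs shrinks by a factor of M−1 per pass.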
Join Operation PPD
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Join Operation
Other Operations
Module Summary
Module 56
• Examples use the following information
  ◦ Number of records of student: $n_{student}$ = 5,000
  ◦ Number of records of takes: $n_{takes}$ = 10,000
  ◦ Number of blocks of student: $b_{student}$ = 100
  ◦ Number of blocks of takes: $b_{takes}$ = 400
Module 56
• To compute the theta join $r \bowtie_\theta s$
    for each tuple tr in r do begin
        for each tuple ts in s do begin
            test pair (tr, ts) to see if they satisfy the join condition θ
            if they do, add tr • ts to the result.
        end
    end
• r is called the outer relation and s the inner relation of the join
• Requires no indices and can be used with any kind of join condition
• Expensive since it examines every pair of tuples in the two relations
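The nested-loop pseudocode maps directly onto two loops. In this minimal sketch, relations are lists of dictionaries and the concatenation tr • ts is approximated by merging the two dictionaries, which assumes that attributes with the same name carry the same value (as in an equi-join on ID). The data is illustrative.

```python
def nested_loop_join(r, s, theta):
    """Theta join: r is the outer relation, s the inner relation."""
    result = []
    for t_r in r:                         # for each tuple tr in r
        for t_s in s:                     # for each tuple ts in s
            if theta(t_r, t_s):           # test the join condition
                result.append({**t_r, **t_s})   # tr . ts
    return result


student = [{"ID": 1, "name": "Zhang"}, {"ID": 2, "name": "Shankar"}]
takes = [
    {"ID": 1, "course_id": "CS-101"},
    {"ID": 1, "course_id": "CS-347"},
    {"ID": 2, "course_id": "CS-101"},
]

joined = nested_loop_join(student, takes, lambda tr, ts: tr["ID"] == ts["ID"])
print(len(joined))
```

Because theta is an arbitrary predicate, the same function also handles non-equality conditions such as tr["ID"] < ts["ID"].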
Other Operations
Module Summary
• In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is $n_r \times b_s + b_r$ block transfers, plus $n_r + b_r$ seeks, where $n_r$ ($n_s$) denotes the number of tuples in r (s) and $b_r$ ($b_s$) denotes the number of blocks containing tuples of r (s)
• If the smaller relation fits entirely in memory, use that as the inner relation.
  ◦ Reduces cost to $b_r + b_s$ block transfers and 2 seeks
• Example of join of student and takes: $n_{student}$ = 5,000, $n_{takes}$ = 10,000, $b_{student}$ = 100, $b_{takes}$ = 400
• Assuming worst case memory availability, the cost estimate is
  ◦ with student as the outer relation:
    . 5000 * 400 + 100 = 2,000,100 block transfers,
    . 5000 + 100 = 5100 seeks
  ◦ with takes as the outer relation
    . 10000 * 100 + 400 = 1,000,400 block transfers and 10,400 seeks
• If the smaller relation (student) fits entirely in memory, the cost estimate will be 500 block transfers
• Block nested-loops algorithm is preferable
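The worst-case estimates above can be reproduced with a short calculation. The function names are illustrative; the formulas and the student/takes statistics are the ones stated on the slide.

```python
def nested_loop_worst_case(n_outer, b_outer, b_inner):
    """Worst case (one buffer block per relation):
    n_r * b_s + b_r block transfers and n_r + b_r seeks."""
    transfers = n_outer * b_inner + b_outer
    seeks = n_outer + b_outer
    return transfers, seeks


def nested_loop_inner_in_memory(b_outer, b_inner):
    """Inner relation fits in memory: b_r + b_s block transfers and 2 seeks."""
    return b_outer + b_inner, 2


n_student, b_student = 5000, 100
n_takes, b_takes = 10000, 400

student_outer = nested_loop_worst_case(n_student, b_student, b_takes)
takes_outer = nested_loop_worst_case(n_takes, b_takes, b_student)
in_memory = nested_loop_inner_in_memory(b_takes, b_student)
print(student_outer, takes_outer, in_memory)
```

Swapping the outer relation trades block transfers against seeks, so which order is cheaper depends on the relative values of the transfer and seek times.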
Module 56
• Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation
    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr in Br do begin
                for each tuple ts in Bs do begin
                    Check if (tr, ts) satisfy the join condition
                    if they do, add tr • ts to the result.
                end
            end
        end
    end
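The block nested-loop join differs from the tuple-at-a-time version only in the order in which pairs are examined. The sketch below models blocks as fixed-size slices of a list and counts how often an inner block is read; the block size, the data, and the helper names are assumptions for illustration.

```python
def to_blocks(tuples, block_size):
    """Split a relation into blocks holding block_size tuples each."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]


def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Pair every block of the outer relation r with every block of the inner relation s.
    Returns the join result and the number of inner block reads."""
    result = []
    inner_block_reads = 0
    for B_r in r_blocks:
        for B_s in s_blocks:
            inner_block_reads += 1        # each inner block is read once per outer block
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):
                        result.append({**t_r, **t_s})
    return result, inner_block_reads


student = [{"ID": i, "name": f"s{i}"} for i in range(5)]
takes = [{"ID": i % 5, "course_id": f"c{i}"} for i in range(8)]

r_blocks = to_blocks(student, 2)   # 3 blocks
s_blocks = to_blocks(takes, 2)     # 4 blocks
joined, inner_reads = block_nested_loop_join(
    r_blocks, s_blocks, lambda tr, ts: tr["ID"] == ts["ID"])
print(len(joined), inner_reads)
```

The inner relation is read once per outer block (3 × 4 = 12 block reads here) rather than once per outer tuple, which is where the saving over the plain nested-loop join comes from.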
Module Summary
Other Operations
Module Summary
Other Operations
• If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer
relation.
Module Summary
Module 56
• Compute $student \bowtie takes$, with student as the outer relation.
• Let takes have a primary B+-tree index on the attribute ID, which contains 20 entries in each index node.
• Since takes has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data
• student has 5000 tuples
• Cost of block nested loops join
  ◦ 400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks
    . assuming worst case memory
    . may be significantly less with more memory
• Cost of indexed nested loops join
  ◦ 100 + 5000 * 5 = 25,100 block transfers and seeks.
  ◦ CPU cost likely to be less than that for block nested loops join
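The two estimates in this example follow from simple formulas, sketched below with the slide's numbers. The function names are illustrative; the per-lookup cost of 5 is the tree height of 4 plus one access for the data, as stated above.

```python
def block_nested_loop_cost(b_outer, b_inner):
    """Worst case: b_r * b_s + b_r block transfers and 2 * b_r seeks."""
    return b_outer * b_inner + b_outer, 2 * b_outer


def indexed_nested_loop_cost(b_outer, n_outer, lookup_cost):
    """b_r + n_r * c, where c is the cost of one index lookup on the inner relation.
    Each counted I/O involves both a block transfer and a seek."""
    return b_outer + n_outer * lookup_cost


b_student, n_student, b_takes = 100, 5000, 400
tree_height = 4

bnl_transfers, bnl_seeks = block_nested_loop_cost(b_student, b_takes)
inl_ios = indexed_nested_loop_cost(b_student, n_student, tree_height + 1)
print(bnl_transfers, bnl_seeks, inl_ios)
```

The indexed join needs fewer block transfers but far more seeks than the block nested-loop join (25,100 versus 200), so the comparison of total time depends on the seek and transfer times of the device.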
Module 56
Partha Pratim
Das
Week Recap
Objectives &
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Other Operations
Module Summary
Module 56
• Duplicate Elimination
• Projection
• Aggregation
• Set Operations
• Outer Join
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
• Duplicate Elimination can be implemented via hashing or sorting
  ◦ On sorting, duplicates will come adjacent to each other, and all but one of a set of duplicates can be deleted
  ◦ Optimization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge
  ◦ Hashing is similar: duplicates will come into the same bucket
• Projection:
  ◦ perform projection on each tuple
  ◦ followed by duplicate elimination
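Both duplicate-elimination strategies, and projection built on top of them, can be sketched in a few lines. Tuples are modelled as Python tuples and a Python set stands in for the hash buckets; the function names and data are illustrative assumptions.

```python
def dedupe_by_sorting(tuples):
    """After sorting, duplicates are adjacent; keep one tuple from each group."""
    result = []
    for t in sorted(tuples):
        if not result or result[-1] != t:
            result.append(t)
    return result


def dedupe_by_hashing(tuples):
    """Equal tuples hash to the same bucket; a set models the buckets here."""
    seen = set()
    result = []
    for t in tuples:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def project(rows, positions):
    """Projection: project each tuple, then eliminate duplicates."""
    projected = [tuple(row[p] for p in positions) for row in rows]
    return dedupe_by_hashing(projected)


# (ID, dept_name) pairs; illustrative data.
rows = [(1, "Music"), (2, "Physics"), (3, "Music"), (2, "Physics")]
print(dedupe_by_sorting(rows))
print(project(rows, [1]))
```

The sorting variant returns its output in sorted order as a side effect, while the hashing variant preserves the order of first occurrence.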
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
• Aggregation can be implemented in a manner similar to duplicate elimination
  ◦ Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group
  ◦ Optimization: combine tuples in the same group during run generation and intermediate merges, by computing partial aggregate values
    . For count, min, max, sum: keep aggregate values on tuples found so far in the group
      − When combining partial aggregates for count, add up the aggregates
    . For avg, keep sum and count, and divide sum by count at the end
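The partial-aggregate optimization can be sketched as follows. Each run is reduced to one (count, sum, min, max) entry per group, partial results from different runs are combined, and avg is derived only at the end. The tuple layout and the function names are assumptions for illustration.

```python
def partial_aggregate(run):
    """Reduce a run of (group, value) pairs to group -> (count, sum, min, max)."""
    partial = {}
    for group, value in run:
        if group in partial:
            c, s, lo, hi = partial[group]
            partial[group] = (c + 1, s + value, min(lo, value), max(hi, value))
        else:
            partial[group] = (1, value, value, value)
    return partial


def combine(p1, p2):
    """Merge two partial results: counts and sums add up, min and max are re-taken."""
    merged = dict(p1)
    for group, (c, s, lo, hi) in p2.items():
        if group in merged:
            c0, s0, lo0, hi0 = merged[group]
            merged[group] = (c0 + c, s0 + s, min(lo0, lo), max(hi0, hi))
        else:
            merged[group] = (c, s, lo, hi)
    return merged


def finalize(partial):
    """Compute avg = sum / count only at the very end."""
    return {group: {"count": c, "sum": s, "min": lo, "max": hi, "avg": s / c}
            for group, (c, s, lo, hi) in partial.items()}


# Two runs of (dept_name, salary) pairs; illustrative data.
run1 = [("Music", 40000), ("Physics", 95000)]
run2 = [("Music", 80000), ("Physics", 87000), ("Music", 60000)]

totals = finalize(combine(partial_aggregate(run1), partial_aggregate(run2)))
print(totals["Music"])
```

Averages are not combined directly because the average of two partial averages is generally wrong when the groups have different sizes; carrying sum and count avoids that problem.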
External Sort-Merge
Join Operation
Other Operations
Module Summary
Module 56
• Understood the overall flow for Query Processing and defined the Measures of Query Cost
• Studied the algorithms for processing Selection Operations, Sorting, Join Operations and a few Other Operations
Outline
Query Processing
Query Cost
Selection
Operation
Complex Selections
Sorting
External Sort-Merge
Join Operation
Other Operations
Module Summary
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Database Management Systems
Module 57: Query Processing and Optimization/2: Optimization
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Module 57
• Understood the overall flow for Query Processing and defined the Measures of Query Cost
• Studied the algorithms for processing Selection Operations, Sorting, Join Operations and a few Other Operations
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
Partha Pratim
Das
Objectives &
Outline
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Introduction to Query Optimization
Module Summary
Module 57
• Alternative ways of evaluating a given query
  ◦ Equivalent expressions
  ◦ Different algorithms for each operation
Outline
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
• An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated
Objectives &
Outline
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
• Cost difference between evaluation plans for a query can be enormous
  ◦ For example, seconds vs. days in some cases
• Steps in cost-based query optimization
  a) Generate logically equivalent expressions using equivalence rules
  b) Annotate resultant expressions to get alternative query plans
  c) Choose the cheapest plan based on estimated cost
• Estimation of plan cost based on:
  ◦ Statistical information about relations
    . Examples: number of tuples, number of distinct values for an attribute
  ◦ Statistics estimation for intermediate results
    . to compute cost of complex expressions
  ◦ Cost formulae for algorithms, computed using statistics
Module 57
Partha Pratim
Das
Objectives &
Outline
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Transformation of Relational Expressions
Module Summary
Module 57
• Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of tuples on every legal database instance
  ◦ Note: order of tuples is irrelevant
  ◦ We do not care if they generate different results on databases that violate integrity constraints
• In SQL, inputs and outputs are multisets of tuples
  ◦ Two expressions in the multiset version of the relational algebra are said to be equivalent if the two expressions generate the same multiset of tuples on every legal database instance
• An equivalence rule says that expressions of two forms are equivalent
  ◦ Can replace expression of first form by second, or vice versa
Module 57
1 Conjunctive selection operations can be deconstructed into a sequence of individual selections
    $\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2 Selection operations are commutative
    $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3 Only the last in a sequence of projection operations is needed, the others can be omitted
    $\pi_{L_1}(\pi_{L_2}(\ldots(\pi_{L_n}(E))\ldots)) = \pi_{L_1}(E)$
4 Selections can be combined with Cartesian products and theta joins
    $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
    $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$
Equivalence Rules (2)
Module 57
5 Theta join operations are commutative
    $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6 a. Natural join operations are associative:
    $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
  b. Theta joins are associative in the following manner:
    $(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
    where $\theta_2$ involves attributes from $E_2$ and $E_3$ only
Module 57
Partha Pratim
Das
Objectives &
Outline
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
7 The selection operation distributes over the theta join operation under the following two conditions:
  a. When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined
    $\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
  b. When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$
    $\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
8 The projection operation distributes over the theta join operation as follows:
  a. If $\theta$ involves only attributes from $L_1 \cup L_2$:
    $\pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\pi_{L_1}(E_1)) \bowtie_{\theta} (\pi_{L_2}(E_2))$
  b. Consider a join $E_1 \bowtie_{\theta} E_2$
    • Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively
    • Let $L_3$ be attributes of $E_1$ that are involved in join condition $\theta$, but are not in $L_1 \cup L_2$, and
    • Let $L_4$ be attributes of $E_2$ that are involved in join condition $\theta$, but are not in $L_1 \cup L_2$
    $\pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \pi_{L_1 \cup L_2}((\pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\pi_{L_2 \cup L_4}(E_2)))$
Module Summary
Module 57
9 The set operations union and intersection are commutative.
    $E_1 \cup E_2 = E_2 \cup E_1$
    $E_1 \cap E_2 = E_2 \cap E_1$
  • (set difference is not commutative).
10 Set union and intersection are associative.
  • $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
  • $(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11 The selection operation distributes over $\cup$, $\cap$, $-$
    $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
    and similarly for $\cup$ and $\cap$ in place of $-$
Plan Generation
Module 57
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
• Query: Find the names of all instructors in the Music department, along with the titles of the courses that they teach
  ◦ $\pi_{name, title}(\sigma_{dept\_name = \text{"Music"}}(instructor \bowtie (teaches \bowtie \pi_{course\_id, title}(course))))$
• Transformation using rule 7a
  ◦ $\pi_{name, title}((\sigma_{dept\_name = \text{"Music"}}(instructor)) \bowtie (teaches \bowtie \pi_{course\_id, title}(course)))$
• Performing the selection as early as possible reduces the size of the relation to be joined
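The effect of rule 7a can be checked on a small instance. This is a minimal sketch with toy instructor, teaches and course relations (illustrative rows, reduced schemas) and a naive natural join; it compares the result of selecting after the join with the result of selecting before it.

```python
def natural_join(r, s):
    """Naive natural join: pair tuples that agree on all common attributes."""
    result = []
    for t_r in r:
        for t_s in s:
            common = t_r.keys() & t_s.keys()
            if all(t_r[a] == t_s[a] for a in common):
                result.append({**t_r, **t_s})
    return result


def select(rows, predicate):
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection with set semantics, returned as a set of tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


# Illustrative data with reduced schemas.
instructor = [
    {"ID": 1, "name": "Mozart", "dept_name": "Music"},
    {"ID": 2, "name": "Einstein", "dept_name": "Physics"},
    {"ID": 3, "name": "Verdi", "dept_name": "Music"},
]
teaches = [
    {"ID": 1, "course_id": "MU-199"},
    {"ID": 2, "course_id": "PHY-101"},
    {"ID": 3, "course_id": "MU-201"},
]
course = [
    {"course_id": "MU-199", "title": "Music Video Production"},
    {"course_id": "PHY-101", "title": "Physical Principles"},
    {"course_id": "MU-201", "title": "Opera"},
]


def is_music(row):
    return row["dept_name"] == "Music"


teaches_course = natural_join(teaches, course)

# Selection applied after the join (original expression).
late = project(select(natural_join(instructor, teaches_course), is_music),
               ["name", "title"])
# Selection pushed down to instructor (rule 7a).
music_instructors = select(instructor, is_music)
early = project(natural_join(music_instructors, teaches_course),
                ["name", "title"])

print(late == early, len(instructor), len(music_instructors))
```

Both forms return the same answer, but the pushed-down form feeds two instructor tuples into the join instead of three; on realistic data the reduction is what makes the early selection worthwhile.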
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
• Query: Find the names of all instructors in the Music department who have taught a course in 2009, along with the titles of the courses that they taught
  ◦ $\pi_{name, title}(\sigma_{dept\_name = \text{"Music"} \wedge year = 2009}(instructor \bowtie (teaches \bowtie \pi_{course\_id, title}(course))))$
• Transformation using join associativity (Rule 6a):
  ◦ $\pi_{name, title}(\sigma_{dept\_name = \text{"Music"} \wedge year = 2009}((instructor \bowtie teaches) \bowtie \pi_{course\_id, title}(course)))$
• Second form provides an opportunity to apply the “perform selections early” rule, resulting in the subexpression
  ◦ $\sigma_{dept\_name = \text{"Music"}}(instructor) \bowtie \sigma_{year = 2009}(teaches)$
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
Partha Pratim
Das
Objectives &
Outline
Query
Optimization
Equivalent
Expressions
Evaluation Plan
Cost
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
• Consider:
    $\pi_{name, title}((\sigma_{dept\_name = \text{"Music"}}(instructor)) \bowtie (teaches \bowtie \pi_{course\_id, title}(course)))$
• When we compute
    $\sigma_{dept\_name = \text{"Music"}}(instructor \bowtie teaches)$
  we obtain a relation whose schema is:
    (ID, name, dept_name, salary, course_id, sec_id, semester, year)
• Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get:
    $\pi_{name, title}((\pi_{name, course\_id}((\sigma_{dept\_name = \text{"Music"}}(instructor)) \bowtie teaches)) \bowtie \pi_{course\_id, title}(course))$
• Performing the projection as early as possible reduces the size of the relation to be joined
Rule 8a: $\pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\pi_{L_1}(E_1)) \bowtie_{\theta} (\pi_{L_2}(E_2))$
Rule 8b: $\pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \pi_{L_1 \cup L_2}((\pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\pi_{L_2 \cup L_4}(E_2)))$
Module 57
Transformation of
Relational
Expressions
Equivalence Rules
Example
Plan Generation
Module Summary
Module 57
Transformation of
department
Relational
Expressions
◦ it is better to compute
    $\sigma_{dept\_name = \text{"Music"}}(instructor) \bowtie teaches$
  first
Module Summary
Module 57
• Query optimizers use equivalence rules to systematically generate expressions equivalent to the given expression
• Can generate all equivalent expressions as follows:
  ◦ Repeat
    . apply all applicable equivalence rules on every subexpression of every equivalent expression found so far
    . add newly generated expressions to the set of equivalent expressions
    Until no new equivalent expressions are generated above
• The above approach is very expensive in space and time
  ◦ Two approaches
    . Optimized plan generation based on transformation rules
    . Special case approach for queries with only selections, projections and joins
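The repeat-until-no-change procedure can be sketched for a deliberately small rule set. The example below assumes that only join commutativity (rule 5) and join associativity (rule 6a) are applied, and it represents an expression as either a relation name or a nested tuple ("join", left, right); this representation and the function names are illustrative, not a description of how a real optimizer stores plans.

```python
def rewrites_at_root(expr):
    """Apply commutativity and associativity at the root of a join expression."""
    if isinstance(expr, tuple):
        _, left, right = expr
        yield ("join", right, left)                    # E1 join E2 = E2 join E1
        if isinstance(left, tuple):
            _, a, b = left
            yield ("join", a, ("join", b, right))      # (A join B) join C = A join (B join C)


def rewrites(expr):
    """Apply the rules at the root and, recursively, at every subexpression."""
    yield from rewrites_at_root(expr)
    if isinstance(expr, tuple):
        _, left, right = expr
        for new_left in rewrites(left):
            yield ("join", new_left, right)
        for new_right in rewrites(right):
            yield ("join", left, new_right)


def all_equivalent(expr):
    """Repeat until no new equivalent expressions are generated."""
    found = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for new_expr in rewrites(current):
            if new_expr not in found:
                found.add(new_expr)
                frontier.append(new_expr)
    return found


start = ("join", ("join", "instructor", "teaches"), "course")
equivalents = all_equivalent(start)
print(len(equivalents))
```

Even with two rules and three relations the set already contains every ordering and both tree shapes, and it grows rapidly as relations are added, which is why exhaustive generation is described above as very expensive in space and time.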
Slides used in this presentation are borrowed from [Link] with kind
permission of the authors.
Edited and new slides are marked with “PPD”.
Module 58
Database Management Systems
Module 58: RDBMS Performance & Architecture
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Objectives & Outline
Performance and Scalability
Performance Factors & Issues
Architecture
Centralized & Client-Server
Server Systems
Parallel Systems
Speedup & Scaleup
Interconnect
Distributed Systems
Scaling Databases
Scaling out Databases
Module Summary
• To evaluate RDBMS, especially with reference to performance and scalability, as a backbone for data-intensive application development
• To understand the role of system and database architecture in performance
• To understand options for Scaling Databases
RDBMS Performance and Scalability
RDBMS Architecture
interface facilities
• The interface between the front-end and the back-end is through SQL or through an application program interface
• Transaction or query servers, which are widely used in relational database systems
◦ A typical transaction cycle is:
. Clients send requests to the server
. Transactions are executed at the server
. Results are shipped back to the client
◦ Requests are specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism
◦ Transactional RPC allows many RPC calls to form a transaction
◦ ODBC / JDBC used to connect
• Data servers, used in object-oriented database systems
◦ Used in high-speed LANs, in cases where
. The clients are comparable in processing power to the server
. The tasks to be executed are compute intensive
◦ Issues:
. Page-Shipping versus Item-Shipping
. Locking
. Data Caching
. Lock Caching
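The idea that several SQL requests can be grouped into one transaction can be sketched with Python's standard `sqlite3` module. This is only an analogy: SQLite is an embedded engine, not a transaction server reached over RPC, and the `account` table and `transfer` helper are assumptions made for the example. The DB-API connection plays the role that ODBC / JDBC play for a client of a real server.

```python
import sqlite3

# In-memory database standing in for the server side (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Send two SQL requests that must succeed or fail together."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; both updates are undone
        return False

ok = transfer(conn, 1, 2, 30)
failed = transfer(conn, 1, 2, 500)

# Results are shipped back to the client as rows
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

The second transfer credits one account before the debit fails, yet the rollback leaves both balances unchanged.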
RDBMS Architecture: Server Systems
• Parallel database systems consist of multiple processors and multiple disks connected by a fast interconnection network
• A coarse-grain parallel machine consists of a small number of powerful processors
• A massively parallel or fine-grain parallel machine utilizes thousands of smaller processors
• Two main performance measures:
◦ throughput: the number of tasks that can be completed in a given time interval
◦ response time: the amount of time it takes to complete a single task from the time it is submitted
• Speedup: a fixed-sized problem executing on a small system is given to a system which is N times larger
◦ Measured by:
. Speedup = (small system elapsed time) / (large system elapsed time)
◦ Speedup is linear if the ratio equals N
◦ Speedup Percentage = (Speedup / N) × 100%
• Scaleup: increase the size of both the problem and the system; an N-times larger system is used to perform an N-times larger job
◦ Measured by:
. Scaleup = (small system small problem elapsed time) / (big system big problem elapsed time)
◦ Scaleup is linear if the ratio equals 1
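The two ratios can be checked with a few lines of Python. The function names and the elapsed times below are made up for illustration.

```python
def speedup(small_system_time, large_system_time):
    """Same problem on both systems: ratio of elapsed times."""
    return small_system_time / large_system_time

def speedup_percentage(small_system_time, large_system_time, n):
    """Share of the ideal (linear) speedup N that was achieved."""
    return speedup(small_system_time, large_system_time) / n * 100

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """N-times larger problem on an N-times larger system."""
    return small_system_small_problem_time / big_system_big_problem_time

# Hypothetical measurements, in seconds, for a system that is N = 4 times larger
s = speedup(120.0, 40.0)
pct = speedup_percentage(120.0, 40.0, 4)
sc = scaleup(120.0, 150.0)
print(s, pct, sc)
```

With these numbers the speedup is 3 rather than the linear value 4 (75%), and the scaleup is 0.8 rather than 1, so both are sublinear.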
• Speedup and Scaleup are often sublinear due to:
◦ Startup costs: Cost of starting up multiple processes may dominate computation time, if the degree of parallelism is high
◦ Interference: Processes accessing shared resources (e.g., system bus, disks, or locks) compete with each other, thus spending time waiting on other processes, rather than performing useful work
◦ Skew: Increasing the degree of parallelism increases the variance in service times of tasks executing in parallel. Overall execution time is determined by the slowest of the tasks executing in parallel
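A small Python sketch of how skew and startup cost limit speedup. The cost model is an assumption made for illustration: tasks start one after another at a fixed cost each, then run fully in parallel, so elapsed time is the total startup cost plus the slowest task.

```python
def parallel_elapsed(task_times, startup_cost_per_task=0.0):
    """Elapsed time under the simple model: serial startup + slowest task."""
    return startup_cost_per_task * len(task_times) + max(task_times)

serial_time = 100  # total work, split four ways below

balanced = parallel_elapsed([25, 25, 25, 25])
skewed = parallel_elapsed([10, 10, 10, 70])
with_startup = parallel_elapsed([25, 25, 25, 25], startup_cost_per_task=5)

print(serial_time / balanced)      # ideal case
print(serial_time / skewed)        # slowest task dominates
print(serial_time / with_startup)  # startup cost eats into the gain
```

The same 100 units of work give a speedup of 4 when evenly split, but well under 2 when one task holds 70 of them.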
RDBMS Architecture: Parallel Systems: Interconnect
• Bus: Components send data on and receive data from a single communication bus
◦ Does not scale well with increasing parallelism
• Mesh: Components are arranged as nodes in a grid, and each component is connected to all adjacent components
◦ Communication links grow with growing number of components, and so scales better
◦ But may require 2√n hops to send message to a node (√n with wraparound connections at edge)
• Hypercube: Components are numbered in binary; components are connected to one another if their binary representations differ in exactly one bit
◦ n components are connected to log n other components and can reach each other via at most log n links; reduces communication delays
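The hypercube and mesh hop counts can be sketched in Python. The function names are hypothetical; nodes are numbered from 0, and the mesh is assumed to be a square grid without wraparound links, numbered row by row.

```python
def hypercube_neighbors(node, dimensions):
    """Neighbors differ from node in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]

def hypercube_hops(a, b):
    """Shortest path length = number of differing bits (Hamming distance)."""
    return bin(a ^ b).count("1")

def mesh_hops(a, b, side):
    """Shortest path on a side x side grid without wraparound."""
    (r1, c1), (r2, c2) = divmod(a, side), divmod(b, side)
    return abs(r1 - r2) + abs(c1 - c2)

# n = 16 components: a 4-dimensional hypercube or a 4 x 4 mesh
print(hypercube_neighbors(0, 4))  # log n = 4 neighbors
print(hypercube_hops(0, 15))      # farthest pair in the hypercube
print(mesh_hops(0, 15, 4))        # corner to corner in the mesh
```

For 16 components the farthest hypercube pair is 4 links apart, while opposite corners of the mesh are 6 links apart, within the 2√n bound.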
• Homogeneous distributed databases
◦ Same software/schema on all sites, data may be partitioned among sites
◦ Goal: provide a view of a single database, hiding details of distribution
• Heterogeneous distributed databases
◦ Different software/schema on different sites
◦ Goal: integrate existing databases to provide useful functionality
• Differentiate between local and global transactions
◦ A local transaction accesses data in the single site at which the transaction was initiated
◦ A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites
• Advantages
◦ Sharing data: users at one site able to access the data residing at some other sites
◦ Autonomy: each site is able to retain a degree of control over data stored locally
◦ Higher system availability through redundancy: data can be replicated at remote sites, and system can function even if a site fails
• Disadvantages
◦ Added complexity required to ensure proper coordination among sites
. Software development cost
. Greater potential for bugs
. Increased processing overhead
Scaling Databases
• Relational databases → mainstay of business
• Web-based applications caused spikes:
◦ explosion of social media sites (Facebook, Twitter) with large data needs
◦ rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Hooking an RDBMS to a web-based application becomes troublesome
• Issues with Scaling Up
◦ Best way to provide ACID and a rich query model is to have the dataset on a single machine
◦ Limits to scaling up (vertical scaling: make a "single" machine more powerful) → dataset is just too big!
◦ Scaling out (horizontal scaling: adding more smaller/cheaper servers) is a better option
◦ Different approaches for horizontal scaling (multi-node database):
. Master/Slave
. Sharding (partitioning)
Source: Introduction to NOSQL Databases, SlidePlayer
Horizontal Vs. Vertical Scaling PPD
Horizontal Scaling
Advantages
• Scaling is easier from a hardware perspective
• Fewer periods of downtime
• Increased resilience and fault tolerance
• Increased performance
Disadvantages
• Increased complexity of maintenance and operation
• Increased initial costs
Vertical Scaling
Advantages
• Cost-effective
• Less complex process communication
• Less complicated maintenance
• Less need for software changes
Disadvantages
• Higher possibility for downtime
• Single point of failure
• Upgrade limitations
• Master/Slave
◦ All writes are written to the master
◦ All reads performed against the replicated slave databases
◦ Critical reads may be incorrect as writes may not have been propagated down
◦ Large datasets can pose problems as master needs to duplicate data to slaves
• Sharding (Partitioning)
◦ Scales well for both reads and writes
◦ Not transparent, application needs to be partition-aware
◦ Can no longer have relationships/joins across partitions
◦ Loss of referential integrity across shards
• Other Options
◦ Multi-Master replication
◦ INSERT only, not UPDATES/DELETES
◦ No JOINs, thereby reducing query time → This involves de-normalizing data
◦ In-memory databases
Source: Introduction to NOSQL Databases, SlidePlayer
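The two horizontal-scaling approaches can be sketched in Python with in-memory dictionaries standing in for database nodes. The class names `ShardedStore` and `MasterSlave`, the hash-based routing, and the explicit `replicate` step are all assumptions made for illustration; real systems differ in how they route keys and ship changes.

```python
import hashlib

class ShardedStore:
    """Sharding: each key lives on exactly one partition."""
    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash so the same key always maps to the same shard
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)

class MasterSlave:
    """Master/Slave: writes go to the master, reads go to a slave."""
    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [dict() for _ in range(num_slaves)]
        self.pending = []  # writes not yet propagated down

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))

    def read(self, key, slave=0):
        return self.slaves[slave].get(key)

    def replicate(self):
        for key, value in self.pending:
            for replica in self.slaves:
                replica[key] = value
        self.pending.clear()

store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", i)

ms = MasterSlave(2)
ms.write("x", 1)
stale = ms.read("x")        # the write has not been propagated yet
ms.replicate()
fresh = ms.read("x", slave=1)
print(store.get("user:42"), stale, fresh)
```

The stale read before `replicate` is the "critical reads may be incorrect" case from the list above; the sharded store shows why the application, or a routing layer, has to know which partition holds a key.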
Module Summary
• Evaluated RDBMS, especially with reference to performance and scalability, as a backbone for data-intensive application development
• Understood the role of system and database architecture in performance
• Understood the options for scaling databases
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Module 59
Database Management Systems
Module 59: Non-Relational DBMS: NOSQL
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Objectives & Outline
What is Big Data?
What is NOSQL?
The Perfect Storm
CAP Theorem
Consistency
Types of NOSQL Databases
Key-value Stores
Document Stores
Column Stores
Graph Stores
Relational vs. Non-Relational
Module Summary
• Evaluated RDBMS, especially with reference to performance and scalability, as a backbone for data-intensive application development
• Understood the role of system and database architecture in performance
• Understood the options for scaling databases
What is BIG DATA?
• Big data is data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them
• Big data challenges include
◦ capturing data,
◦ data storage,
◦ data analysis,
◦ search,
◦ sharing,
◦ transfer,
◦ visualization,
◦ querying,
◦ updating,
◦ information privacy and
◦ data source
• The term often refers to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from big data, and seldom to a particular size of data set
What is Big Data? PPD
• 5V's (characteristics) of big data:
◦ Volume: The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
◦ Variety: The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio, video; plus it completes missing pieces through data fusion.
◦ Velocity: In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time.
◦ Variability: Inconsistency of the data set can hamper processes to handle and manage it.
◦ Veracity: The data quality of captured data can vary greatly, affecting the accurate analysis
What is NOSQL?
• A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases
• NoSQL databases are increasingly used in big data and real-time web applications
• Such databases have existed since the late 1960s
◦ Network Database Model (NDBMS) is a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice. It was introduced in 1969 and widely replaced by relational databases in the 1980s
◦ Hierarchical Database Model (HDBMS) organizes data into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value. The type of a record defines which fields the record contains. It was recognized as the first database model in the 1960s and widely replaced by relational databases in the 1980s
Source: Introduction to NOSQL Databases, SlidePlayer
What is NOSQL? PPD
• Stands for Not Only SQL. Also referred to as Non-Relational DBMS (NDBMS) and as Multi-Model Databases
• "NoSQL" was coined in the early 21st century, triggered by Web 2.0 companies
• The term NOSQL was introduced by Carlo Strozzi in 1998 for his lightweight Strozzi NoSQL open-source relational database and re-introduced by Eric Evans when an event was organized to discuss open source distributed databases
• Eric states that "... but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for ..."
can be partitioned:
◦ down nodes easily replaced
◦ no single point of failure
• horizontally scalable
• cheap, easy to implement (open-source)
• massive write performance
• fast key-value access
across partitions
• No declarative query language (like SQL) → more programming
• Relaxed ACID (CAP theorem) → fewer guarantees
• No easy integration with other applications that support SQL
• The Perfect Storm
◦ Large datasets
◦ Acceptance of alternatives, and
◦ dynamically-typed data
have come together in a "perfect storm"
• Not a backlash against RDBMS
• SQL is a rich query language that cannot be rivaled by the current list of NOSQL offerings
Source: Introduction to NOSQL Databases, SlidePlayer
• BigTable (Google): Bigtable: A Distributed Storage System for Structured Data, 2006
• Dynamo (Amazon): Amazon's Dynamo, 2007
◦ Ring partition and replication
◦ Gossip protocol (discovery and error detection)
◦ Distributed key-value data stores
◦ Eventual consistency: Eventually Consistent - Revisited, 2008. Choosing Consistency, 2010
CAP Theorem
◦ System properties (consistency and/or availability) hold even when network failures prevent some machines from communicating with others
◦ A system can continue to operate in the presence of network partitions
Source: Introduction to NOSQL Databases, SlidePlayer
• Brewer's CAP Theorem
◦ For any system sharing data, it is "impossible" to guarantee simultaneously all of these three properties
◦ You can have at most two of these three properties for any shared-data system
• Very large systems will partition at some point:
◦ That leaves either C or A to choose from (traditional DBMS prefers C over A and P)
◦ In almost all cases, you would choose A over C (except in specific applications such as order processing)
Source: Introduction to NOSQL Databases, SlidePlayer
• All clients always have the same view of the data
• Consistency: Two types:
◦ Strong Consistency: ACID (Atomicity, Consistency, Isolation, Durability)
◦ Weak Consistency: BASE (Basically Available, Soft-state, Eventual consistency)
• ACID: A DBMS is expected to support "ACID transactions," processes that are:
◦ Atomicity: either the whole process is done or none is
◦ Consistency: only valid data are written
◦ Isolation: concurrent transactions behave as if executed one at a time
◦ Durability: once committed, it stays that way
• CAP
◦ Consistency: all nodes in the cluster hold the same copy of the data
◦ Availability: cluster always accepts reads and writes
◦ Partition tolerance: guaranteed properties are maintained even when network failures prevent some machines from communicating with others
Source: Introduction to NOSQL Databases, SlidePlayer
CAP Theorem (4): Consistency PPD
• A consistency model determines rules for visibility and apparent order of updates
• Example:
◦ Row X is replicated on nodes M and N
◦ Client A writes row X to node N
◦ Some period of time t elapses
◦ Client B reads row X from node M
◦ Does client B see the write from client A?
◦ Consistency is a continuum with tradeoffs
◦ For NOSQL, the answer would be: "maybe"
◦ CAP theorem states: "strong consistency can't be achieved at the same time as availability and partition-tolerance"
• Eventual consistency
◦ When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
• Cloud computing
◦ ACID is hard to achieve, moreover, it is not always required, for example, for blogs, status updates, product listings, etc.
Source: Introduction to NOSQL Databases, SlidePlayer
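The row X example can be sketched in Python. This is a toy model under stated assumptions: the `Replica` class, the integer version numbers used to pick the newest write, and the `anti_entropy` exchange are hypothetical and stand in for whatever propagation mechanism a real system uses.

```python
class Replica:
    """One node holding versioned rows: key -> (version, value)."""
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, version):
        current = self.rows.get(key)
        # Keep only the newest version of a row
        if current is None or version > current[0]:
            self.rows[key] = (version, value)

    def read(self, key):
        entry = self.rows.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a, b):
    """Exchange rows in both directions so both replicas converge."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.rows.items()):
            dst.write(key, value, version)

M, N = Replica("M"), Replica("N")
M.write("X", "old", 1)
N.write("X", "old", 1)

N.write("X", "new", 2)   # client A writes row X to node N
before = M.read("X")     # client B reads from node M before propagation
anti_entropy(M, N)       # updates propagate through the system
after = M.read("X")      # client B reads from node M again
print(before, after)
```

Whether client B sees the write depends on whether propagation happened before the read, which is the "maybe" in the example above.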
CAP Theorem (5) PPD
Types of NOSQL Databases
• Key-value Stores: DynamoDB, Voldemort, Scalaris, Redis, MemcacheDB
◦ Work by matching keys with values, similar to a dictionary. There is no structure nor relation
• Document Stores: MongoDB, Couchbase/CouchDB
◦ Work similarly to column-based ones; however, they allow much deeper nesting and complex structures to be achieved (for example, a document, within a document, within a document)
. Documents overcome the constraints of 1 / 2 levels of key / value nesting of columnar databases
• Column Stores: BigTable, Cassandra, HBase
◦ Column-based NoSQL databases are two-dimensional arrays whereby each key (that is, row / record) has one or more key / value pairs attached to it. These management systems allow very large and unstructured data to be kept and used (for example, a record with tons of information)
• Graph Stores: OrientDB, Neo4j, InfoGrid
◦ These use graph structures with nodes and edges connecting each other through relations
• Time Series (Discussed in Module 30): InfluxDB, Kdb+, Prometheus, Graphite
◦ A time series database (TSDB) is a database optimized for time-stamped or time series data
◦ Measurements or events that are tracked, monitored, downsampled, and aggregated over time
• No-schema and support for flexible data types are common characteristics of most NOSQL systems
Source: Introduction to NOSQL Databases, SlidePlayer
Multi-Model Databases PPD
Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019
◦ items having one or more attributes (name, value)
◦ An attribute can be single-valued or multi-valued like set
◦ Items are combined into a table
• Basic API access:
◦ get(key): extract the value given a key
◦ put(key, value): create or update the value given its key
◦ delete(key): remove the key and its associated value
◦ execute(key, operation, parameters): invoke an operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc)
Source: Introduction to NOSQL Databases, SlidePlayer
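The four API calls can be sketched as a small in-memory Python class. The class name `KeyValueStore` and the choice to implement `execute` by calling a method of the stored Python object are assumptions for illustration; actual key-value stores define their own operations and run them on the server.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        """Extract the value given a key (None if the key is absent)."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or update the value given its key."""
        self._data[key] = value

    def delete(self, key):
        """Remove the key and its associated value, if present."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Invoke a named operation on the stored data structure."""
        value = self._data[key]
        return getattr(value, operation)(*parameters)

store = KeyValueStore()
store.put("user:1", {"name": "Alice"})
store.put("cart:1", [])                    # value is a list
store.execute("cart:1", "append", "book")  # operate on the value in place
store.execute("cart:1", "append", "pen")
cart = store.get("cart:1")
store.delete("user:1")
print(cart, store.get("user:1"))
```

The `execute` call avoids a read-modify-write round trip: the client names the operation and the store applies it to the value it already holds.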
• Pros:
◦ very fast
◦ very scalable (horizontally distributed to nodes based on key)
◦ simple data model
◦ eventual consistency
◦ fault-tolerance
• Cons:
◦ Can't model more complex data structure such as objects
Source: Introduction to NOSQL Databases, SlidePlayer
Source: Introduction to NOSQL Databases, SlidePlayer
Source: Introduction to NOSQL Databases, SlidePlayer
Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019
Relational vs. Non-Relational
Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019
Database Market Competitive Landscape PPD
Source: Database Software Market: The Long-Awaited Shake-up by William Blair, 2019
Module Summary
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.
Module 60
Database Management Systems
Module 60: Widely Used DBMSs and Course Summarization
Partha Pratim Das
Department of Computer Science and Engineering
Indian Institute of Technology, Kharagpur
ppd@[Link]
Module Recap PPD
• Took a tour of common types of NOSQL databases
• Compared relational with non-relational databases
Module Objectives PPD
• The space of RDBMSs is crowded. We take a look at widely used RDBMSs
• We recap the weeks of the course
Module Outline PPD
• Widely used RDBMS
◦ Market Share
◦ Ranking
◦ Commercial
◦ Free
◦ ORD
◦ Comparative Study
• Course Recap
◦ Week 01 to Week 12
Widely used RDBMS PPD
Ref: [Link]
Ref: [Link]
Ref: [Link]
Ref: [Link] us/azure/sql-database/sql-database-develop-cplusplus-simple (Accessed: 26-08-2021)
Relational Databases PPD
• The relational model of data organizes data into one or more tables (or relations) of rows and columns, with a unique key for each row
• Since each row in a table has its own unique key, a row can be linked to a row in another table by storing the unique key of the row it refers to (the stored key is known as a foreign key)
• Most relational databases use SQL as the language for querying and maintaining the database
• The reasons for the dominance of relational databases are:
◦ simplicity,
◦ robustness,
◦ flexibility,
◦ performance,
◦ scalability, and
◦ compatibility in managing generic data
• RDBMSs are mostly used in large enterprise scenarios
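The linking of rows through unique keys and foreign keys can be sketched with Python's built-in `sqlite3` module. The `department` and `instructor` tables, their columns, and the sample rows are hypothetical and chosen only for illustration; SQLite is used here simply because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,          -- unique key of each department row
        name    TEXT NOT NULL
    );
    CREATE TABLE instructor (
        instr_id INTEGER PRIMARY KEY,         -- unique key of each instructor row
        name     TEXT NOT NULL,
        dept_id  INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
    );
    INSERT INTO department VALUES (1, 'Computer Science'), (2, 'Physics');
    INSERT INTO instructor VALUES (10, 'Rao', 1), (11, 'Sen', 2), (12, 'Iyer', 1);
    """
)

# Follow the foreign key from each instructor to its department row.
rows = conn.execute(
    """
    SELECT i.name, d.name
    FROM instructor AS i
    JOIN department AS d ON d.dept_id = i.dept_id
    ORDER BY i.instr_id
    """
).fetchall()
print(rows)

# A row that points to a non-existent department is refused.
try:
    conn.execute("INSERT INTO instructor VALUES (13, 'Das', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("dangling reference rejected:", rejected)
```

The join recovers the link purely from stored key values, which is the sense in which rows in one table are "linked" to rows in another.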
Widely used RDBMS PPD
Company Name    DBMS Market Share
Oracle          45.60 %
Microsoft       19.10 %
IBM             15.70 %
SAP              9.60 %
Teradata         3.20 %
Others           6.80 %
Source: DBMS Customers List (Accessed 28-Aug-21)
DB-Engines Ranking (August 2021): Relational DBMS PPD
Source: DB-Engines Ranking of Relational DBMS (Accessed 28-Aug-21)
DB-Engines Ranking (August 2021): Trend of Relational DBMS Popularity PPD
Source: DB-Engines Ranking - Trend of Relational DBMS Popularity (Accessed 28-Aug-21)
DB-Engines Ranking (August 2021): Complete PPD
Source: DB-Engines Ranking (Accessed 28-Aug-21)
DB-Engines Ranking (August 2021): Trend Popularity PPD
Source: DB-Engines Ranking - Trend Popularity (Accessed 28-Aug-21)
Oracle PPD
• Latest Version: Oracle Database 19c is the current long term release. Oracle Database 21c is available for production use as an innovation release (August 2021)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads including Oracle Human Capital Management (HCM), Oracle Enterprise Resource Planning (ERP), Oracle Customer Experience (CX), Oracle Supply Chain Management (SCM), Oracle Enterprise Performance Management (EPM), Oracle Construction and Engineering
• Languages: Structured Query Language (SQL), Procedural Language/SQL (PL/SQL)
• Tools / Editions: Oracle SQL Developer, Oracle Forms, Oracle JDeveloper, Oracle Reports for development of applications, Oracle Live SQL for test environment
• Connectivity: Java (JDBC), [Link] ([Link]), C/C++ (OCI, ODBC, ODPI-C), Python (cx_Oracle)
Db2 PPD
• Db2 is a family of database-server products developed by IBM. The products are mostly relational, but now also include object-relational models
• In 1970, Edgar F. Codd, a researcher at IBM, published the relational model of data
• Latest Version: Db2 11.5 (June 2019)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads
• Languages: Structured Query Language (SQL), XML Query
• Tools / Editions: Advanced Enterprise Server Edition, Enterprise Server Edition, Advanced Workgroup Server Edition, Workgroup Server Edition, Direct and Developer Editions and Express-C
• Connectivity: C/C++, Java, Ruby, Perl through a package of Db2 APIs
SQL Server PPD
• Latest Version: Microsoft SQL Server 2019 (November 2019)
• Application Domains: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP)
• Languages: Transact-SQL
• Tools / Editions: Enterprise, Standard, Web, Business Intelligence, WorkGroup, Express
• Connectivity: Java (JDBC), C/C++ (ODBC)
Sybase PPD
• Relational database server product for businesses, developed by Sybase Corporation, which became part of SAP AG
• Originally meant for Unix platforms in 1987, Sybase Corporation’s primary DBMS product was initially marketed under the name Sybase SQL Server
• Latest Version: SAP ASE 16 (April 2014)
• Languages: Sybase IQ, Transact-SQL
• Tools / Editions: Sybase SQL Server for development of applications. It has a developer edition and an express edition
• Connectivity: C/C++ (SQLAPI++), Java (JDBC)
Teradata PPD
• Relational database management system developed by Caltech and Citibank’s advanced technology group
• In 1984, the first version of Teradata was released
• Latest Version: Teradata [Link] (August 2021)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads
• Languages: BTEQ (Basic Teradata Query)
• Tools / Editions: Developer Edition, Express Edition
• Connectivity: Java (JDBC), C/C++ (ODBC)
PostgreSQL PPD
• Open source relational database management system produced by the PostgreSQL Global Development Group, a diverse group of many companies and individual contributors
• First version in 1988 by researchers of the POSTGRES project
• Latest Version: PostgreSQL 14.0 (June, 2021)
◦ For this course, we are using PostgreSQL 10.18 (Download Link)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads; supports Big Data Analytics
• Languages: Structured Query Language (SQL), procedural SQL (PL/pgSQL)
• Connectivity: Java (JDBC), [Link] (npgsql), C/C++ (libpq), Python (psycopg2 and several others)
MySQL PPD
• Open source relational database management system originally produced by the Swedish company MySQL AB, now owned by Oracle Corporation
• First internal release on 23 May 1995
• Latest Version: MySQL 8.0.26 (July 2021)
• Application Domains: Online Transaction Processing (OLTP), Data Warehousing (DW) and Mixed (OLTP & DW) database workloads
• Languages: Structured Query Language (SQL), Procedural SQL (PL-SQL)
• Connectivity: Java (JDBC), [Link] ([Link]), C/C++ (ODBC)
SQLite PPD
• It is an RDBMS contained in a C library and is not a client–server database engine. Rather, it is embedded into the end program
• It is supported by an international team of developers who work on SQLite full-time
• First release on 29 May 2000
• Latest Version: SQLite 3.36.0 (June 2021)
◦ For the Application Development course, we are going to use SQLite. Check the version with the Instructor
• Application Domains:
◦ Photoshop Lightroom (Adobe), A350 XWB family of aircraft (Airbus), GM, Nissan, and Suzuki automobiles (Bosch), Dropbox, osquery (Facebook), Android cell-phone OS and Chrome Web Browser (Google), Library of Congress, McAfee, Firefox, etc.
• Languages: Structured Query Language (SQL)
• Connectivity: Java (JDBC), [Link] ([Link]), C/C++ (SQLite C/C++ Interface), Python (sqlite3)
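What "embedded into the end program" means in practice can be sketched with the Python `sqlite3` connectivity listed above. The file name `notes.db` and the `note` table are hypothetical; the point of the sketch is only that the database is an ordinary file opened by the program itself, with no server to install or start.

```python
import os
import sqlite3
import tempfile

# The whole database lives in one ordinary file; no server process is involved.
db_path = os.path.join(tempfile.mkdtemp(), "notes.db")  # hypothetical file name

conn = sqlite3.connect(db_path)
with conn:  # the with-block commits on success; it does not close the connection
    conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany("INSERT INTO note (body) VALUES (?)", [("first",), ("second",)])
conn.close()

# Reopening the same file later sees the rows that were stored earlier.
conn = sqlite3.connect(db_path)
bodies = [row[0] for row in conn.execute("SELECT body FROM note ORDER BY id")]
conn.close()

print("SQLite library version:", sqlite3.sqlite_version)
print(bodies)
```

Printing `sqlite3.sqlite_version` is one way to check which SQLite library version a given Python installation is linked against.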
Object–Relational Database (ORD) or Object–RDBMS (ORDBMS) PPD
• Combines database capabilities with object-oriented programming language capabilities
• Objects have many-to-many relationships and are accessed through pointers
• Access to data can be faster because an object can be retrieved directly, without a search, by following pointers
• Most object databases also offer some kind of query language, allowing objects to be found using a declarative programming approach
• Examples:
◦ Illustra: A commercialized version of the Postgres ORD. It was sold to Informix Corp. in 1997, folded into the Informix 7 Product Line, and eventually sold to IBM
◦ Objectivity/DB: A commercial ORD by Objectivity, Inc. It allows applications to make standard C++, C#, Java, or Python objects persistent without having to convert the data objects into the rows and columns used by an RDBMS. It supports OO languages, SQL/ODBC and XML
◦ SQL:1999: Many of the ideas of early ORD efforts have largely been incorporated into SQL:1999 via structured types. Any product compliant with the OO features of SQL:1999 could be described as an ORD product. For example, Db2, Oracle, and SQL Server claim to support this technology and do so with varying degrees of success
Source: Object–relational database, Object database
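The difference between following pointers and searching can be sketched in plain Python, where object references stand in for the pointers mentioned above. This is only an in-memory analogy, not the storage format of any ORDBMS; the `Student` and `Course` classes and the sample names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through mutual references
class Course:
    title: str
    students: list = field(default_factory=list)


@dataclass(eq=False)
class Student:
    name: str
    courses: list = field(default_factory=list)


def enroll(student: Student, course: Course) -> None:
    # Many-to-many: each side keeps direct references to the other side.
    student.courses.append(course)
    course.students.append(student)


dbms, algo = Course("DBMS"), Course("Algorithms")
asha, ravi = Student("Asha"), Student("Ravi")
enroll(asha, dbms)
enroll(asha, algo)
enroll(ravi, dbms)

# Object style: navigate by following references; no table is searched.
titles = [course.title for course in asha.courses]
classmates = [student.name for student in dbms.students]

# Relational style, for comparison: the link is a table of key pairs to be searched.
enrollment = [("Asha", "DBMS"), ("Asha", "Algorithms"), ("Ravi", "DBMS")]
titles_by_search = [course for (student, course) in enrollment if student == "Asha"]

print(titles, classmates, titles_by_search)
```

Both styles give the same answer; the object style reaches related objects directly, while the relational style derives the relationship from stored key values (in a real RDBMS, usually with the help of an index).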
Parameters PPD
OS support
Comparative Study: Basic Features PPD
Comparative Study: Limits PPD
Comparative Study: Tables and Views, Type System PPD
Comparative Study: Data Types PPD
Comparative Study: Indexes PPD
Comparative Study: Database Capabilities PPD
Comparative Study: Other Objects PPD
Comparative Study: Partitioning PPD
Comparative Study: Access Control PPD
Course Recap PPD
Week 01: Course Overview and Introduction to DBMS PPD
• Module 01: Course Overview
◦ Why Databases?
◦ KYC: Know Your Course
• Module 02: Why DBMS?/1
◦ Evolution of Data Management
◦ History of DBMS
• Module 03: Why DBMS?/2
◦ File Systems vs Databases
• Module 04: Introduction to DBMS/1
◦ Levels of Abstraction
◦ Schema and Instance
◦ Data Models
◦ DDL and DML
◦ SQL
◦ Database Design
• Module 05: Introduction to DBMS/2
◦ Database Design
◦ Database Engine
◦ Database Users and Administrators
Week 02: Introduction to Relational Model and SQL PPD
• Module 06: Introduction to Relational Model/1
◦ Example of a Relation
◦ Attributes
◦ Schema and Instance
◦ Keys
◦ Relational Query Languages
• Module 07: Introduction to Relational Model/2
• Module 09: Introduction to SQL/2
◦ Additional Basic Operations
. Cartesian Product
. Rename AS
. String Values
. Order By Clause
. Select Top/Fetch Clause
. Where Clause Predicates
. Duplicates
Week 03
• Module 11: SQL Examples
◦ Cartesian Product
◦ Rename AS
◦ Where AND/OR
◦ String Values
◦ Order By Clause
◦ in
◦ Set Operations
◦ Aggregation Operations
• Module 12: Intermediate SQL/1
◦ Nested Subqueries
◦ Modification of the Database
• Module 13: Intermediate SQL/2
◦ Join Expressions
◦ Views
• Module 14: Intermediate SQL/3
◦ Transactions
◦ Integrity Constraints
◦ SQL Data Types and Schemas
◦ Authorization
• Module 15: Advanced SQL
◦ Functions and Procedural Constructs
◦ Triggers
Week 04: Relational Query and Modelling PPD
• Module 16: Formal Relational Query Languages/1
◦ Relational Algebra
• Module 17: Formal Relational Query Languages/2
◦ Predicate Logic
◦ Tuple Relational Calculus
◦ Domain Relational Calculus
◦ Equivalence of Algebra and Calculus
• Module 18: Entity-Relationship Model/1
◦ Design Process
◦ ER Model
• Module 19: Entity-Relationship Model/2
◦ ER Diagram
◦ ER Model to Relational Schema
• Module 20: Entity-Relationship Model/3
◦ ER Features
Week 05: RDBMS Design: Dependency and Normal Forms PPD
• Module 21: Relational Database Design/1
◦ Features of Good Relational Design
◦ Atomic Domains and First Normal Form
• Module 22: Relational Database Design/2
◦ Functional Dependencies
• Module 23: Relational Database Design/3
◦ Functional Dependency Theory
◦ Decomposition Using Functional Dependencies
• Module 24: Relational Database Design/4
◦ Algorithms for Functional Dependencies
• Module 25: Relational Database Design/5
◦ Lossless Join Decomposition
◦ Dependency Preservation
Week 06: RDBMS Design: Dependency and Normal Forms (2) PPD
• Module 26: Relational Database Design/6: Normal Forms
• Module 29: Relational Database Design/9: MVD and 4NF
Week 07
• Module 31: Application Design and Development/1: Architecture
◦ Application Programs and Architecture
• Module 32: Application Design and Development/2: Web Applications
◦ WWW
◦ Scripting
• Module 33: Application Design and Development/3: SQL and Native Language
◦ SQL and Native Language
◦ ODBC
◦ JDBC
◦ Bridge
◦ Embedded SQL
• Module 34: Application Design and Development/4: Python and PostgreSQL
◦ PostgreSQL and Python
◦ Python Frameworks for PostgreSQL
◦ Flask
• Module 35: Application Design and Development/5: Application Development and Mobile
◦ Rapid Application Development
◦ Application Performance and Security
◦ Challenges in Web Application Development
◦ Mobile Apps
Week 08: Storage Management PPD
• Module 36: Algorithms and Data Structures/1: Algorithms and Complexity Analysis
◦ Algorithms
◦ Analysis of Algorithms
◦ Complexity Chart
• Module 37: Algorithms and Data Structures/2: Data Structures/1
◦ Data Structures
◦ Linear Data Structures
◦ Linear and Binary Search
• Module 38: Algorithms and Data Structures/3: Data Structures/2
◦ Data Structures
◦ Non-linear Data Structures
◦ Binary Search Tree
◦ Comparison
• Module 39: Storage and File Structure/1: Physical Storage
◦ Overview of Physical Storage Media
◦ Magnetic Disk
◦ Magnetic Tapes
◦ Cloud Storage
◦ Other Storage
◦ Future of Storage
• Module 40: Storage and File Structure/2: File Structure
◦ File Organization
◦ Organization of Records in Files
◦ Data Dictionary Storage
◦ Storage Access
Week 09: Indexing and Hashing PPD
• Module 41: Indexing and Hashing/1: Indexing/1
◦ Concepts of Indexing
◦ Ordered Indices
• Module 42: Indexing and Hashing/2: Indexing/2
◦ Balanced Binary Search Trees
◦ 2-3-4 Tree
• Module 43: Indexing and Hashing/3: Indexing/3
◦ B+-Tree Index Files
◦ B-Tree Index Files
• Module 44: Indexing and Hashing/4: Hashing
◦ Static Hashing
◦ Dynamic Hashing
◦ Comparison Schemes
◦ Bitmap Indices
• Module 45: Indexing and Hashing/5: Index Design
◦ Index Definition in SQL
◦ Guidelines for Indexing
Week 10: Transactions Management PPD
• Module 46: Transactions/1
◦ Transaction Concept
◦ Transaction States
◦ Concurrent Executions
• Module 47: Transactions/2: Serializability
◦ Serializability
◦ Conflict Serializability
• Module 48: Transactions/3: Recoverability
◦ Recovery
◦ Transaction Definition in SQL
◦ View Serializability
◦ Complex Notions of Serializability
• Module 49: Concurrency Control/1
◦ Concurrency Control
◦ Lock-Based Protocols
◦ Implementation of Locking
• Module 50: Concurrency Control/2
◦ Deadlock Handling
◦ Timestamp-Based Protocols
Week 11: Backup and Recovery PPD
• Module 51: Backup and Recovery/1: Backup/1
• Module 53: Backup and Recovery/3: Recovery/2
Week 12
• Module 56: Query Processing and Optimization/1: Processing
◦ Query Processing
◦ Query Cost
◦ Selection Operation
◦ Sorting
◦ Join Operation
◦ Other Operations
• Module 57: Query Processing and Optimization/2: Optimization
◦ Introduction to Query Optimization
◦ Transformation of Relational Expressions
• Module 58: RDBMS Performance and Architecture
◦ RDBMS Performance and Scalability
◦ RDBMS Architecture
◦ Scaling Databases
• Module 59: Non-Relational DBMS: NOSQL
◦ What is Big Data?
◦ What is NOSQL?
◦ CAP Theorem
◦ Types of NOSQL Databases
◦ Relational vs. Non-Relational
• Module 60: Widely used DBMSs and Summarization
◦ Widely used RDBMSs
◦ Course Recap
Final Words PPD
• Read the DBMS textbook thoroughly and solve the exercises
• Practice query coding
• Practice database design from specs
• Besides DBMS, develop good knowledge of programming, data structures, algorithms and discrete structures
• Seek help if you need to: mail us
• To learn more online, you may refer to the resources mentioned in: What is the best possible way to learn DBMS online?
Slides used in this presentation are borrowed from [Link] with kind permission of the authors.
Edited and new slides are marked with “PPD”.