Relational Model
The relational model uses a collection of tables to represent both data and
the relationships among those data.
Each table has multiple columns, and each column has a unique name. Each
column corresponds to an attribute of the relation.
The set of allowed values for each attribute is called the domain of the
attribute.
Attribute values are (normally) required to be atomic; that is, indivisible.
- E.g. the value of an attribute can be an account number, but cannot be a
set of account numbers.
A domain is said to be atomic if all its members are atomic.
The special value null is a member of every domain. The null value causes
complications in the definition of many operations.
Relational Model…contd.
Database schema: the database schema is the logical design of the database.
A database consists of multiple relations.
Information about an enterprise is broken up into parts, with each relation
storing one part of the information.
E.g. account: information about accounts
depositor: which customer owns which account
customer: information about customers
[Figures: the customer relation and the depositor relation]
Because tables are relations, we use the mathematical terms relation and tuple in
place of table and row.
Comparison of database terms with programming-language terms:
The concept of relation corresponds to the programming-language notion of a
variable.
The concept of relation schema corresponds to the programming-language notion
of a type definition.
The concept of relation instance corresponds to the programming-language notion
of a value of the variable.
Relation Schema
• Formally, given domains $D_1, D_2, \ldots, D_n$, a relation $r$ is a subset of $D_1 \times D_2 \times \cdots \times D_n$.
Thus, a relation is a set of n-tuples $(a_1, a_2, \ldots, a_n)$ where each $a_i \in D_i$.
• A relation is a subset of the Cartesian product of a list of domains.
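The definition can be checked mechanically on small examples. The sketch below is a minimal illustration, assuming tiny finite domains whose values are made up (borrowed from the customer example); a relation is modelled as a plain Python set of tuples, which is a representation chosen only for illustration.

```python
from itertools import product

# Hypothetical finite domains, made up for illustration.
D1 = {"Jones", "Smith"}
D2 = {"Main", "North"}
D3 = {"Harrison", "Rye"}

# D1 x D2 x D3: every 3-tuple (a1, a2, a3) with each ai drawn from Di.
cartesian = set(product(D1, D2, D3))

# A relation r over these domains is any subset of that Cartesian product.
r = {("Jones", "Main", "Harrison"), ("Smith", "North", "Rye")}


def is_relation_over(rel, *domains):
    """True if every tuple has one value per domain and each ai is in Di."""
    return all(
        len(t) == len(domains) and all(a in d for a, d in zip(t, domains))
        for t in rel
    )


print(len(cartesian))                   # 8 = 2 * 2 * 2
print(r <= cartesian)                   # True: r is a subset
print(is_relation_over(r, D1, D2, D3))  # True
```

Real domains (strings, integers) are infinite or huge, so the Cartesian product is never materialized in practice; it only defines which tuples are allowed.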
• Schema of a relation consists of:
- attribute definitions
  - name
  - type/domain
- integrity constraints
Relation Instance
The current values (relation instance) of a relation are specified by a table
An element t of r is a tuple, represented by a row in a table
customer_name  customer_street  customer_city
Jones          Main             Harrison
Smith          North            Rye
Curry          North            Rye
Lindsay        Park             Pittsfield
customer (columns are attributes; rows are tuples)
Why Split Information Across Relations
Storing all information as a single relation such as
bank(account_number, balance, customer_name, ..) results in
- repetition of information
e.g., if two customers own an account (What gets repeated?)
- the need for null values
e.g., to represent a customer without an account
Normalization theory deals with how to design relational schemas
Keys
Let $K \subseteq R$. K is a superkey of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R). By "possible r" we mean a relation r
that could exist in the enterprise we are modeling.
• Example: {customer_name, customer_street} and {customer_name}
are both superkeys of customer, if no two customers can possibly have the
same name.
• In real life, an attribute such as customer_id would be used instead of
customer_name to uniquely identify customers, but we omit it to keep our
examples small, and instead assume customer names are unique.
• A superkey is a set of one or more attributes that, taken collectively, uniquely
identify a tuple in the relation.
[Figure: Banking Enterprise]
K is a candidate key if K is a minimal superkey, i.e. no proper subset of K is a superkey.
Example: {customer_name} is a candidate key for customer, since it is a superkey and no
proper subset of it is a superkey.
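Superkeys and candidate keys can be tested against a concrete instance, with one important caveat: a superkey is defined over every *possible* relation r(R), so a single instance can only refute a superkey, never prove one. The sketch below assumes the customer instance shown earlier; `is_superkey` and `candidate_keys` are hypothetical helper names.

```python
from itertools import combinations

ATTRS = ("customer_name", "customer_street", "customer_city")
customer = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}


def is_superkey(attrs, rows, key):
    """True if no two distinct tuples agree on every attribute in `key`.
    Only evidence for this instance: it cannot prove K is a superkey
    for every possible relation r(R)."""
    idx = [attrs.index(a) for a in key]
    seen = set()
    for t in rows:
        k = tuple(t[i] for i in idx)
        if k in seen:
            return False
        seen.add(k)
    return True


def candidate_keys(attrs, rows):
    """Minimal superkeys on this instance, searched by increasing size."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # A superset of an already-found key cannot be minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(attrs, rows, combo):
                keys.append(combo)
    return keys


print(is_superkey(ATTRS, customer, ("customer_name", "customer_street")))  # True
print(is_superkey(ATTRS, customer, ("customer_street", "customer_city")))  # False
print(candidate_keys(ATTRS, customer))  # [('customer_name',)]
```

Searching by increasing size and skipping supersets of keys already found is what enforces minimality.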
Primary key: a candidate key chosen as the principal means of identifying tuples within
a relation
• Choose an attribute whose value never, or very rarely, changes.
• E.g. an email address is unique, but may change.
Foreign Key: A relation schema may have an attribute that corresponds to the
primary key of another relation. The attribute is called a foreign key.
E.g. customer_name and account_number attributes of depositor are foreign keys to
customer and account respectively.
- Only values occurring in the primary key attribute of the referenced relation may
occur in the foreign key attribute of the referencing relation.
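The foreign-key rule amounts to a subset test: every foreign-key value must appear among the referenced primary-key values. The sketch below assumes hypothetical instances (the names and account numbers are invented; only the relation and attribute names come from the text), and `dangling` is a hypothetical helper that reports violating tuples.

```python
# Hypothetical instances; only relation/attribute names come from the text.
customer_names = {"Jones", "Smith", "Hayes"}   # primary key of customer
account_numbers = {"A-101", "A-215"}           # primary key of account
# depositor(customer_name, account_number)
depositor = {("Jones", "A-101"), ("Smith", "A-215"), ("Adams", "A-101")}


def dangling(referencing, position, referenced_keys):
    """Tuples whose foreign-key value (at `position`) does not occur among
    the primary-key values of the referenced relation."""
    return {t for t in referencing if t[position] not in referenced_keys}


print(dangling(depositor, 0, customer_names))   # {('Adams', 'A-101')}
print(dangling(depositor, 1, account_numbers))  # set()
```

A DBMS enforces the same rule on every insert, update and delete rather than checking after the fact.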
Query Languages
A query language is a language in which a user requests information
from the database.
In a procedural language, the user instructs the system to perform a
sequence of operations on the database to compute the desired result.
The relational algebra is a pure procedural language.
Relational algebra provides a formal foundation for relational model
operations.
It is used as a basis for implementing and optimizing queries in the query
processing and optimization modules of an RDBMS.
Some of the relational algebra concepts are incorporated into the
SQL standard query language for RDBMS.
The basic set of operations for the relational model is the relational
algebra.
A sequence of relational-algebra operations forms a relational-algebra
expression.
The result of a retrieval is a new relation, which may have been
formed from one or more relations.
The fundamental operations in the relational algebra are select,
project, union, set difference, Cartesian product and rename.
Procedural language
Six basic operators
select: $\sigma$
project: $\Pi$
union: $\cup$
set difference: $-$
Cartesian product: $\times$
rename: $\rho$
The operators take one or two relations as inputs and produce a new relation as a
result.
Select Operation
Relation r:
A  B  C   D
α  α  1   7
α  β  5   7
β  β  12  3
β  β  23  10

$\sigma_{A=B \land D>5}(r)$:
A  B  C   D
α  α  1   7
β  β  23  10
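A minimal Python sketch of $\sigma$ on the relation r above, assuming a relation is stored as a tuple of attribute names plus a set of value tuples (a representation chosen for illustration only). `select` is a hypothetical helper, and the predicate is an ordinary Python function over one tuple.

```python
ATTRS = ("A", "B", "C", "D")
r = {
    ("α", "α", 1, 7),
    ("α", "β", 5, 7),
    ("β", "β", 12, 3),
    ("β", "β", 23, 10),
}


def select(attrs, rows, predicate):
    """σ_predicate(r): keep the tuples for which predicate holds.
    Each tuple is handed to the predicate as {attribute: value};
    the result has the same schema as the input."""
    return {t for t in rows if predicate(dict(zip(attrs, t)))}


result = select(ATTRS, r, lambda t: t["A"] == t["B"] and t["D"] > 5)
print(sorted(result))  # [('α', 'α', 1, 7), ('β', 'β', 23, 10)]
```

Selection filters rows only; it never changes the schema.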
Project Operation
Relation r:
A  B   C
α  10  1
α  20  1
β  30  1
β  40  2

$\Pi_{A,C}(r)$:
A  C        A  C
α  1        α  1
α  1   =    β  1
β  1        β  2
β  2
(Duplicate rows are removed, since a relation is a set.)
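A sketch of $\Pi$ under the same illustrative representation (attribute-name tuple plus a set of value tuples); `project` is a hypothetical helper. Because the output is a Python set, the duplicate (α, 1) collapses automatically, mirroring the table above.

```python
ATTRS = ("A", "B", "C")
r = {("α", 10, 1), ("α", 20, 1), ("β", 30, 1), ("β", 40, 2)}


def project(attrs, rows, keep):
    """Π_keep(r): keep only the listed attributes. The result is a set,
    so tuples that become identical are stored once."""
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(t[i] for i in idx) for t in rows}


schema, result = project(ATTRS, r, ("A", "C"))
print(schema, sorted(result))  # ('A', 'C') [('α', 1), ('β', 1), ('β', 2)]
```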
Union Operation
Relations r, s:
r:        s:
A  B      A  B
α  1      α  2
α  2      β  3
β  1

$r \cup s$:
A  B
α  1
α  2
β  1
β  3
Set Difference Operation
Relations r, s:
r:        s:
A  B      A  B
α  1      α  2
α  2      β  3
β  1

$r - s$:
A  B
α  1
β  1
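Union and set difference correspond directly to Python's set operators `|` and `-` when relations are modelled as sets of tuples (an illustrative assumption). Both operations are only defined when r and s have the same arity; the textbook definition also requires compatible domains, which this sketch does not check. `check_arity` is a hypothetical helper.

```python
def check_arity(r, s):
    """Union and set difference need r and s to have the same arity.
    (Compatible domains are also required in the textbook definition;
    only arity is checked in this sketch.)"""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError(f"relations have different arities: {sorted(arities)}")


r = {("α", 1), ("α", 2), ("β", 1)}
s = {("α", 2), ("β", 3)}

check_arity(r, s)
print(sorted(r | s))  # [('α', 1), ('α', 2), ('β', 1), ('β', 3)]
print(sorted(r - s))  # [('α', 1), ('β', 1)]
```

Note that (α, 2) appears once in r ∪ s even though it is in both inputs: set semantics again.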
Cartesian-Product Operation
Relations r, s:
r:        s:
A  B      C  D   E
α  1      α  10  a
β  2      β  10  a
          β  20  b
          γ  10  b

$r \times s$:
A  B  C  D   E
α  1  α  10  a
α  1  β  10  a
α  1  β  20  b
α  1  γ  10  b
β  2  α  10  a
β  2  β  10  a
β  2  β  20  b
β  2  γ  10  b
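A sketch of $\times$ with relations as sets of tuples (illustrative representation): each result tuple is a tuple of r concatenated with a tuple of s, so the result has arity(r) + arity(s) attributes and |r| · |s| tuples. `cartesian_product` is a hypothetical helper; when attribute names clash, the text's convention of prefixing with the relation name (as in borrower.loan_number) would be needed.

```python
from itertools import product

r = {("α", 1), ("β", 2)}
s = {("α", 10, "a"), ("β", 10, "a"), ("β", 20, "b"), ("γ", 10, "b")}


def cartesian_product(r, s):
    """r x s: concatenate every tuple of r with every tuple of s."""
    return {tr + ts for tr, ts in product(r, s)}


rs = cartesian_product(r, s)
print(len(rs))  # 8 = 2 * 4
```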
Rename Operation
• Allows us to name, and therefore to refer to, the results of
relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
• Example:
$\rho_x(E)$
returns the expression E under the name x.
• If a relational-algebra expression E has arity n, then
$\rho_{x(A_1, A_2, \ldots, A_n)}(E)$
returns the result of expression E under the name x, and with the
attributes renamed to $A_1, A_2, \ldots, A_n$.
Relational Algebra Queries:
Find all loans of over Rs 1200
$\sigma_{amount > 1200}(loan)$
Find the loan number for each loan of an amount greater than Rs 1200
$\Pi_{loan\_number}(\sigma_{amount > 1200}(loan))$
Find the names of all customers who have a loan, an account, or both, from the bank
$\Pi_{customer\_name}(borrower) \cup \Pi_{customer\_name}(depositor)$
Project Operation
Find the names of all customers who have a loan at the Perryridge branch.
$\Pi_{customer\_name}(\sigma_{branch\_name = \text{``Perryridge''}}(\sigma_{borrower.loan\_number = loan.loan\_number}(borrower \times loan)))$
Find the names of all customers who have a loan at the Perryridge branch but do not
have an account at any branch of the bank.
$\Pi_{customer\_name}(\sigma_{branch\_name = \text{``Perryridge''}}(\sigma_{borrower.loan\_number = loan.loan\_number}(borrower \times loan))) - \Pi_{customer\_name}(depositor)$
Project Operation
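Find the names of all customers who have a loan at the Perryridge branch.
Query 1:
$\Pi_{customer\_name}(\sigma_{branch\_name = \text{``Perryridge''}}(\sigma_{borrower.loan\_number = loan.loan\_number}(borrower \times loan)))$
Query 2:
$\Pi_{customer\_name}(\sigma_{loan.loan\_number = borrower.loan\_number}((\sigma_{branch\_name = \text{``Perryridge''}}(loan)) \times borrower))$
Both queries return the same result; Query 2 applies the selection on branch_name
before the Cartesian product, so the product is formed from fewer tuples.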
Additional Operations
– Set intersection ($\cap$)
– Natural join ($\bowtie$)
– Aggregation
– Outer join
– Division
Set-Intersection Operation
Relations r, s:
r:        s:
A  B      A  B
α  1      α  2
α  2      β  3
β  1

$r \cap s$:
A  B
α  2
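Set intersection is not one of the six basic operators because it can be rewritten with set difference: $r \cap s = r - (r - s)$. A quick check of that identity on the relations above, modelled as Python sets of tuples (an illustrative representation):

```python
r = {("α", 1), ("α", 2), ("β", 1)}
s = {("α", 2), ("β", 3)}

direct = r & s                  # r ∩ s
via_difference = r - (r - s)    # the same result using only set difference

print(direct)                    # {('α', 2)}
print(direct == via_difference)  # True
```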
Natural Join Operation
• Relations r, s:
r:                s:
A  B  C  D        B  D  E
α  1  α  a        1  a  α
β  2  γ  a        3  a  β
γ  4  β  b        1  a  γ
α  1  γ  a        2  b  δ
δ  2  β  b        3  b  ε

$r \bowtie s$:
A  B  C  D  E
α  1  α  a  α
α  1  α  a  γ
α  1  γ  a  α
α  1  γ  a  γ
δ  2  β  b  δ
Natural-Join Operation
Notation: $r \bowtie s$
Let r and s be relations on schemas R and S respectively.
Then, $r \bowtie s$ is a relation on schema $R \cup S$ obtained as follows:
– Consider each pair of tuples $t_r$ from r and $t_s$ from s.
– If $t_r$ and $t_s$ have the same value on each of the attributes in $R \cap S$, add a
tuple t to the result, where
• t has the same value as $t_r$ on R
• t has the same value as $t_s$ on S
Example:
R = (A, B, C, D)
S = (E, B, D)
• Result schema = (A, B, C, D, E)
• $r \bowtie s$ is defined as:
$\Pi_{r.A,\, r.B,\, r.C,\, r.D,\, s.E}(\sigma_{r.B = s.B \land r.D = s.D}(r \times s))$
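The definition translates into a nested loop over pairs of tuples: keep a pair when it agrees on every attribute in $R \cap S$ and emit each shared attribute once. The sketch below assumes the attribute-name-tuple plus set-of-tuples representation (illustrative only), with `natural_join` as a hypothetical helper, and runs it on the example relations r and s of the natural-join example. It follows the definition literally; a DBMS would use a faster join algorithm.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """r ⋈ s on schema R ∪ S: pair tuples that agree on every attribute
    in R ∩ S; output columns are R's attributes, then S's attributes
    that are not in R."""
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in r_attrs]
    out = set()
    for tr in r_rows:
        dr = dict(zip(r_attrs, tr))
        for ts in s_rows:
            ds = dict(zip(s_attrs, ts))
            if all(dr[a] == ds[a] for a in common):
                out.add(tr + tuple(ds[a] for a in extra))
    return tuple(r_attrs) + tuple(extra), out


r = {("α", 1, "α", "a"), ("β", 2, "γ", "a"), ("γ", 4, "β", "b"),
     ("α", 1, "γ", "a"), ("δ", 2, "β", "b")}
s = {(1, "a", "α"), (3, "a", "β"), (1, "a", "γ"), (2, "b", "δ"), (3, "b", "ε")}

attrs, rows = natural_join(("A", "B", "C", "D"), r, ("B", "D", "E"), s)
print(attrs)      # ('A', 'B', 'C', 'D', 'E')
print(len(rows))  # 5
```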
Natural Join
• Find the names of all customers who have a loan at the bank and the loan
amount.
$\Pi_{customer\_name,\, loan\_number,\, amount}(borrower \bowtie loan)$
Project Operation
• Find all customers who have an account from at least the “Downtown” and the
“Uptown” branches.
$\Pi_{customer\_name}(\sigma_{branch\_name = \text{``Downtown''}}(depositor \bowtie account)) \cap \Pi_{customer\_name}(\sigma_{branch\_name = \text{``Uptown''}}(depositor \bowtie account))$
Project Operation
• Find the largest account balance.
– Strategy:
• Find those balances that are not the largest
– Rename the account relation as d so that we can compare each account
balance with all others
• Use set difference to find those account balances that were not found in
the earlier step.
$\Pi_{balance}(account) - \Pi_{account.balance}(\sigma_{account.balance < d.balance}(account \times \rho_d(account)))$
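The strategy can be traced step by step in Python. This sketch assumes the account instance used in the aggregation example later in these notes, stored as a set of (branch_name, account_number, balance) tuples; binding `d` to the same set plays the role of $\rho_d(account)$, giving a second name so each balance can be compared with every other one.

```python
# account(branch_name, account_number, balance)
account = {
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
}

# Π_balance(account)
all_balances = {(t[2],) for t in account}

# ρ_d(account): the same relation under a second name
d = account

# Π_account.balance(σ_account.balance < d.balance(account x ρ_d(account))):
# every balance that is smaller than at least one other balance
not_largest = {(a[2],) for a in account for b in d if a[2] < b[2]}

print(all_balances - not_largest)  # {(900,)}
```

If several accounts shared the top balance, that value would still appear once, because the result is a set.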
Aggregate Functions and Operations
Aggregate functions summarize data from tables.
An aggregate function takes a collection of values and returns a single value as
a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Aggregate operation in relational algebra:
${}_{G_1, G_2, \ldots, G_n}\,\mathcal{G}_{F_1(A_1),\, F_2(A_2),\, \ldots,\, F_n(A_n)}(E)$
– E is any relational-algebra expression
– $G_1, G_2, \ldots, G_n$ is a list of attributes on which to group (can be empty)
– Each $F_i$ is an aggregate function
– Each $A_i$ is an attribute name
Aggregate Operation
• Relation r:
A  B  C
α  α  7
α  β  7
β  β  3
β  β  10

$\mathcal{G}_{sum(c)}(r)$:
sum(c)
27
Aggregate Operation
• Relation account grouped by branch_name:
branch_name  account_number  balance
Perryridge   A-102           400
Perryridge   A-201           900
Brighton     A-217           750
Brighton     A-215           750
Redwood      A-222           700

${}_{branch\_name}\,\mathcal{G}_{sum(balance)}(account)$:
branch_name  sum(balance)
Perryridge   1300
Brighton     1500
Redwood      700
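A Python sketch of grouping plus `sum`, assuming the account and r instances above are stored as sets of tuples; `group_sum` is a hypothetical helper handling a single grouping attribute. One subtlety shows up here: the aggregate is applied to the multiset of values in each group, so both 750 balances for Brighton (and both 7s in r) are counted. Projecting the balances into a set first would wrongly collapse them.

```python
from collections import defaultdict

# account(branch_name, account_number, balance)
account = {
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
}


def group_sum(rows, group_idx, value_idx):
    """One grouping attribute, sum over another: one output tuple per group."""
    totals = defaultdict(int)
    for t in rows:
        totals[t[group_idx]] += t[value_idx]
    return set(totals.items())


print(sorted(group_sum(account, 0, 2)))
# [('Brighton', 1500), ('Perryridge', 1300), ('Redwood', 700)]

# Empty grouping list: 𝒢 sum(c)(r) over relation r(A, B, C).
r = {("α", "α", 7), ("α", "β", 7), ("β", "β", 3), ("β", "β", 10)}
print(sum(t[2] for t in r))  # 27: a generator keeps both 7s
```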
Outer Join
• An extension of the join operation that avoids loss of information.
• Computes the join and then adds to the result the tuples from one relation that do
not match any tuple in the other relation, padding the missing attributes with null.
Example:
loan:
loan_number  branch_name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700
borrower:
customer_name  loan_number
Jones          L-170
Smith          L-230
Hayes          L-155
Join: loan ⋈ borrower
loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
Left Outer Join: loan ⟕ borrower
loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
Outer Join
Right Outer Join: loan ⟖ borrower
loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes
Full Outer Join: loan ⟗ borrower
loan_number  branch_name  amount  customer_name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
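The outer joins are the ordinary join plus padded dangling tuples. The sketch below assumes the loan and borrower instances above stored as sets of tuples, with Python's `None` standing for null (an encoding chosen for illustration); `joins` is a hypothetical helper returning all four variants on the shared attribute loan_number.

```python
# loan(loan_number, branch_name, amount), borrower(customer_name, loan_number)
loan = {("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000),
        ("L-260", "Perryridge", 1700)}
borrower = {("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")}


def joins(loan, borrower):
    """Return (join, left, right, full) on loan_number.
    Result schema: (loan_number, branch_name, amount, customer_name)."""
    loan_nums = {ln for ln, _, _ in loan}
    borrower_nums = {bln for _, bln in borrower}
    inner = {(ln, br, amt, cust)
             for ln, br, amt in loan
             for cust, bln in borrower if ln == bln}
    # loan tuples with no matching borrower: customer_name padded with null
    left_pad = {(ln, br, amt, None) for ln, br, amt in loan if ln not in borrower_nums}
    # borrower tuples with no matching loan: loan attributes padded with null
    right_pad = {(bln, None, None, cust) for cust, bln in borrower if bln not in loan_nums}
    return inner, inner | left_pad, inner | right_pad, inner | left_pad | right_pad


inner, left, right, full = joins(loan, borrower)
print(len(inner), len(left), len(right), len(full))  # 2 3 3 4
```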
Null Values
It is possible for tuples to have a null value, denoted by null, for some of their attributes.
null signifies an unknown value or that a value does not exist.
The result of any arithmetic expression involving null is null. Aggregate functions
simply ignore null values (as in SQL).
For duplicate elimination and grouping, null is treated like any other value, and two
nulls are assumed to be the same (as in SQL).
Comparisons with null values return the special truth value unknown.
If false were used instead of unknown, then not (A < 5)
would not be equivalent to A >= 5.
Three-valued logic using the truth value unknown:
OR: (unknown or true) = true,
(unknown or false) = unknown
(unknown or unknown) = unknown
AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
NOT: (not unknown) = unknown
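The truth tables above can be encoded directly. This sketch assumes Python's `None` stands for unknown (an illustrative choice); `and3`, `or3`, `not3`, `lt` and `ge` are hypothetical helpers. It also shows why unknown is needed: with A = null, not (A < 5) and A >= 5 both come out unknown and so stay equivalent, whereas returning false for the comparison would make the first true and the second false.

```python
# Three-valued logic: True, False, and None for "unknown".

def and3(a, b):
    if a is False or b is False:
        return False          # false and anything = false
    if a is None or b is None:
        return None           # otherwise any unknown makes it unknown
    return True


def or3(a, b):
    if a is True or b is True:
        return True           # true or anything = true
    if a is None or b is None:
        return None
    return False


def not3(a):
    return None if a is None else not a


def lt(x, y):
    """x < y; any comparison with null (None) yields unknown."""
    return None if x is None or y is None else x < y


def ge(x, y):
    """x >= y with the same null rule."""
    return None if x is None or y is None else x >= y


A = None  # attribute value is null
print(not3(lt(A, 5)), ge(A, 5))  # None None
```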
Division Operation
Notation: $r \div s$
Suited to queries that include the phrase “for all”.
Let r and s be relations on schemas R and S respectively, where
– $R = (A_1, \ldots, A_m, B_1, \ldots, B_n)$
– $S = (B_1, \ldots, B_n)$
The result of $r \div s$ is a relation on schema
$R - S = (A_1, \ldots, A_m)$
$r \div s = \{\, t \mid t \in \Pi_{R-S}(r) \land \forall u \in s\ (tu \in r) \,\}$
where tu means the concatenation of tuples t and u to produce a single
tuple.
Division Operation
Relations r, s:
r:       s:
A  B     B
α  1     1
α  2     2
α  3
β  1
γ  1
δ  1
δ  3
δ  4
ε  6
ε  1
β  2

$r \div s$:
A
α
β
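The set-builder definition translates almost word for word into Python. This is a sketch, assuming relations are sets of tuples and that the first k attributes of each r-tuple form $R - S$ while the rest line up with the attributes of s; `divide` is a hypothetical helper. The data are the example relations r and s above.

```python
def divide(r, s, k):
    """r ÷ s = { t | t ∈ Π_{R−S}(r) and tu ∈ r for every u ∈ s }.
    The first k attributes of each r-tuple form R − S; the remaining
    ones are the attributes of S, in the same order as in s."""
    candidates = {t[:k] for t in r}                       # Π_{R−S}(r)
    return {t for t in candidates if all(t + u in r for u in s)}


r = {("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1), ("δ", 1),
     ("δ", 3), ("δ", 4), ("ε", 6), ("ε", 1), ("β", 2)}
s = {(1,), (2,)}

print(sorted(divide(r, s, 1)))  # [('α',), ('β',)]
```

γ, δ and ε drop out because each lacks the pair with B = 2.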
Division Operation
Relations r, s:
r:                   s:
A  B  C  D  E        D  E
α  a  α  a  1        a  1
α  a  γ  a  1        b  1
α  a  γ  b  1
β  a  γ  a  1
β  a  γ  b  3
γ  a  γ  a  1
γ  a  γ  b  1
γ  a  β  b  1

$r \div s$:
A  B  C
α  a  γ
γ  a  γ
Division Operation
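• Property
– Let $q = r \div s$
– Then q is the largest relation satisfying $q \times s \subseteq r$
• Definition in terms of the basic algebra operations
Let r(R) and s(S) be relations, and let $S \subseteq R$.
$r \div s = \Pi_{R-S}(r) - \Pi_{R-S}((\Pi_{R-S}(r) \times s) - \Pi_{R-S,S}(r))$
To see why:
– $\Pi_{R-S,S}(r)$ simply reorders attributes of r
– $\Pi_{R-S}((\Pi_{R-S}(r) \times s) - \Pi_{R-S,S}(r))$ gives those tuples t in
$\Pi_{R-S}(r)$ such that for some tuple $u \in s$, $tu \notin r$.
– Subtracting these from $\Pi_{R-S}(r)$ leaves exactly the tuples t with $tu \in r$ for every $u \in s$.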
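The same result can be built using only projection, Cartesian product and set difference, following the formula step by step. This sketch assumes relations are sets of tuples with the first k attributes of each r-tuple forming $R - S$, so $\Pi_{R-S,S}(r)$ needs no reordering and is just r; `divide_basic` is a hypothetical helper, and the data are the second division example (A, B, C divided by D, E).

```python
from itertools import product


def divide_basic(r, s, k):
    """r ÷ s = Π_{R−S}(r) − Π_{R−S}((Π_{R−S}(r) × s) − Π_{R−S,S}(r)),
    with the first k attributes of each r-tuple forming R − S."""
    t1 = {t[:k] for t in r}                          # Π_{R−S}(r)
    all_pairs = {t + u for t, u in product(t1, s)}   # Π_{R−S}(r) × s
    missing = all_pairs - r                          # tuples tu with tu ∉ r
    t2 = {t[:k] for t in missing}                    # t lacking at least one u
    return t1 - t2


r = {
    ("α", "a", "α", "a", 1), ("α", "a", "γ", "a", 1), ("α", "a", "γ", "b", 1),
    ("β", "a", "γ", "a", 1), ("β", "a", "γ", "b", 3), ("γ", "a", "γ", "a", 1),
    ("γ", "a", "γ", "b", 1), ("γ", "a", "β", "b", 1),
}
s = {("a", 1), ("b", 1)}

print(sorted(divide_basic(r, s, 3)))  # [('α', 'a', 'γ'), ('γ', 'a', 'γ')]
```

Here (α, a, α), (β, a, γ) and (γ, a, β) each miss one tuple of s, so they end up in t2 and are subtracted.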
Natural Join and Division
• Find all customers who have an account at all branches located in Brooklyn
city.
$\Pi_{customer\_name,\, branch\_name}(depositor \bowtie account) \div \Pi_{branch\_name}(\sigma_{branch\_city = \text{``Brooklyn''}}(branch))$
Natural Join
• Find all customers who have an account from at least the “Downtown” and the
“Uptown” branches.
Query 1:
$\Pi_{customer\_name}(\sigma_{branch\_name = \text{``Downtown''}}(depositor \bowtie account)) \cap \Pi_{customer\_name}(\sigma_{branch\_name = \text{``Uptown''}}(depositor \bowtie account))$
Query 2:
$\Pi_{customer\_name,\, branch\_name}(depositor \bowtie account) \div \rho_{temp(branch\_name)}(\{(\text{``Downtown''}), (\text{``Uptown''})\})$