Relational Database Concepts
Model
• Structure of Relational Databases
• Relational Algebra
• Tuple Relational Calculus
• Domain Relational Calculus
• Extended Relational-Algebra Operations
• Modification of the Database
• Views
Example of a Relation
Basic Structure
• Formally, given sets D1, D2, …, Dn, a relation r is a subset of
D1 × D2 × … × Dn
Thus a relation is a set of n-tuples (a1, a2, …, an) where
each ai ∈ Di
• Example: if
customer-name = {Jones, Smith, Curry, Lindsay}
customer-street = {Main, North, Park}
customer-city = {Harrison, Rye, Pittsfield}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield)}
is a relation over customer-name × customer-street ×
customer-city
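The definition above can be checked mechanically. The following is a minimal sketch, assuming Python sets stand in for the domains and tuples stand in for the n-tuples; the variable names are illustrative only.

```python
from itertools import product

# The three domains from the example.
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# Every possible 3-tuple: the Cartesian product of the domains.
all_tuples = set(product(customer_name, customer_street, customer_city))

# The relation r from the example is one particular subset of that product.
r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

is_relation = r <= all_tuples  # subset test
print(len(all_tuples), is_relation)
```

Any other subset of the product, including the empty set, would also be a relation over these domains.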
Attribute Types
• Each attribute of a relation has a name
• The set of allowed values for each attribute is called
the domain of the attribute
• Attribute values are (normally) required to be atomic,
that is, indivisible
• E.g. multivalued attribute values are not atomic
• E.g. composite attribute values are not atomic
• The special value null is a member of every domain
• The null value causes complications in the definition of
many operations
• We shall ignore the effect of null values in our main
presentation and consider their effect later
Relation Schema
• A1, A2, …, An are attributes
• R = (A1, A2, …, An ) is a relation schema
E.g. Customer-schema =
(customer-name, customer-street,
customer-city)
• r(R) is a relation on the relation schema R
E.g. customer (Customer-schema)
Relation Instance
• The current values (relation instance) of a
relation are specified by a table
• An element t of r is a tuple, represented by a row
in a table
[Figure: the customer relation as a table; columns are attributes, rows are tuples]
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
E.g. account relation with unordered tuples
Database
• A database consists of multiple relations
• Information about an enterprise is broken up into parts,
with each relation storing one part of the information
E.g.: account : stores information about accounts
depositor : stores information about which
customer owns which account
customer : stores information about customers
• Storing all information as a single relation such as
bank(account-number, balance, customer-name, ..)
results in
• repetition of information (e.g. two customers own an account)
• the need for null values (e.g. represent a customer without an
account)
• Normalization theory (Chapter 7) deals with how to design
relational schemas
The customer Relation
The depositor Relation
E-R Diagram for the Banking Enterprise
Keys
• Let K ⊆ R
• K is a superkey of R if values for K are sufficient to
identify a unique tuple of each possible relation r(R). By
“possible r” we mean a relation r that could exist in the
enterprise we are modeling.
Example: {customer-name, customer-street} and
{customer-name}
are both superkeys of Customer, if no two customers
can possibly have the same name.
• K is a candidate key if K is a minimal superkey
Example: {customer-name} is a candidate key for
Customer, since it is a superkey (assuming no two
customers can possibly have the same name), and no
proper subset of it is a superkey.
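A small sketch of the two definitions, assuming a relation is given as a list of distinct tuples plus a tuple of attribute names. Note the limitation: a key is a property of every possible relation r(R), while this code can only test one instance, so a passing check is necessary but not sufficient. The helper names `is_superkey` and `is_candidate_key` are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs, key):
    """True if no two rows of this instance agree on every attribute in key."""
    idx = [attrs.index(a) for a in key]
    seen = set()
    for row in rows:
        k = tuple(row[i] for i in idx)
        if k in seen:
            return False
        seen.add(k)
    return True

def is_candidate_key(rows, attrs, key):
    """A superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs, key):
        return False
    return not any(
        is_superkey(rows, attrs, sub)
        for n in range(len(key))          # sizes 0 .. len(key) - 1
        for sub in combinations(key, n)
    )

attrs = ("customer-name", "customer-street", "customer-city")
customer = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
]

name_is_key = is_candidate_key(customer, attrs, ("customer-name",))
pair_is_superkey = is_superkey(customer, attrs, ("customer-name", "customer-street"))
pair_is_candidate = is_candidate_key(customer, attrs, ("customer-name", "customer-street"))
street_is_superkey = is_superkey(customer, attrs, ("customer-street",))
```

On this instance {customer-name, customer-street} is a superkey but not a candidate key, because its subset {customer-name} already identifies each tuple.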
Determining Keys from E-R Sets
• Strong entity set. The primary key of the entity set
becomes the primary key of the relation.
• Weak entity set. The primary key of the relation
consists of the union of the primary key of the strong
entity set and the discriminator of the weak entity set.
• Relationship set. The union of the primary keys of the
related entity sets becomes a superkey of the
relation.
• For binary many-to-one relationship sets, the primary key of
the “many” entity set becomes the relation’s primary key.
• For one-to-one relationship sets, the relation’s primary key
can be that of either entity set.
• For many-to-many relationship sets, the union of the primary
keys becomes the relation’s primary key
Schema Diagram for the Banking Enterprise
Query Languages
• Language in which user requests information from
the database.
• Categories of languages
• procedural
• non-procedural
• “Pure” languages:
• Relational Algebra
• Tuple Relational Calculus
• Domain Relational Calculus
• Pure languages form underlying basis of query
languages that people use.
Relational Algebra
• Procedural language
• Six basic operators
• select
• project
• union
• set difference
• Cartesian product
• rename
• The operators take one or two relations as inputs
and give a new relation as a result.
Select Operation – Example
• Relation r:
    A   B   C   D
    …   …   1   7
    …   …   5   7
    …   …   12  3
    …   …   23  10
• Result of the selection:
    A   B   C   D
    …   …   1   7
    …   …   23  10
Select Operation
• Notation: σ p (r)
• p is called the selection predicate
• Defined as:
σ p (r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus
consisting of terms connected by: ∧ (and), ∨ (or),
¬ (not)
Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: =, ≠, >, ≥, <, ≤
• Example of selection:
σ branch-name=“Perryridge” (account)
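As a rough illustration of the definition, selection can be sketched as a set comprehension that keeps the tuples satisfying the predicate. The `select` helper and the account rows below are assumptions made for the example, not part of the slides.

```python
def select(rows, predicate):
    """Selection: the set of tuples of r for which the predicate holds."""
    return {t for t in rows if predicate(t)}

# account(account-number, branch-name, balance); the rows are hypothetical.
account = {
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
}

# The example selection: branch-name = "Perryridge".
perryridge = select(account, lambda t: t[1] == "Perryridge")

# Terms can be combined with and / or / not, mirroring the connectives.
large = select(account, lambda t: t[1] == "Perryridge" and not t[2] < 500)
```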
Project Operation – Example
• Relation r:
    A   B   C
    …   10  1
    …   20  1
    …   30  1
    …   40  2
• π A,C (r):
    A   C         A   C
    …   1         …   1
    …   1    =    …   1
    …   1         …   2
    …   2
Project Operation
• Notation:
π A1, A2, …, Ak (r)
where A1, …, Ak are attribute names and r is a relation
name.
• The result is defined as the relation of k columns
obtained by erasing the columns that are not listed
• Duplicate rows are removed from the result, since relations are
sets
• E.g. to eliminate the branch-name attribute of account:
π account-number, balance (account)
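A minimal sketch of projection, assuming tuples with a separate tuple of attribute names; the `project` helper is hypothetical and the rows are illustrative. Building the result as a set is what removes duplicate rows.

```python
def project(rows, attrs, keep):
    """Projection: keep only the listed attributes; the set drops duplicates."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rows}

attrs = ("branch-name", "account-number", "balance")
account = {
    ("Perryridge", "A-102", 400),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
}

# Erasing branch-name keeps all three rows, since account numbers differ.
without_branch = project(account, attrs, ("account-number", "balance"))

# Erasing account-number makes the two Brighton rows identical, so one remains.
branch_balance = project(account, attrs, ("branch-name", "balance"))
```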
Union Operation – Example
• Relations r, s:
    r:  A   B        s:  A   B
        …   1            …   2
        …   2            …   3
        …   1
• r ∪ s:
    A   B
    …   1
    …   2
    …   1
    …   3
Union Operation
• Notation: r ∪ s
• Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (e.g., 2nd column
of r deals with the same type of values as does the 2nd
column of s)
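• E.g. to find all customers with either an account or a loan
π customer-name (depositor) ∪ π customer-name (borrower)
Set Difference Operation – Example
• Relations r, s:
    r:  A   B        s:  A   B
        …   1            …   2
        …   2            …   3
        …   1
• r – s:
    A   B
    …   1
    …   1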
Set Difference Operation
• Notation: r – s
• Defined as:
r – s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible
relations.
• r and s must have the same arity
• attribute domains of r and s must be compatible
Cartesian-Product Operation – Example
• Relations r, s:
    r:  A   B        s:  C   D   E
        …   1            …   10  a
        …   2            …   10  a
                         …   20  b
                         …   10  b
• r × s:
    A   B   C   D   E
    …   1   …   10  a
    …   1   …   10  a
    …   1   …   20  b
    …   1   …   10  b
    …   2   …   10  a
    …   2   …   10  a
    …   2   …   20  b
    …   2   …   10  b
Cartesian-Product Operation
• Notation: r × s
• Defined as:
r × s = {t q | t ∈ r and q ∈ s}
• Assume that attributes of r(R) and s(S) are disjoint
(that is, R ∩ S = ∅).
• If attributes of r(R) and s(S) are not disjoint, then
renaming must be used.
Composition of Operations
• Can build expressions using multiple operations
• Example: σ A=C (r × s)
• r × s:
    A   B   C   D   E
    …   1   …   10  a
    …   1   …   10  a
    …   1   …   20  b
    …   1   …   10  b
    …   2   …   10  a
    …   2   …   10  a
    …   2   …   20  b
    …   2   …   10  b
• σ A=C (r × s):
    A   B   C   D   E
    …   1   …   10  a
    …   2   …   10  a
    …   2   …   20  b
Rename Operation
• Allows us to name, and therefore to refer to, the
results of relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
Example:
ρ x (E)
returns the expression E under the name x
If a relational-algebra expression E has arity n, then
ρ x (A1, A2, …, An) (E)
returns the result of expression E under the name x, and
with the attributes renamed to A1, A2, …, An.
Banking Example
branch (branch-name, branch-city, assets)
– π customer-name (depositor)
Example Queries
• Find the names of all customers who have a loan at the
Perryridge branch.
Query 1
π customer-name (σ branch-name = “Perryridge”
(σ borrower.loan-number = loan.loan-number (borrower × loan)))
Query 2
π customer-name (σ loan.loan-number = borrower.loan-number (
(σ branch-name = “Perryridge” (loan)) × borrower)
)
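The two queries differ only in when the branch-name selection is applied, so they should return the same customers. The sketch below assumes tuple-based relations with positional attributes; the helper names and the loan rows are hypothetical, and the borrower rows are three illustrative pairs.

```python
from itertools import product as cartesian

# loan(loan-number, branch-name, amount): hypothetical rows.
loan = {
    ("L-170", "Downtown", 3000),
    ("L-230", "Redwood", 4000),
    ("L-155", "Perryridge", 1500),
}
# borrower(customer-name, loan-number)
borrower = {("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")}

def select(rows, pred):
    return {t for t in rows if pred(t)}

def project(rows, idx):
    return {tuple(t[i] for i in idx) for t in rows}

def times(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {t + q for t, q in cartesian(r, s)}

# Query 1: build borrower x loan, then apply both selections.
# Columns: 0 customer-name, 1 borrower.loan-number, 2 loan.loan-number,
#          3 branch-name, 4 amount
joined = select(times(borrower, loan), lambda t: t[1] == t[2])
q1 = project(select(joined, lambda t: t[3] == "Perryridge"), [0])

# Query 2: restrict loan to Perryridge first, then take the product.
# Columns: 0 loan.loan-number, 1 branch-name, 2 amount,
#          3 customer-name, 4 borrower.loan-number
perryridge_loans = select(loan, lambda t: t[1] == "Perryridge")
q2 = project(select(times(perryridge_loans, borrower), lambda t: t[0] == t[4]), [3])
```

Query 2 builds a much smaller intermediate product, which is the usual reason to push a selection toward the base relation.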
Example Queries
Find the largest account balance
• Rename account relation as d
• The query is:
π balance (account) – π account.balance
(σ account.balance < d.balance (account × ρ d (account)))
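The idea behind this query is to collect every balance that is smaller than some other balance and subtract that set from the set of all balances. A sketch under the assumption of tuple-based relations, with hypothetical account rows:

```python
from itertools import product

# account(branch-name, account-number, balance): hypothetical rows.
account = {
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Redwood", "A-222", 700),
}

# Projection of account onto balance.
balances = {(t[2],) for t in account}

# Renaming only gives the same relation a second name, d.
d = account

# Balances that are smaller than some balance in d, i.e. not the largest.
not_largest = {(t[2],) for t, u in product(account, d) if t[2] < u[2]}

largest = balances - not_largest
```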
Formal Definition
• A basic expression in the relational algebra consists of
one of the following:
• A relation in the database
• A constant relation
• Let E1 and E2 be relational-algebra expressions; the
following are all relational-algebra expressions:
• E1 ∪ E2
• E1 – E2
• E1 × E2
• σ P (E1), P is a predicate on attributes in E1
• π S (E1), S is a list consisting of some of the attributes in E1
• ρ x (E1), x is the new name for the result of E1
Additional Operations
We define additional operations that do not add any
power to the relational algebra, but that simplify common queries.
• Set intersection
• Natural join
• Division
• Assignment
Set-Intersection Operation
• Notation: r ∩ s
• Defined as:
• r ∩ s = { t | t ∈ r and t ∈ s }
• Assume:
• r, s have the same arity
• attributes of r and s are compatible
• Note: r ∩ s = r – (r – s)
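The identity in the note is why intersection adds no power to the algebra: it can be written with set difference alone. A quick check with Python sets, using small hypothetical relations:

```python
# Two compatible relations with the same arity; the values are hypothetical.
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

direct = r & s                # intersection computed directly
via_difference = r - (r - s)  # the same result using only set difference
```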
Set-Intersection Operation – Example
• Relations r, s:
    r:  A   B        s:  A   B
        …   1            …   2
        …   2            …   3
        …   1
• r ∩ s:
    A   B
    …   2
Natural-Join Operation
• Notation: r ⋈ s
• Let r and s be relations on schemas R and S respectively. The result is a relation on
schema R ∪ S which is obtained by considering each pair of tuples tr from r and ts
from s.
• If tr and ts have the same value on each of the attributes in R ∩ S, a tuple t is
added to the result, where
• t has the same value as tr on r
• t has the same value as ts on s
• Example:
R = (A, B, C, D)
S = (E, B, D)
• Result schema = (A, B, C, D, E)
• r ⋈ s is defined as:
π r.A, r.B, r.C, r.D, s.E (σ r.B = s.B ∧ r.D = s.D (r × s))
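The pairwise procedure described above can be sketched directly. This version assumes each tuple is a dict from attribute name to value, so the common attributes are simply the shared keys; the `natural_join` helper and the sample rows are hypothetical.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()   # attributes shared by both schemas
            if all(tr[a] == ts[a] for a in common):
                t = {**tr, **ts}             # tuple on the union of the schemas
                if t not in result:          # relations are sets: no duplicates
                    result.append(t)
    return result

# R = (A, B, C, D) and S = (E, B, D), as in the example schemas.
r = [
    {"A": "a1", "B": 1, "C": "c1", "D": "x"},
    {"A": "a2", "B": 2, "C": "c2", "D": "y"},
]
s = [
    {"E": "e1", "B": 1, "D": "x"},
    {"E": "e2", "B": 1, "D": "x"},
    {"E": "e3", "B": 3, "D": "y"},
]

joined = natural_join(r, s)
```

Only the first tuple of r finds partners here, because the join requires equality on both B and D.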
Natural Join Operation – Example
• Relations r, s:
    r:  A   B   C   D        s:  B   D   E
        …   1   …   a            1   a   …
        …   2   …   a            3   a   …
        …   4   …   b            1   a   …
        …   1   …   a            2   b   …
        …   2   …   b            3   b   …
• r ⋈ s:
    A   B   C   D   E
    …   1   …   a   …
    …   1   …   a   …
    …   1   …   a   …
    …   1   …   a   …
    …   2   …   b   …
Division Operation
r ÷ s
• Suited to queries that include the phrase “for
all”.
• Let r and s be relations on schemas R and S
respectively where
• R = (A1, …, Am, B1, …, Bn)
• S = (B1, …, Bn)
The result of r ÷ s is a relation on schema
R – S = (A1, …, Am)
r ÷ s = { t | t ∈ π R-S (r) ∧ ∀ u ∈ s ( tu ∈ r ) }
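The "for all" in the definition maps naturally onto Python's `all`. The sketch below assumes the simplest case, where r holds (a, b) pairs and s holds one-attribute tuples; the `divide` helper and the data are hypothetical.

```python
def divide(r, s):
    """Division for r on schema (A, B) and s on schema (B)."""
    candidates = {a for a, _ in r}   # projection of r onto R - S
    return {
        a
        for a in candidates
        if all((a, b) in r for (b,) in s)   # every u in s must pair with a in r
    }

r = {
    ("x", 1), ("x", 2), ("x", 3),
    ("y", 1), ("y", 2),
    ("z", 1), ("z", 3),
}
s = {(1,), (2,)}

quotient = divide(r, s)
```

Here "z" is excluded because it is paired with 1 but not with 2.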
Division Operation – Example
• Relations r, s:
    r:  A   B        s:  B
        …   1            1
        …   2            2
        …   3
        …   1
        …   1
        …   1
        …   3
        …   4
        …   6
        …   1
        …   2
• r ÷ s:
    A
    …
Another Division Example
• Relations r, s:
    r:  A   B   C   D   E        s:  D   E
        …   a   …   a   1            a   1
        …   a   …   a   1            b   1
        …   a   …   b   1
        …   a   …   a   1
        …   a   …   b   3
        …   a   …   a   1
        …   a   …   b   1
        …   a   …   b   1
• r ÷ s:
    A   B   C
    …   a   …
    …   a   …
Division Operation (Cont.)
• Property
• Let q = r ÷ s
• Then q is the largest relation satisfying q × s ⊆ r
• Definition in terms of the basic algebra operations
Let r(R) and s(S) be relations, and let S ⊆ R
r ÷ s = π R-S (r) – π R-S ((π R-S (r) × s) – π R-S,S (r))
To see why
• π R-S,S (r) simply reorders attributes of r
π CN (σ BN=“Uptown” (depositor ⋈ account))
• Generalized Projection
• Outer Join
• Aggregate Functions
Generalized Projection
• Extends the projection operation by allowing
arithmetic functions to be used in the projection list.
π F1, F2, …, Fn (E)
• Relation r, values of attribute C:
    C
    7
    7
    3
    10
• g sum(c) (r):
    sum-C
    27
Aggregate Operation –
Example
• Relation account grouped by branch-name:
branch-name account-number balance
Perryridge A-102 400
Perryridge A-201 900
Brighton A-217 750
Brighton A-215 750
Redwood A-222 700
branch-name balance
Perryridge 1300
Brighton 1500
Redwood 700
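The grouped sum in this example can be reproduced in a few lines. This is a sketch only, assuming the account rows are kept in a list (a multiset), so that equal balances such as the two 750 entries are both counted.

```python
from collections import defaultdict

# account(branch-name, account-number, balance), as in the table above.
account = [
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
]

# Group by branch-name and sum the balance within each group.
totals = defaultdict(int)
for branch, _account_number, balance in account:
    totals[branch] += balance

for branch, total in totals.items():
    print(branch, total)
```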
Aggregate Functions
(Cont.)
• Result of aggregation does not have a name
• Can use rename operation to give it a name
• For convenience, we permit renaming as part of
aggregate operation
Relation borrower
customer-name loan-number
Jones L-170
Smith L-230
Hayes L-155
Outer Join – Example
• Inner Join
loan ⋈ borrower
• Right Outer Join
loan ⟖ borrower
Tuple Relational Calculus
• Each query is of the form {t | P(t)}
• It is the set of all tuples t such that predicate P is true
for t
• t is a tuple variable, t[A] denotes the value of tuple t
on attribute A
• t ∈ r denotes that tuple t is in relation r
• P is a formula similar to that of the predicate calculus
Predicate Calculus Formula
1. Set of attributes and constants
2. Set of comparison operators (e.g., <, ≤, =, ≠, >, ≥)
3. Set of connectives: and (∧), or (∨), not (¬)
4. Implication (⇒): x ⇒ y, if x is true, then y is true
x ⇒ y ≡ ¬x ∨ y
5. Set of quantifiers:
∃ t ∈ r (Q(t)) ≡ “there exists” a tuple t in relation r
such that predicate Q(t) is true
∀ t ∈ r (Q(t)) ≡ Q is true “for all” tuples t in relation r
Banking Example
• branch (branch-name, branch-city, assets)
• customer (customer-name, customer-street,
customer-city)
• account (account-number, branch-name, balance)
• loan (loan-number, branch-name, amount)
• depositor (customer-name, account-number)
• borrower (customer-name, loan-number)
Example Queries
• Find the loan-number, branch-name, and amount
for loans of over $1200
{t | t ∈ loan ∧ t[amount] > 1200}
{t | ∃s ∈ borrower (t[customer-name] = s[customer-name])
… ∃u ∈ depositor (t[customer-name] = u[customer-name])}
Example Queries
• Find the names of all customers having a loan at the
Perryridge branch
{t | ∃s ∈ borrower (t[customer-name] = s[customer-name]
∧ ∃u ∈ loan (u[branch-name] = “Perryridge”
∧ u[loan-number] = s[loan-number]))}
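A tuple-calculus query of this shape reads much like a set comprehension: each existential quantifier becomes a loop or an `any(...)` over the named relation. The sketch below assumes dict-based tuples; the loan rows are hypothetical and the borrower rows are a small illustrative sample.

```python
# loan(loan-number, branch-name, amount): hypothetical rows.
loan = [
    {"loan-number": "L-155", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-170", "branch-name": "Downtown", "amount": 3000},
]
# borrower(customer-name, loan-number)
borrower = [
    {"customer-name": "Jones", "loan-number": "L-170"},
    {"customer-name": "Hayes", "loan-number": "L-155"},
]

# The result tuple t has a single attribute, customer-name.
names = {
    s["customer-name"]
    for s in borrower                 # there exists s in borrower ...
    if any(                           # ... and there exists u in loan
        u["branch-name"] == "Perryridge"
        and u["loan-number"] == s["loan-number"]
        for u in loan
    )
}
```

A universal quantifier would be written with `all(...)` in the same position.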
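• Find the names of all customers who have a loan from the
Perryridge branch, and the cities in which they live
{t | ∃s ∈ loan (s[branch-name] = “Perryridge”
∧ ∃u ∈ borrower (u[loan-number] = s[loan-number]
∧ t[customer-name] = u[customer-name]
∧ ∃v ∈ customer (u[customer-name] = v[customer-name]
∧ t[customer-city] = v[customer-city])))}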
Example Queries
• Find the names of all customers who have an account
at all branches located in Brooklyn: