Database Guide
DBMS I
Prepared By
Table of Contents
CHAPTER 1: INTRODUCTION
THEORIES
CHAPTER 2: ENTITY-RELATIONSHIP MODEL
QUESTIONS AND ANSWERS
CHAPTER 3, 4: RELATIONAL MODEL & SQL
POINTS TO BE REMEMBERED
THE TRICK OF WRITING RA EXPRESSIONS FOR COMPLEX QUERIES
COMPLETE CONCEPTS PROBLEM
GENERAL STRUCTURE OF QUERY STATEMENTS
THEORIES
CHAPTER 6: INTEGRITY & SECURITY
QUESTIONS AND ANSWERS
CHAPTER 7: RELATIONAL DATABASE DESIGN
CONCEPTS
QUESTIONS AND ANSWERS
CHAPTER 11: STORAGE & FILE STRUCTURE
THEORIES
CHAPTER 12: INDEXING AND HASHING
CONCEPTS
QUESTIONS AND ANSWERS
CHAPTER 1
INTRODUCTION
Theories
1.1
1.2
1.3
2. Airlines/Railways/Road Transport
3. Universities
5. Telecommunication
6. Finance
8. On-line Retailers
9. Manufacturing
11. Internet
1.4
1.5
1.6
1.
2.
3.
4.
5.
1.7
1.8
What are the jobs of a DBA? [In-course 2007; 2007; 2004. Marks: 3]
The functions of a database administrator (DBA) include:
1. Schema definition: The DBA creates the original database schema by
executing a set of data definition statements in the DDL.
2. Storage structure and access method definition: The DBA chooses the file
organization (sequential, heap, hash, B+ tree), the organization of records in
a file (fixed length or variable length) and the indices (ordered index, hash index).
3. Schema and physical-organization modification: The DBA carries out
changes to the schema and physical organization to reflect the changing
needs of the organization, or to alter the physical organization to improve
the performance.
4. Granting of authorization for data access: By granting different types
of authorization, the DBA can regulate which parts of the database various
users can access.
5. Specifying integrity constraints: The DBA implements key declarations
(primary key, foreign key), triggers, assertions and the business rules of the
organization.
6. Acting as liaison with users.
7. Routine maintenance:
i. Periodically backing up the database, either onto tapes or remote
servers, to prevent loss of data in case of disasters.
ii. Ensuring that enough disk space is available for normal operations and
upgrading disk space as required.
iii. Monitoring jobs running on the database and ensuring better
performance.
1.9
What can be done using DML? What are the classes of DML? [Incourse 1, 2005. Marks: 4]
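The retrieval of information stored in the database.
The insertion of new information into the database.
The deletion of information from the database.
The modification of information stored in the database.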
Classes of DML:
There are basically two types:
1.10
1. Procedural DMLs: the user specifies what data are required and how to
get or compute the data. E.g., relational algebra.
2.
1.11
1.12
Instance
2. A relation is an instance of a schema.
1.13
engine understands.
3. Query evaluation engine: Executes low-level instructions generated by
the DML compiler.
CHAPTER 2
ENTITY-RELATIONSHIP MODEL
Questions and Answers
2.1
2.2
2.3
2.4
2.5
Composite attributes:
Attributes that can be divided into subparts are called composite attributes.
For example, the composite attribute address can be divided into the attributes street-number, street-name and apartment-number.
Multivalued attributes:
Attributes that have multiple values for a particular entity are called multivalued attributes.
For example, an employee may have multiple telephone numbers. So, the attribute telephone-no is
a multivalued attribute.
Derived attribute:
If the value of an attribute can be derived from the values of other related attributes or entities,
then that attribute is called a derived attribute.
For example, if an entity set employee has two attributes date-of-birth and age, then the attribute
age is a derived attribute as it can be derived from the attribute date-of-birth.
Explain the difference between a weak entity set and a strong entity
set. [In-course 2, 2007; 2005; 2003. Marks: 2]
A strong entity set has a primary key. All tuples in the set are distinguishable
by that key. A weak entity set has no primary key unless attributes of the strong
entity set on which it depends are included. Tuples in a weak entity set are
partitioned according to their relationship with tuples in a strong entity set.
Tuples within each partition are distinguishable by a discriminator, which is a set
of attributes.
Show with an example the association between a weak entity set and
a strong entity set using an E-R diagram. [In-course 2, 2007; 2003. Marks: 1]
2.6
We can convert any weak entity set to a strong entity set by simply adding appropriate
attributes. Why, then, do we have weak entity sets?
We have weak entities for several reasons:
1. We want to avoid the data duplication and consequent possible inconsistencies caused by
duplicating the key of the strong entity.
2. Weak entities reflect the logical structure of an entity being dependent on another entity.
3. Weak entities can be deleted automatically when their strong entity is deleted.
4. Weak entities can be stored physically with their strong entities.
What is the purpose of constraints in database? [2002. Marks: 2]
The purposes of constraints in database are:
1. To implement data checks.
2. To centralize and simplify the database, so as to make the development of
database applications easier and more reliable.
2.7
2.8
2.9
1]
2.10
2.11
[Figure: relationship set R drawn in each mapping cardinality: One-to-One, One-to-Many, Many-to-One, Many-to-Many]
One-to-One:
One-to-Many:
Many-to-One:
Many-to-Many:
One-to-One:
One-to-Many:
Many-to-One:
Many-to-Many:
B and AB
A and AB
2.13
[Figure: E-R notation for entity set A, showing the discriminator of a weak entity set, a derived attribute and a multivalued attribute]
Draw the E-R diagram for the following relation schemas: [In-course 2, 2007. Marks: 1.5]
Worker (worker_id, worker_name, hourly_rate, skill_type, supervisor_id)
Assignment (worker_id, building_id, start_date, num_days)
Building (building_id, address, building_type)
[E-R diagram: entity sets Worker and Building connected through Assignment; attribute labels worker_id, worker_name, hourly_rate, skill_type, supervisor_id, start_date, num_days, building_id, building_type, address]
2.14
Give the E-R diagram for the following database: [In-course 1, 2005;
2003. Marks: 2]
person (driver-id, name, address)
car (license, model, year)
accident (report-no, date, location)
owns (driver-id, license)
participated (driver-id, license, report-no, damage-amount)
2.15
loan-number,
payment-date,
payment-
We want to build a database for a Railway Reservation System. (We will limit it to
inter-city trains between Dhaka and Sylhet.) Generally, a passenger takes a trip on an inter-city train
that operates between Dhaka-Chittagong-Dhaka and Dhaka-Sylhet-Dhaka. Each train is
identified by an ID and has a total seating capacity. Each train is assigned a leg instance (an instance
of a trip on a specific date), for which we will keep the number of compartments, the number of
available seats and the date. A passenger reserves a seat on each leg instance. For a seat, we will keep
the seat ID and type. Each leg instance departs from a terminal and arrives at a terminal. We will keep
the departure time and arrival time; and for a terminal, we will store its ID, name and city. For each
passenger, we will store name, phone and age.
i. Develop a complete E-R diagram (including cardinalities). Make reasonable assumptions
during your development phases, if needed, and state them clearly.
ii. Translate the E-R diagram into relations (tables). [2005. Marks: 5 + 3]
[E-R diagram: entity sets Train (id, total_seating_capacity), Passenger (name, phone, age), Leg Instance (no_of_compartments, no_of_available_seats, date), Seat (id, type) and Terminal (id, name, city); relationship sets Assigned, Reserves, Has, Departs_From (departure_time) and Arrives_At (arrival_time); cardinality labels 1 and N]
A database is being constructed to keep track of the teams and games of a football league. A
team has a number of players. For the team, we are interested in storing team id, team name,
address, date established, name of manager, and name of coach. For the player, we will store
player id in team, date of birth, date joined, position etc. Each team plays games against the other
teams in a round-robin fashion. For each game, we will store game id, date held, score and
attendance (an attribute to designate whether the participating teams have attended the game).
Games generally take place at various stadiums of the country. For each stadium, we will
keep its size, name and location.
i.
ii.
[E-R diagram: entity sets Player (player_id, position, date_joined, date_of_birth), Team (team_id, team_name, address, date_established, manager, coach), Game (id, date_held, score, attendance) and Stadium (name, location, size); relationship sets Plays_For, Participates_In and Is_Held_In; cardinality labels 2 and 1]
CHAPTER 3, 4
RELATIONAL MODEL & SQL
Points to be Remembered
3.1
The order in which tuples or attributes appear in a relation is irrelevant. Since a relation is a set of
tuples, it does not matter whether the tuples are sorted or unsorted.
3.2
To represent string values, in RA, double quotes (" ") are used, whereas in SQL, single-quotes (' ')
are used.
3.3
Note the difference in representation of the following operators in SQL and RA:
SQL     RA
>=      ≥
<=      ≤
<>      ≠
and     ∧
or      ∨
not     ¬
3.4
In the projection operation, duplicate rows are eliminated in RA (as RA considers relations as
sets), whereas SQL retains duplicate rows by default (since duplicate elimination is time-consuming).
To force the elimination of duplicates, the keyword distinct is inserted after select.
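The effect of distinct can be seen in a small experiment. This is a minimal sketch using Python's built-in sqlite3 module; the three employee rows are illustrative values chosen for this example, and other database systems may return rows in a different order without the order by clause.

```python
import sqlite3

# In-memory database with a tiny, illustrative employee table.
conn = sqlite3.connect(":memory:")
conn.execute("create table employee (ename text, city text)")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Rahim", "Dhaka"), ("Oronno", "Dhaka"), ("Barkat", "Bogra")],
)

# Plain projection in SQL: the duplicate 'Dhaka' row is retained.
with_dups = conn.execute("select city from employee order by city").fetchall()

# With distinct, the result behaves like the RA projection (a set).
without_dups = conn.execute(
    "select distinct city from employee order by city"
).fetchall()

print(with_dups)
print(without_dups)
```

The first query corresponds to SQL's default multiset behaviour; the second matches what the RA projection on city would return.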
3.5
SQL does not allow the use of distinct with count(*) (however, it can be used with count for a
single attribute, e.g. count(distinct A)). distinct can be used with min and max, but the result does
not change.
3.6
If a where clause and a having clause appear in the same query, SQL applies the predicate in the
where clause first. Tuples satisfying the where predicate are then placed into groups by the group by
clause. SQL then applies the having clause, if it is present, to each group; it removes the groups that
do not satisfy the having clause predicate. The select clause uses the remaining groups to generate
the tuples of the result relation.
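The order where, then group by, then having can be traced on a few rows. The following is a minimal sketch with Python's built-in sqlite3 module; the works table here holds made-up rows (employees A to D, companies X Ltd and Y Ltd) that are not part of the guide's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table works (ename text, cname text, salary integer)")
conn.executemany(
    "insert into works values (?, ?, ?)",
    [
        ("A", "X Ltd", 10000),
        ("B", "X Ltd", 60000),
        ("C", "Y Ltd", 20000),
        ("D", "Y Ltd", 25000),
    ],
)

# 1. where filters individual tuples: A (10000) is dropped, B, C, D survive.
# 2. group by groups the survivors: X Ltd = {B}, Y Ltd = {C, D}.
# 3. having filters whole groups: X Ltd has only one tuple left, so it is removed.
rows = conn.execute(
    """
    select cname, count(*), sum(salary)
    from works
    where salary >= 20000
    group by cname
    having count(*) > 1
    """
).fetchall()

print(rows)
```

Note that X Ltd has two employees in the table, yet it is removed by having, because the where clause had already discarded employee A before the groups were formed.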
3.7
The input to sum and avg must be a collection of numbers, but the other aggregate functions
(count, min and max) can operate on collection of non-numeric data types, such as string, as well.
3.8
3.9
3.10
The use of a null value in arithmetic and comparison operations causes several complications. The
result of any arithmetic expression involving null returns null. So 5 + null returns null.
Any comparison with null (other than is null and is not null) returns unknown. So, 5 < null
or null <> null or null = null returns unknown.
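These rules can be checked directly. The sketch below uses Python's built-in sqlite3 module; note, as an assumption specific to this tool, that SQLite reports the truth value unknown as null, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Arithmetic with null gives null; comparisons with null give unknown
# (shown as None here); only "is null" yields a definite answer.
row = conn.execute(
    "select 5 + null, 5 < null, null <> null, null = null, null is null"
).fetchone()

print(row)
```

Only the last column, which uses is null, produces a definite truth value (1, meaning true).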
3.11
3.12
All aggregate functions except count(*) ignore tuples with null values on the aggregated
attributes.
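A short experiment makes the difference between count(*) and the other aggregates visible. This is a minimal sketch with Python's built-in sqlite3 module; the table t and its three values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer)")
# One of the three tuples has a null value on attribute a.
conn.executemany("insert into t values (?)", [(10,), (None,), (30,)])

# count(*) counts tuples; every other aggregate skips the null.
stats = conn.execute(
    "select count(*), count(a), sum(a), avg(a), min(a), max(a) from t"
).fetchone()

print(stats)
```

Here count(*) sees three tuples, while count(a), sum(a) and avg(a) are computed over the two non-null values only.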
If we use an arithmetic expression in the select clause, the resultant attribute does not have a name.
Diff. Level
0
1
3
0
3
0
5
5
1
4
5
5
5
2
1
5
4
5
4
4
3
5
3
1
3
2
4
4
3
5
Queries
1. Find the names of all employees who work for First Bank Corporation.
2. Find the names and cities of residence of all employees who work for First Bank
Corporation.
3. Find the names, street address, and cities of residence of all employees who work for
First Bank Corporation and earn more than Tk. 30000.
4. Find names, street addresses and cities of residence of all employees who work under
manager Sabbir and who joined before January 01, 2009.
5. Find the names of all employees in this database who live in the same city as the
company for which they work.
6. Find the names of all employees who live in the same city and on the same street as do
their managers.
7. Find the names of the employees living in the same city where Rahim is residing.
8. Find the names of all employees in this database who do not work for First Bank
Corporation.
9. Find the names of all employees who earn more than every employee of Small Bank
Corporation.
10. Find the names of all employees who earn more than any employee of Small Bank
Corporation.
11. Assume the companies may be located in several cities. Find all companies located in
every city in which Small Bank Corporation is located.
12. Give all employees of First Bank Corporation a 10 percent salary raise.
13. Give all managers in the database a 10% salary raise.
14. Give all managers in this database a 10 percent salary raise, unless the salary would be
greater than Tk.100,000. In such cases, give only a 3 percent raise.
15. Increase the salary of employees by 10% for the companies that are located in Bogra.
16. Modify the database so that Rahim now lives in Bhola.
17. Delete all tuples in the works relation for employees of Small Bank Corporation.
18. Delete records from works that contain employees living in Rajshahi.
19. Display the average salary of each company except Square Pharma.
20. Find the company with the most employees.
21. Find the company that has the smallest payroll.
22. Find the company with payroll less than Tk. 100000.
23. Find those companies whose employees earn a higher salary, on average, than the
average salary of Small Bank Corporation.
Note: the Imp. Level column in the above table indicates how important the query is for the exam
(range: 0-5, where 0 means not important at all and 5 means most important); and the Diff. Level column
indicates how difficult the problem is (range: 0-5, where 0 means very easy and 5 means very difficult).
Sample Data

Table Name: employee
ename          street   city
Barkat         x        Bogra
Jabbar         x        Comilla
Jubayer        u        Faridpur
Najmun Nahar   y        Sylhet
Oronno         z        Dhaka
Rafique        z        Rajshahi
Rahim          w        Dhaka
Sabbir         v        Chittagong
Salam          y        Comilla
Sharafat       w        Dhaka

Table Name: works
ename          cname                     salary   jdate
Rahim          First Bank Corporation    50000    2008-01-01
Barkat         First Bank Corporation    40000    2007-01-01
Salam          First Bank Corporation    60000    2009-07-01
Rafique        Small Bank Corporation    30000    2009-06-08
Sharafat       First Bank Corporation    80000    2005-06-01
Jabbar         Small Bank Corporation    10000    2009-06-05
Najmun Nahar   Small Bank Corporation    20000    2009-06-30
Oronno         The ONE Limited           50000    2007-06-01
Jubayer        Square Pharma             15000    2008-01-01
Sabbir         Vegabond Company          100000   2001-01-01

Table Name: company
cname                     city
Anonymous IT              Chittagong
Dream Tech                Chittagong
First Bank Corporation    Dhaka
JONS IT (Pvt.) Limited    Sylhet
Small Bank Corporation    Dhaka
Square Pharma             Bogra
The ONE Limited           Dhaka
Unique Softs              Dhaka
Unknown Systems           Rajshahi
Vegabond Company          Bogra

Table Name: manages
ename          mname
Rahim          Sharafat
Barkat         Sharafat
Salam          Sharafat
Rafique        Oronno
Jabbar         Oronno
Najmun Nahar   Sabbir
Jubayer        Sabbir
RA: Π_{ename, street, city} (σ_{cname = "First Bank Corporation" ∧ salary > 30000} (employee ⋈ works))
4. Find names, street addresses and cities of residence of all employees who work under manager
Sabbir and who joined before January 01, 2009.
SQL: select ename, street, city
from employee natural join works natural join manages
where mname = 'Sabbir' and jdate < '2009-01-01';
RA: Π_{ename, street, city} (σ_{mname = "Sabbir" ∧ jdate < "2009-01-01"} (employee ⋈ works ⋈ manages))
5. Find the names of all employees in this database who live in the same city as the company for
which they work.
SQL: select ename from employee natural join works natural join company;
RA: Π_{ename} (employee ⋈ works ⋈ company)
6. Find the names of all employees who live in the same city and on the same street as do their
managers.
SQL: select employee.ename from employee natural join manages, employee as emp
where mname = emp.ename and employee.street = emp.street and employee.city = emp.city;
RA: Π_{employee.ename}
(σ_{mname = emp.ename ∧ employee.street = emp.street ∧ employee.city = emp.city} (employee ⋈ manages × ρ_{emp} (employee)))
7. Find the names of the employees living in the same city where Rahim is residing.
SQL: select ename from employee where city = (
select city from employee where ename = 'Rahim'
);
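19. Display the average salary of each company except Square Pharma.
SQL: select cname, avg(salary) from works where cname <> 'Square Pharma' group by cname;
RA: _{cname} 𝒢 _{avg(salary)} (σ_{cname ≠ "Square Pharma"} (works))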
22. Find the company with payroll less than Tk. 100000.
SQL: select cname, sum(salary) from works group by cname having sum(salary) < 100000;
1 Payroll: The total amount of money a company pays as salary to all its employees.
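The payroll queries can be tried out as follows. This is a minimal sketch using Python's built-in sqlite3 module; the five works rows are illustrative values chosen for this example (only the columns needed here are created), so the totals are not those of the full sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table works (ename text, cname text, salary integer)")
conn.executemany(
    "insert into works values (?, ?, ?)",
    [
        ("Rahim", "First Bank Corporation", 50000),
        ("Barkat", "First Bank Corporation", 40000),
        ("Rafique", "Small Bank Corporation", 30000),
        ("Jabbar", "Small Bank Corporation", 10000),
        ("Sabbir", "Vegabond Company", 100000),
    ],
)

# Payroll of every company: the sum of the salaries it pays.
payrolls = conn.execute(
    "select cname, sum(salary) from works group by cname order by cname"
).fetchall()

# Companies whose payroll is strictly less than Tk. 100000.
small_payrolls = conn.execute(
    """
    select cname, sum(salary) from works
    group by cname
    having sum(salary) < 100000
    order by cname
    """
).fetchall()

print(payrolls)
print(small_payrolls)
```

A company whose payroll is exactly 100000 is excluded, because the having predicate uses a strict comparison.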
Legend:
Where two statements are separated by a horizontal line, choose either the statement above the line or the statement below the line, one at a time.
[optional]: you can use it or leave it.
type
PRIMARY KEY
[ NOT ] NULL
columnname column [ ]
,,
PRIMARY KEY ( columnname )
FOREIGN KEY ( columnname ) REFERENCES tablename (columnname)
CONSTRAINT constraintname
( expression)
);
column_data_types:
1. CHAR (number_of_characters)                 example: CHAR(30)
2. VARCHAR (maximum_number_of_characters)      example: VARCHAR(255)
3. INTEGER (number_of_digits)                  example: INTEGER(10)
4. NUMERIC, DECIMAL, FLOAT or DOUBLE PRECISION (total_number_of_digits_including_decimals, number_of_decimal_digits)
   example: DECIMAL(5, 2) [for 999.99]
5. DATE
6. TIME
7. DATETIME
function
columnname AS alias
,,,
FROM
tablename [ AS alias ]
tablename [ AS alias ] NATURAL
tablename [ AS alias] , , ,
tablename [ AS alias ] JOIN tablename[ AS alias]
OUTER JOIN tablename [ AS alias]
ON expression
USING
(columnname ,, ,)
tablename [ AS alias ] FULL OUTER JOIN tablename [ AS alias ]
[WHERE expression]
[GROUP BY column-name_or_alias_or_function]
[HAVING expression]
[ORDER BY column-name_or_alias_or_function [ASC | DESC]]
[LIMIT number_of_rows | start_index, number_of_rows];
UPDATE
tablename [ AS alias ]
tablename [ AS alias ] NATURAL
tablename [ AS alias] , , ,
tablename [ AS alias ] JOIN tablename[ AS alias]
OUTER JOIN tablename [ AS alias]
ON expression
tablename [ AS alias ] FULL OUTER JOIN tablename [ AS alias ] USING (columnname, ...)
Theories
3.1
List two reasons why null values might be introduced into the
database.
Nulls may be introduced into the database because the actual value is either
unknown or does not exist. For example, an employee whose address has
changed and whose new address is not yet known should be retained with a null
address. If employee tuples have a composite attribute dependents, and a
particular employee has no dependents, then that tuple's dependents attribute
should be given a null value.
3.2
3.3
List two major problems with processing update operations expressed in terms of views.
Views present significant problems if updates are expressed with them. The difficulty is that a
modification to the database expressed in terms of a view must be translated to a modification to the
actual relations in the logical model of the database.
1. Since the view may not have all the attributes of the underlying tables, insertion of a tuple into
the view will insert tuples into the underlying tables, with those attributes not participating in
the view getting null values. This may not be desirable, especially if the attribute in question is
part of the primary key of the table.
2. If a view is a join of several underlying tables and an insertion results in tuples with nulls in
the join columns, the desired effect of the insertion will not be achieved. In other words, an
update to a view may not be expressible at all as updates to base relations.
3.4
3.5
3.6
Domain
Atomic Domain
Non-Atomic Domain
Tuple Variable
1. Domain:
For each attribute, there is a set of permitted values, called the domain (D) of that
attribute. For the attribute branch-name, the domain is the set of all branch names.
2. Atomic Domain:
A domain is atomic if elements of the domain are considered to be indivisible units. Example:
the set of integers: 23, 45, 5, 78, etc.
3. Non-Atomic Domain:
If elements of a domain can be divided into several parts, the domain is called a non-atomic
domain. Examples: the set of all sets of integers: {23, 12, 4}, {5, 65, 4}, {34, 23, 98}; employee-id
values such as HR001 and IT005 (a department prefix followed by a number).
4. Tuple Variable:
A tuple variable is a variable whose domain is the set of all tuples. For example,
t[account-number] = A-101, t[branch-name] = Mirpur. Alternatively, t[1], t[2] denote the values of
tuple t on the first and second attributes, and so on.
3.7
Select (unary)
Project (unary)
Rename (unary)
Cartesian Product (binary)
Union (binary)
Set-Difference (binary)
3.9
3.10
3.11
Let r(R) and s(S) be two relations. Give the relational algebra expression for natural join
(⋈) and the outer joins (⟕, ⟖, ⟗) of the said relations. [2005. Marks: 2]
r ⋈ s = Π_{R ∪ S} (σ_{r.A1 = s.A1 ∧ r.A2 = s.A2 ∧ … ∧ r.An = s.An} (r × s)), where R ∩ S = {A1, A2, …, An}
The outer-join operations extend the natural-join operation so that tuples from the
participating relations are not lost in the result of the join. Describe how the theta join operation
can be extended so that tuples from the left, right, or both relations are not lost from the result
of a theta join.
r ⟕_θ s = (r ⋈_θ s) ∪ ((r − Π_R (r ⋈_θ s)) × {(null, null, …, null)})
r ⟖_θ s = (r ⋈_θ s) ∪ ({(null, null, …, null)} × (s − Π_S (r ⋈_θ s)))
r
3.13
With example, show the difference between Cartesian product (×) and natural join (⋈).
[2005. Marks: 2]
Let R1 = (A, B) and R2 = (B, C) be two relation schemas.
Again, let r1(R1) = {(a, 1), (b, 2)} and r2(R2) = {(1, x), (2, y)}.
Then, r1 × r2 = {(a, 1, 1, x), (a, 1, 2, y), (b, 2, 1, x), (b, 2, 2, y)}
And r1 ⋈ r2 = {(a, 1, x), (b, 2, y)}
That is, the Cartesian product operation results in all the combinations of all the tuples from both
tables, whereas the natural join operation results in only the tuple combinations from both tables
where the values of the common attributes (in this example, the attribute B) are the same.
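The same two results can be reproduced in SQL. This is a minimal sketch using Python's built-in sqlite3 module, with r1 and r2 loaded exactly as in the example above; the column types (text and integer) are an assumption made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r1 (A text, B integer)")
conn.execute("create table r2 (B integer, C text)")
conn.executemany("insert into r1 values (?, ?)", [("a", 1), ("b", 2)])
conn.executemany("insert into r2 values (?, ?)", [(1, "x"), (2, "y")])

# Cartesian product: every tuple of r1 paired with every tuple of r2.
product = conn.execute(
    "select * from r1, r2 order by r1.A, r2.B"
).fetchall()

# Natural join: only pairs that agree on the common attribute B,
# with B appearing once in the result.
natural = conn.execute(
    "select * from r1 natural join r2 order by A"
).fetchall()

print(product)
print(natural)
```

The product has 2 × 2 = 4 tuples of four attributes, while the natural join keeps only the two matching pairs and three attributes.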
3.14
3.15
For a given relation schema, works (employee_name, company_name, salary), give a relational
algebra expression using all aggregate functions where the grouping is done on company name.
[2007, Marks: 1]
_{company_name} 𝒢 _{sum(salary), avg(salary), max(salary), min(salary), count(employee_name)} (works)
Give the equivalent relational algebra expression of the following SQL form:
select A1, A2, …, An from r1, r2, …, rn where P [2005. Marks: 1]
Π_{A1, A2, …, An} (σ_P (r1 × r2 × … × rn))
3.16
Write short notes on natural join, theta join and aggregate functions.
Natural Join:
The natural join is a binary operation that allows us to combine certain selections and a Cartesian
product into one operation. It is denoted by the join symbol ⋈.
The natural join operation:
i) Forms a Cartesian product of its two arguments
ii) Performs a selection forcing equality on those attributes that appear in both relation schemas
iii) Removes duplicate attributes
Theta Join:
The theta join operation is an extension to the natural join operation that allows us to combine a
selection and a Cartesian product into a single operation. Consider relations r(R) and s(S); let θ be a
predicate on attributes in the schema R ∪ S. The theta join operation is defined as follows:
r ⋈_θ s = σ_θ (r × s)
Aggregate Functions:
Aggregate functions take a collection of values and return a single value as a result. The aggregation
operation is denoted by calligraphic G, 𝒢. For a collection of values {1, 1, 3, 4, 4, 11}:
1. sum returns the sum of the values: 24
2. avg returns the average of the values: 4
3. count returns the number of the elements in the collection: 6
4. min returns the minimum value of the collection: 1
5. max returns the maximum value of the collection: 11
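The five results above can be recomputed in a few lines. This is a minimal sketch in plain Python; a list is used for the collection because the values form a multiset (1 and 4 each occur twice).

```python
# The collection from the example; duplicates are kept.
values = [1, 1, 3, 4, 4, 11]

results = {
    "sum": sum(values),
    "avg": sum(values) / len(values),
    "count": len(values),
    "min": min(values),
    "max": max(values),
}

print(results)
```

If the duplicates were removed first (as count(distinct ...) would do), the count would be 4 and the sum 19, which is why the multiset view matters for aggregates.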
3.17
With example, explain the importance of outer joins. [In-course 2007, Marks: 2]
When joining two or more tables, if we want to keep all the records from one table and want to
know which records from the other tables don't match with them, then an outer join can be used to
solve the problem easily.
For example, if we want to know which records in two tables (e.g., x and y) do not match, then we
can write the following query using outer join:
select * from x natural full outer join y
where x.some_attribute is null or y.some_attribute is null;
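The same idea can be tried on a small case. The sketch below uses Python's built-in sqlite3 module and a left outer join rather than a full outer join, since older SQLite versions do not support full outer joins; the employee and manages rows are illustrative values chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (ename text, city text)")
conn.execute("create table manages (ename text, mname text)")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Rahim", "Dhaka"), ("Oronno", "Dhaka"), ("Sabbir", "Chittagong")],
)
conn.executemany("insert into manages values (?, ?)", [("Rahim", "Sharafat")])

# Every employee is kept; mname is padded with null where no manages tuple matches.
rows = conn.execute(
    """
    select employee.ename, manages.mname
    from employee left outer join manages on employee.ename = manages.ename
    order by employee.ename
    """
).fetchall()

# Employees that an inner (natural) join would have lost.
unmatched = [ename for ename, mname in rows if mname is None]

print(rows)
print(unmatched)
```

With an inner join, Oronno and Sabbir would simply disappear from the result; the outer join keeps them and marks the missing manager with null.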
3.18
Let R = (A, B, C); and let r1 and r2 both be relations on schema R. Give an expression in SQL
that is equivalent to each of the following queries. [2003. Marks: 4]
1. r1 ∪ r2
2. r1 ∩ r2
3. r1 − r2
4. Π_{AB} (r1) ⋈ Π_{BC} (r2)
1. select
2. select
3. select
4. select
as y;2
3.19
*
*
*
*
from
from
from
from
3.20
What aggregate functions can be used for string type data? [Incourse 1, 2008]
count, min, max
3.21
2 Note: Every derived table must have its own alias. So, the aliases x and y must be put to execute the query
successfully.
3.23
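Identify the relations among primary key, candidate key and super key. [2003. Marks: 3]
Primary Key ⊆ Candidate Key ⊆ Super Key
That is, the primary key is one of the candidate keys, and every candidate key is a (minimal) super key.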
3.24
3.25
3.26
2]
[Schema diagram: relations written-by (author-id, ISBN), book (ISBN, title, year, price), warehouse (code, address, phone), stocks (code, ISBN, number), and a relation with attributes (author-id, name, address, phone)]
3.27
Draw the schema diagram for the following part of the bank
database: [In-course 1, 2008; In-course 2, 2007. Marks: 1.5]
employee (employee-id, employee-name, street, city)
branch (branch-name, branch-city, assets)
job (title, level)
works-on (employee-id, branch-name, title, salary)
[Schema diagram with relations works-on, employee, branch and job]
3.28
Give the schema diagram for the following part of database: [2004.
Marks: 2]
person (driver-id, name, address)
car (license, model, year)
accident (report-no, date, location)
owns (driver-id, license)
participated (driver-id, license, report-no, damage-amount)
[Schema diagram with relations owns, person, car, participated and accident]
4.1
What are the join types and conditions that are permitted in SQL?
[2005. Marks: 2]
Join types: inner join, left outer join, right outer join, full outer join.
Join conditions: natural, on <predicate>, using (A1, A2, …, An).
4.2
4.3
4.4
4.5
Describe the circumstances in which you would choose to use embedded SQL rather than
SQL alone or only a general-purpose programming language.
Writing queries in SQL is typically much easier than coding the same queries in a general-purpose
programming language. However, not all kinds of queries can be written in SQL. Also, non-declarative
actions such as printing a report, interacting with a user, or sending the results of a query to a
graphical user interface cannot be done from within SQL. Under circumstances in which we want the
best of both worlds, we can choose embedded SQL or dynamic SQL, rather than using SQL alone or
using only a general-purpose programming language.
Embedded SQL has the advantage of programs being less complicated since it avoids the clutter of
the ODBC or JDBC function calls, but requires a specialized preprocessor.
4.6
4.7
4.9
P0035 to P0056.
4. Find the clients with their names and order numbers whose orders are handled by the
salesman Mr. X.
5. Find the product no and description of non-moving products, i.e., products not being
sold.
1. SQL: select name from client where city = 'Dhaka' or city = 'Khulna';
OR: select name from client where city in ('Dhaka', 'Khulna');
RA:
RA: Π_{product-no, description}
(σ_{(profit-percent / 100 * cost-price + cost-price) > 2000 ∧ (profit-percent / 100 * cost-price + cost-price) <= 5000} (product))
RA:
product-no
RA: Π_{client.name, order-no}
(σ_{client.client-no = salesorder.client-no ∧ salesman.salesman-no = salesorder.salesman-no ∧ salesman.name = "Mr. X"}
(client × salesman × salesorder))
RA:
OR, SQL:
RA:
4.10
What are the skill types of workers assigned to building B02 (building-id)?
List the name of the workers assigned to warehouse (building-type) buildings.
Find the no. of workers for each building where more than 5 workers are working for it.
Give 5% hourly wage increment for the workers working for hospital buildings.
RA:
RA:
RA:
where worker-id in (
select worker-id from assignment natural join building
where building-type = 'hospital'
);
RA:
4.11
RA:
2.
borrower))
RA:
RA:
branch-name
RA:
4.12
RA:
CHAPTER 6
INTEGRITY & SECURITY
Questions and Answers
6.1
6.2
Define foreign key and dangling tuples. How does a foreign key define the
acceptability of dangling tuples? [2003. Marks: 4]
Consider a pair of relations r(R) and s(S), and the natural join r ⋈ s. There may
be a tuple t_r in r that does not join with any tuple in s. That is, there is no t_s in s
such that t_r[R ∩ S] = t_s[R ∩ S]. Such tuples are called dangling tuples.
A foreign key defines the acceptability of dangling tuples by permitting the use of
null values. Attributes of foreign keys are allowed to be null, provided that they
have not otherwise been declared to be non-null. If all the columns of a foreign
key are non-null in a given tuple, the usual definition of foreign-key constraints is
used for that tuple. If any of the foreign-key columns is null, the tuple is defined
automatically to satisfy the constraint.
6.3
SQL allows a foreign-key dependency to refer to the same relation, as in the following
example:
create table manager
   (employee-name char(20),
    manager-name char(20),
    primary key (employee-name),
    foreign key (manager-name) references manager on delete cascade)
Here, employee-name is a key to the table manager, meaning that each employee has at most
one manager. The foreign-key clause requires that every manager also be an employee. Explain
exactly what happens when a tuple in the relation manager is deleted.
The tuples of all employees of the manager, at all levels, get deleted as well.
This happens in a series of steps. The initial deletion will trigger deletion of all the tuples
corresponding to direct employees of the manager. These deletions will in turn cause deletions of
second level employee tuples, and so on, till all direct and indirect employee tuples are deleted.
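The chain of deletions can be observed in practice. This is a minimal sketch using Python's built-in sqlite3 module; the attribute names use underscores instead of hyphens so that they are valid identifiers, the four rows are hypothetical, and SQLite only enforces foreign keys after the pragma shown below is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign-key constraints unless this is switched on.
conn.execute("pragma foreign_keys = on")
conn.execute(
    """
    create table manager (
        employee_name char(20) primary key,
        manager_name char(20),
        foreign key (manager_name) references manager (employee_name)
            on delete cascade
    )
    """
)
conn.executemany(
    "insert into manager values (?, ?)",
    [
        ("Sharafat", None),        # top-level manager, no manager of his own
        ("Rahim", "Sharafat"),     # direct employee of Sharafat
        ("Barkat", "Rahim"),       # second-level employee
        ("Oronno", None),          # unrelated to Sharafat
    ],
)

# Deleting Sharafat cascades to Rahim, which in turn cascades to Barkat.
conn.execute("delete from manager where employee_name = 'Sharafat'")
remaining = conn.execute(
    "select employee_name from manager order by employee_name"
).fetchall()

print(remaining)
```

Only the tuple that is not below Sharafat in the management hierarchy survives, which matches the step-by-step explanation above.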
6.4
6.5
6.6
6.7
Write an assertion for the bank database to ensure that the assets value for the Perryridge
branch is equal to the sum of all the amounts lent by the Perryridge branch.
See the example of assertion in Question and Answer 6.6.
6.8
6.10
6.11
6.12
6.13
6.14
6.15
6.16
6.18
1. Encrypted data allows authorized users to access data without worrying about other users or
the system administrator gaining any information.
2. Encryption of data may simplify or even strengthen other authorization mechanisms. For
example, distribution of the cryptographic key amongst only trusted users is both a simple
way to control read access and an added layer of security above that offered by views.
6.19
Perhaps the most important data items in any database system are the passwords that
control access to the database. Suggest a scheme for the secure storage of passwords. Be sure
that your scheme allows the system to test passwords supplied by users who are attempting to
log into the system.
A scheme for storing passwords would be to encrypt each password, and then use a hash index on
the user-id. The user-id can be used to easily access the encrypted password. The password being used
in a login attempt is then encrypted and compared with the stored encryption of the correct password.
An advantage of this scheme is that passwords are not stored in clear text and the code for
decryption need not even exist.
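A minimal sketch of such a scheme follows. It is an assumption-laden illustration, not the scheme the answer prescribes: the "encryption" is replaced by a salted one-way function (PBKDF2-HMAC-SHA256 from Python's hashlib), a plain dict stands in for the hash index on user-id, and the names `make_record` and `check_password` are hypothetical.

```python
import hashlib
import hmac
import os

def make_record(password, iterations=100_000):
    """Return (salt, iterations, digest); the clear-text password is not kept."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def check_password(store, user_id, attempt):
    """Re-apply the one-way function to the attempt and compare with the stored value."""
    record = store.get(user_id)  # dict lookup plays the role of the hash index on user-id
    if record is None:
        return False
    salt, iterations, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

store = {"user-1": make_record("correct horse")}
print(check_password(store, "user-1", "correct horse"))  # True
print(check_password(store, "user-1", "wrong guess"))    # False
```

Because the function is one-way, no decryption routine exists anywhere in the system, which matches the advantage stated above.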
CHAPTER 7
RELATIONAL DATABASE DESIGN
Concepts
7.1
7.2
7.3
7.4
7.5
C → B
AC → B
B    C
b1   c1
b1   c2
b1   c1
b1   c3
7.6
7.7
AB → C
AD → B
AD → C
BC → A
CD → A
CD → B
ABD → C
ACD → B
BCD → A
B    C    D
b1   c1   d1
b2   c1   d2
b2   c2   d2
b2   c2   d3
b3   c2   d4
Where:
customer (customer-name, customer-street, customer-city)
account (account-number, balance)
Let Pk(r) denote the primary key attribute of a relation r.
1. The functional dependencies Pk(account) → Pk(customer) and
Pk(customer) → Pk(account) indicate a one-to-one relationship, because
any two tuples with the same value for account must have the same value
for customer, and any two tuples agreeing on customer must have the
same value for account.
2. The functional dependency Pk(account) → Pk(customer) indicates a
many-to-one relationship since any account value which is repeated will
have the same customer value, but many account values may have the
same customer value.
7.8
7.9
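Use the definition of functional dependency to argue that each of Armstrong's axioms
(reflexivity, augmentation, and transitivity) is sound.
The definition of functional dependency is: α → β holds on R if in any legal relation r(R), for all
pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].
Reflexivity rule: If α is a set of attributes, and β ⊆ α, then α → β.
Assume ∃ t1 and t2 such that t1[α] = t2[α]
    t1[β] = t2[β]        [since β ⊆ α]
    α → β                [Definition of FD]
Augmentation rule: If α → β, and γ is a set of attributes, then γα → γβ.
Assume ∃ t1 and t2 such that t1[γα] = t2[γα]
    t1[γ] = t2[γ]        [γ ⊆ γα]
    t1[α] = t2[α]        [α ⊆ γα]
    t1[β] = t2[β]        [Definition of α → β]
    t1[γβ] = t2[γβ]      [γβ = γ ∪ β]
    γα → γβ              [Definition of FD]
Transitivity rule: If α → β and β → γ, then α → γ.
Assume ∃ t1 and t2 such that t1[α] = t2[α]
    t1[β] = t2[β]        [Definition of α → β]
    t1[γ] = t2[γ]        [Definition of β → γ]
    α → γ                [Definition of FD]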
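Union rule: If α → β and α → γ, then α → βγ.
    α → β                [Given]
    αα → αβ              [Augmentation rule]
    α → αβ               [Union of identical sets]              (i)
Again,
    α → γ                [Given]
    αβ → γβ              [Augmentation rule]                    (ii)
    α → βγ               [From (i) and (ii) using transitivity rule and set
                          union commutativity]
Decomposition rule: If α → βγ, then α → β and α → γ.
    α → βγ               [Given]
    βγ → β               [Reflexivity rule]
    α → β                [Transitivity rule]
Again,
    βγ → γ               [Reflexivity rule]
    α → γ                [Transitivity rule]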
7.11
Pseudotransitivity rule: If α → β and γβ → δ, then αγ → δ.
    α → β                [Given]
    αγ → γβ              [Augmentation rule and set union commutativity]
    γβ → δ               [Given]
    αγ → δ               [Transitivity rule]
A    B    C
a1   b1   c1
a1   b1   c2
Let α = A, β = B, γ = C.
From the above relation, we see that A → B and C → B (i.e., α → β and γ → β).
However, it is not the case that A → C (i.e., α → γ), since the same A (α) value is in
two tuples, but the C (γ) values in those tuples disagree.
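The counterexample can be checked mechanically against the definition of a functional dependency. The sketch below assumes a relation instance is a list of dicts and that the two tuples share the A value a1, as the answer states; the helper name `holds` is hypothetical.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs is satisfied by this relation instance."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(left, right) != right:
            return False
    return True

r = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
]
print(holds(r, ["A"], ["B"]))  # True
print(holds(r, ["C"], ["B"]))  # True
print(holds(r, ["A"], ["C"]))  # False
```

Note that such a check can only refute a dependency on a given instance; it cannot prove that the dependency holds on every legal relation.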
7.12
7.13
What are the uses of the closure of attribute sets, α+? [In-course 2, 2007; In-course 2, 2008; 2007;
2004; Marks: 2]
1. To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R. If it
does, α is a superkey of R.
2. For a given set F of functional dependencies, we can check if a functional dependency
α → β holds (or is in F+) by checking if β ⊆ α+. That is, we compute α+ by using attribute
closure and then check if it contains β.
3. It gives us an alternate way to compute F+: for each γ ⊆ R, we find the closure γ+, and for
each S ⊆ γ+, we output a functional dependency γ → S.
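The first two uses can be sketched in a few lines. This is an illustrative sketch under assumptions: attributes are single characters, a dependency is a pair of strings (left side, right side), and the schema R = (A, B, C, G, H, I) with its dependencies is an invented example, not one taken from this guide.

```python
def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """Use 1: attrs is a superkey if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def implies(fds, lhs, rhs):
    """Use 2: lhs -> rhs is in F+ exactly when rhs is a subset of the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# Hypothetical example: A -> B, A -> C, CG -> H, CG -> I, B -> H
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(closure("AG", F)))       # ['A', 'B', 'C', 'G', 'H', 'I']
print(is_superkey("AG", "ABCGHI", F)) # True
print(implies(F, "A", "I"))           # False
```

The third use follows by running `closure` on every subset γ of R and emitting γ → S for each S ⊆ γ+, which is exponential in the number of attributes.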
7.14
7.15
Compute the closure of the attribute(s) to list the candidate key(s) for
relation schema R = (A, B, C, D, E) with functional dependencies F = {A →
BC, CD → E, B → D, E → A}. [2005. Marks: 2]
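One way to work this exercise is to enumerate attribute sets by increasing size and keep those whose closure is all of R and which contain no smaller key. The sketch below is a brute-force illustration under that approach; the function names are hypothetical and attributes are assumed to be single characters.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= set(schema):
                keys.append(attrs)
    return ["".join(sorted(key)) for key in keys]

F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(candidate_keys("ABCDE", F))  # ['A', 'E', 'BC', 'CD']
```

For example, A+ = ABCDE because A → BC gives B and C, B → D gives D, and CD → E gives E; B+ is only BD, so B alone is not a key.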
Define canonical cover, FC. [2004. Marks: 2]
A canonical cover FC for F is a set of dependencies such that F logically implies
all dependencies in FC and FC logically implies all dependencies in F. Furthermore,
FC must have the following properties:
1. No functional dependency in FC contains an extraneous attribute.
2. Each left side of a functional dependency in FC is unique. That is, there are
no two dependencies α1 → β1 and α2 → β2 in FC such that α1 = α2.
7.16
The advantage of using canonical cover is that the effort spent in checking for
dependency violations can be minimized.
7.17
What are the design goals for relational database design? Explain why each is
desirable. [In-course 2, 2007; 2005. Marks: 1]
The design goals for relational database design are:
1. Lossless-join decompositions
2. Dependency preserving decompositions
3. Minimization of repetition of information
They are desirable so we can maintain an accurate database, check
correctness of updates quickly, and use the smallest amount of space possible.
7.18
values of one attribute are determined by the values of another attribute in the
same relation, and both values are repeated throughout the relation. This is a
bad relational database design because it increases the storage required for the
relation and it makes updating the relation more difficult.
Inability to represent information is a condition where a relationship exists
among only a proper subset of the attributes in a relation. This is bad relational
database design because all the unrelated attributes must be filled with null
values otherwise a tuple without the unrelated information cannot be inserted
into the relation.
Loss of information is a condition of a relational database which results from
the decomposition of one relation into two relations and which cannot be
combined to recreate the original relation. It is a bad relational database design
because certain queries cannot be answered using the reconstructed relation
that could have been answered using the original relation.
7.19
7.20
7.21
R1 ∩ R2 → R1.
Suppose that we decompose the schema R = (A, B, C, D, E) into (A, B, C) and (C, D, E).
Show that this decomposition is not a lossless-join decomposition if the following set F of
functional dependencies holds: [2006. Marks: 2]
A → BC
CD → E
B → D
E → A
Let r be a relation as follows:
A    B    C    D    E
a1   b1   c1   d1   e1
a2   b2   c1   d2   e2
With R1 = (A, B, C) and R2 = (C, D, E), ΠR1(r) is:
A    B    C
a1   b1   c1
a2   b2   c1
and ΠR2(r) is:
C    D    E
c1   d1   e1
c1   d2   e2
ΠR1(r) ⋈ ΠR2(r) would be:
A    B    C    D    E
a1   b1   c1   d1   e1
a1   b1   c1   d2   e2
a2   b2   c1   d1   e1
a2   b2   c1   d2   e2
Since ΠR1(r) ⋈ ΠR2(r) contains tuples that are not in r, the decomposition is not a
lossless-join decomposition.
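The same comparison can be reproduced in code. The following sketch assumes the two-tuple instance r given in the answer and represents tuples as dicts; `project` and `natural_join` are hypothetical helper names written for this illustration.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(left, right):
    """Pair up tuples that agree on every attribute the two sides share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in set(l) & set(r)):
                out.append({**l, **r})
    return out

def as_set(rows, attrs):
    return {tuple(row[a] for a in attrs) for row in rows}

R = ["A", "B", "C", "D", "E"]
r = [dict(zip(R, ["a1", "b1", "c1", "d1", "e1"])),
     dict(zip(R, ["a2", "b2", "c1", "d2", "e2"]))]

joined = natural_join(project(r, ["A", "B", "C"]), project(r, ["C", "D", "E"]))
print(len(joined))                       # 4
print(as_set(joined, R) == as_set(r, R)) # False, so the join is lossy
```

The two extra tuples arise because both tuples of r share the value c1 on the only common attribute C.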
Deduce the condition for dependency preservation using restrictions for decomposing a
given schema R and a set of FDs F. Decompose the schema R = (A, B, C, D, E) with functional
dependencies F = {A → B, BC → D} into BCNF with dependency preservation. [2005. Marks: 2
+ 1]
Let F be a set of functional dependencies on schema R. Let R1, R2, ..., Rn be a decomposition of R.
The restriction of F to Ri is the set of all functional dependencies in F+ that include only attributes
of Ri.
The set of restrictions F1, F2, ..., Fn is the set of dependencies that can be checked efficiently.
Let F' = F1 ∪ F2 ∪ ... ∪ Fn.
F' is a set of functional dependencies on schema R, but in general, F' ≠ F.
However, it may be that F'+ = F+.
If this is so, then every functional dependency in F is implied by F', and if F' is satisfied, then F
must also be satisfied.
Therefore, the condition for dependency preservation using restrictions for decomposing a given
schema R and a set of FDs F is that F' + = F+.
Decomposition of the schema R:
We change the order of the FDs in F such that F = {BC → D, A → B}.
Now, the FD BC → D holds on R, but BC is not a superkey. So, we decompose R into
R1 = (B, C, D) and R2 = (A, B, C, E)
R1 is in BCNF. However, the FD A → B holds on R2, but A is not a superkey. So, we decompose
R2 into
R3 = (A, B) and R4 = (A, C, E)
Now, R3 and R4 are both in BCNF. [R4 is in BCNF as only trivial functional dependencies exist in
R4]
So, the final decomposed relations are: R1 = (B, C, D), R3 = (A, B) and R4 = (A, C, E).
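Whether this decomposition preserves dependencies can be tested without computing F+ explicitly. The sketch below uses a commonly taught test, which is an assumption here rather than a method stated in this answer: for each α → β in F, repeatedly grow a set starting from α by taking closures inside each Ri, and check that β is eventually covered. The JKL schema at the end is an invented counterexample.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(fd, decomposition, fds):
    """True if fd can be derived using only dependencies checkable inside single Ri."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & set(ri), fds) & set(ri)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

def dependency_preserving(decomposition, fds):
    return all(preserved(fd, decomposition, fds) for fd in fds)

F = [("A", "B"), ("BC", "D")]
print(dependency_preserving(["BCD", "AB", "ACE"], F))  # True

# Hypothetical counterexample: JK -> L cannot be checked within (J, L) or (K, L).
G = [("JK", "L"), ("L", "K")]
print(dependency_preserving(["JL", "KL"], G))          # False
```

For the decomposition above, BC → D is checkable within R1 and A → B within R3, which is why the test succeeds.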
7.23
7.24
What are the differences between BCNF and 3NF? [In-course 2, 2008;
2002, 2004, 2007. Marks: 3]
For a nontrivial functional dependency α → β in which α is not a superkey, 3NF
still allows the dependency if each attribute A in β − α is contained in a
candidate key for R. BCNF does not allow this exception.
It is always possible to find a dependency-preserving lossless-join
decomposition that is in 3NF. However, it is not always possible to find such
decomposition that is in BCNF.
Repetition of information may occur in 3NF, whereas BCNF eliminates the
repetition that arises from functional dependencies.
7.25
7.26
Address       Car
North Road    Toyota
Oak Street    Honda
North Road    Honda
Oak Street    Toyota
F = {loan-no → amount}
If we assume that each customer might have more than one address, then
the functional dependency customer-name → customer-address cannot be
enforced. Thus, R is in BCNF. However, it is not in 4NF, as customer-address is
multivalued, and therefore repetition of information occurs in the
loan-no and amount fields.
7.28
An employee database is to hold information about employees, the department they are in
and the skills which they hold. The attributes to be stored are:
(emp-id, emp-name, emp-phone, dept-name, dept-phone, dept-mgrid, skill-id, skill-name, skill-date, skill-level)
An employee may have many skills such as word-processing, typing, librarian... The date on
which the skill was last tested and the level displayed at that test are recorded for the purposes
of assigning work and determining salary. An employee is attached to one department and each
department has a unique manager.
i. Derive a functional dependency set for the above database, stating clearly any
assumptions that you make.
ii. Derive a set of BCNF relations, indicating the primary key of each relation.
[2002. Marks: 4 + 4]
CHAPTER 11
STORAGE & FILE STRUCTURE
Theories
11.1
11.2
11.3
11.4
11.5
11.6
11.7
What are the factors for choosing a RAID level? [2003. Marks: 2]
1.
2.
3.
4.
11.9
What are possible ways of organizing the records in files? What does
reorganization do? [2004. Marks: 4 + 1]
OR, Classify file organization. Why is reorganization required in
sequential file organization? [2003. Marks: 2 + 1]
OR, What are the different types of organization of records in files? What
do you understand by reorganization? [2006. Marks: 3 + 1]
File organization:
1. Heap file organization: Any record can be placed anywhere in the file
where there is space for the record. There is no ordering of records.
Typically, there is a single file for each relation.
2. Sequential file organization: Records are stored in sequential order,
based on the value of a search key of each record.
3. Hashing file organization: A hash function is computed on some
attribute of each record. The result of the function specifies in which block
of the file the record should be placed.
4. Multitable Clustering file organization: Records of several different
relations can be stored in the same file. Related records of the different
relations are stored on the same block so that one I/O operation fetches
related records from all the relations.
Reorganization:
The sequential file organization will work well if relatively few records need to
be stored in overflow blocks. Eventually, however, the correspondence between
search-key order and physical order may be totally lost. In such cases sequential
processing will become much less efficient. At this point, the file should be
reorganized so that it is once again physically in sequential order.
11.10
11.12
      room   instructor
c1    420    SMH
c2    320    MHK
course relation
course-name   student-name   grade
Java          X              A
Java          Y              B
Java          Z              C
OS            X              A
OS            Y              B
OS            Z              C
enrollment relation
11.14
CHAPTER 12
INDEXING AND HASHING
Concepts
12.1
12.2
Search Key
An attribute or set of attributes used to look up records in a file is called a
search key.
Primary / Clustering Index
If the file containing the records is sequentially ordered, a primary index is an
index whose search key also defines the sequential order of the file.
Secondary / Non-Clustering Index
Indices whose search key specifies an order different from the sequential
order of the file are called secondary indices.
Index-Sequential Files
Files that are ordered sequentially on some search key and have a primary
index on that search key are called index-sequential files.
Dense Index
A dense index is an index in which an index record appears for every search-key
value in the file.
Sparse Index
A sparse index is an index in which an index record appears for only some of the
search-key values in the file.
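A lookup through a sparse index can be sketched as follows. The sketch makes several assumptions that are not in the definitions above: the file is a list of blocks sorted on the search key, the index keeps one entry per block (the first search-key value in that block), search-key values are unique, and the branch names and balances are invented sample data.

```python
from bisect import bisect_right

def build_sparse_index(blocks):
    """One (first search-key value, block number) entry per block."""
    return [(block[0][0], number) for number, block in enumerate(blocks)]

def lookup(blocks, index, key):
    """Find the last index entry whose value is <= key, then scan that block."""
    first_keys = [k for k, _ in index]
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every value in the file
    for k, record in blocks[index[pos][1]]:
        if k == key:
            return record
    return None

# Hypothetical sequential file: three blocks of (search key, record) pairs.
blocks = [
    [("Brighton", 750), ("Downtown", 500)],
    [("Mianus", 700), ("Perryridge", 400)],
    [("Redwood", 700), ("Round Hill", 350)],
]
index = build_sparse_index(blocks)
print(lookup(blocks, index, "Perryridge"))  # 400
print(lookup(blocks, index, "Nowhere"))     # None
```

A dense index would instead hold an entry for every search-key value, trading more index space for a lookup that needs no scan within the block.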
Multilevel Index
An index with two or more levels is called a multilevel index.
B+ Tree
A B+ tree is a type of index which takes the form of a balanced tree in which
every path from the root of the tree to a leaf of the tree is of the same length.
In a B+ tree, each non-leaf node in the tree has between ⌈n/2⌉ and n children,
where n is fixed for a particular tree. Each leaf has between ⌈(n − 1)/2⌉ and n − 1
values. The ranges of values in each leaf do not overlap.
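The occupancy bounds are easy to tabulate for a given n. The helper below is a small illustrative sketch (the name `bplus_bounds` is hypothetical); it covers ordinary nodes only, since the root is usually allowed to hold fewer entries.

```python
def bplus_bounds(n):
    """Occupancy limits for a B+ tree whose nodes hold up to n pointers."""
    return {
        "nonleaf_children": ((n + 1) // 2, n),  # ceil(n / 2) .. n
        "leaf_values": (n // 2, n - 1),         # ceil((n - 1) / 2) .. n - 1
    }

for n in (4, 6, 8):
    print(n, bplus_bounds(n))
# 4 {'nonleaf_children': (2, 4), 'leaf_values': (2, 3)}
# 6 {'nonleaf_children': (3, 6), 'leaf_values': (3, 5)}
# 8 {'nonleaf_children': (4, 8), 'leaf_values': (4, 7)}
```

Integer arithmetic is used for the ceilings: ⌈n/2⌉ equals (n + 1) // 2 and ⌈(n − 1)/2⌉ equals n // 2 for positive integers n.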
B-Tree
A B-tree index is similar to a B+ tree index, except that each search-key value in a
B-tree appears only once.
Hashing
12.3
node.
Disadvantages of B-Tree
1. Only a small fraction of desired values are found before reaching a leaf
node.
2. Fewer search-keys appear in non-leaf nodes; hence, fan-out is reduced.
Thus, B-trees typically have greater depth than a corresponding B+ tree.
3. Insertion and deletion are more complicated than in B+ trees.
4. Implementation is harder than B+ trees, since leaf and non-leaf nodes are
of different sizes.
Advantages of Hashing
1. Hashing allows us to avoid accessing an index structure.
2. Provides a way of constructing indices.
12.3
Since indices speed query processing, why might they not be kept
on several search keys?
Reasons for not keeping several search indices include:
1. Every index requires additional CPU time and disk I/O overhead during
inserts and deletions.
2. Indices on non-primary keys might have to be changed on updates,
although an index on the primary key might not (this is because updates
typically do not modify the primary key attributes).
3. Each extra index requires additional storage space.
4. For queries which involve conditions on several search keys, efficiency
might not be bad even if only some of the keys have indices on them.
Therefore database performance is improved less by adding indices when
many indices already exist.
What are the differences between a primary index and a secondary
index? [2005, Marks: 2. 2003, Marks: 3]
Primary Index
Secondary Index
Clustering indices may allow faster access to data than a non-clustering index
affords. When must we create a non-clustering index despite the advantages of a
clustering index? Explain your answer. [2007. Marks: 2]
If we need to look up a record using a search key other than the search key on
which the file is stored sequentially, then we must create a non-clustering index
to improve the performance of the lookup.
12.5
12.6
12.7
12.8
12.9
Index file after deletion of the record for the account no A-2:
12.10
Index file on B
12.11
12.12
12.13
Construct a B+ tree for the following set of key values, assuming that the number of
pointers that will fit in one node is (i) four, (ii) six, (iii) eight: [2006, Marks: 3.
2003, Marks: 2. (each)]
2, 3, 5, 7, 11, 17, 19, 23, 29, 31
(i)
(ii)
(iii)
12.14
For each B+ tree of Question and Answer 12.13, show the form of the
tree after each of the following series of operations:
1. Insert 9
2. Insert 10
3. Insert 8
4. Delete 23
5. Delete 19
Structure   Operation
(i)         1. Insert 9
            2. Insert 10
            3. Insert 8
            4. Delete 23
            5. Delete 19
(ii)        1. Insert 9
            2. Insert 10
            3. Insert 8
            4. Delete 23
            5. Delete 19
(iii)       1. Insert 9
            2. Insert 10
            3. Insert 8
            4. Delete 23
            5. Delete 19
12.15
12.16
12.17
12.18
Leaf nodes: Each leaf has between ⌈(n − 1)/2⌉ and n − 1 values. The ranges
of values in each leaf do not overlap.
12.19
12.20
Why are the leaf nodes of a B+ tree chained together? [2007, In-course 2, 2005. Marks: 1]
OR, Why are nodes of a B+ tree at the leaf level linked? [2002.
Marks: 2]
The leaf nodes of a B+ tree are chained together to allow for efficient sequential
processing of the file.
12.21
B+ Tree Structure
12.22
What are the differences between B-tree and B+ tree? [2002, Marks:
3. In-course 2, 2005, Marks: 4]
B+ Tree
B-Tree
12.23
12.24
12.25
12.27
12.28
Why is hash structure not the best choice for a search key on which
range queries are likely? [2006. Marks: 1]
A range query cannot be answered efficiently using a hash index; we will
have to read all the buckets. This is because key values in the range do not
occupy consecutive locations in the buckets; they are distributed uniformly and
randomly throughout all the buckets.
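The contrast with an ordered structure can be illustrated with a toy experiment. The sketch below is an assumption-laden illustration: SHA-256 stands in for the hash function, eight in-memory lists stand in for buckets, and a sorted Python list searched with bisect stands in for an ordered index such as a B+ tree.

```python
import hashlib
from bisect import bisect_left, bisect_right

def bucket_of(key, num_buckets):
    """Deterministic stand-in for a hash function that scatters keys."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets

def build_hash_index(keys, num_buckets):
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_of(key, num_buckets)].append(key)
    return buckets

def range_query_hash(buckets, low, high):
    """Every bucket must be read, since any of them may hold keys in the range."""
    hits = [k for bucket in buckets for k in bucket if low <= k <= high]
    return sorted(hits), len(buckets)

def range_query_sorted(sorted_keys, low, high):
    """An ordered structure locates the first key, then reads consecutively."""
    return sorted_keys[bisect_left(sorted_keys, low):bisect_right(sorted_keys, high)]

keys = list(range(0, 100, 3))
buckets = build_hash_index(keys, 8)
hits, buckets_read = range_query_hash(buckets, 10, 25)
print(hits, buckets_read)                # [12, 15, 18, 21, 24] 8
print(range_query_sorted(keys, 10, 25))  # [12, 15, 18, 21, 24]
```

Both approaches return the same keys, but the hash version reads all eight buckets no matter how narrow the range is.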
12.29
12.30
Open Hashing
12.32