DBMS - Unit 2A

Uploaded by

karanrainavaar

Unit-02

Relational Query Languages


Relational algebra is a fundamental concept in database management systems (DBMS).
It is a procedural query language that works on the relational model, representing
queries as expressions that describe how to retrieve data from a database.
Foundation for Query Languages: Relational algebra forms the theoretical basis for
SQL, the most widely used query language in databases.
Optimization: By breaking queries into simple relational algebra operations, a DBMS
can optimize query execution for better performance.
Procedural Nature: While SQL focuses on "what" to retrieve, relational algebra
focuses on "how" to retrieve it, which helps in understanding query processing.

Types of Relational Operations


1. Select Operation:
● The select operation selects the tuples that satisfy a given predicate.
● It is denoted by sigma (σ).
Notation: σ p(r)

Where:
σ denotes the selection operation
r is the relation
p is a propositional logic formula (the selection predicate), which may use connectives
such as AND, OR, and NOT, and relational operators such as =, ≠, ≥, <, >, ≤.
For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300


Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300
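
The selection above can be sketched in Python. This is only an illustration, assuming the LOAN relation is held in memory as a list of dictionaries; the helper name `select` is made up for this sketch and is not part of any DBMS.

```python
# LOAN relation from the example, one dict per tuple.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """sigma_p(r): keep only the tuples t of r for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]


# sigma BRANCH_NAME = "Perryride" (LOAN)
perryride_loans = select(LOAN, lambda t: t["BRANCH_NAME"] == "Perryride")
for t in perryride_loans:
    print(t["BRANCH_NAME"], t["LOAN_NO"], t["AMOUNT"])
```

Note that the comparison is case sensitive here, so the constant in the predicate has to match the stored value exactly.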

2. Project Operation:

● This operation lists the attributes that we wish to appear in the result. The rest
of the attributes are eliminated from the table.
● It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)

Where
A1, A2, ..., An are attribute names of relation r.
Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye

Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:

∏ NAME, CITY (CUSTOMER)

Output:
NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn
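
A minimal Python sketch of projection, assuming the CUSTOMER relation is a list of tuples with a separate list of attribute names. The helper `project` is hypothetical. It also removes duplicate rows, since the result of a projection is a relation (a set of tuples).

```python
# CUSTOMER relation from the example.
ATTRS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]


def project(relation, attrs, wanted):
    """pi_wanted(r): keep only the wanted columns and drop duplicate rows."""
    positions = [attrs.index(a) for a in wanted]
    result = []
    for t in relation:
        row = tuple(t[i] for i in positions)
        if row not in result:  # a relation has no duplicate tuples
            result.append(row)
    return result


name_city = project(CUSTOMER, ATTRS, ["NAME", "CITY"])
cities = project(CUSTOMER, ATTRS, ["CITY"])
print(name_city)
print(cities)
```

Projecting on CITY alone shows the duplicate elimination: six input tuples collapse to three distinct cities.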

3. Union Operation:

● Suppose there are two relations R and S. The union operation contains all the
tuples that are in R, in S, or in both.
● It eliminates duplicate tuples. It is denoted by ∪.
Notation: R ∪ S

A union operation must satisfy the following conditions:

● R and S must have the same number of attributes, with compatible domains.
● Duplicate tuples are eliminated automatically.

Example:
DEPOSITOR RELATION-

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284
BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11

Williams L-17

Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes

4. Set Intersection:
● Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R and S.
● It is denoted by ∩.
Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Smith

Jones

5. Set Difference:

● Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S.
● It is denoted by minus (-).
Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Jackson

Hayes

Williams

Curry
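
The three set operations (union, intersection, difference) map directly onto Python's built-in `set` type. The sketch below assumes the DEPOSITOR and BORROW relations are lists of tuples; projecting onto CUSTOMER_NAME is done with a set comprehension, which also removes duplicates.

```python
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Projection onto CUSTOMER_NAME (duplicates vanish because these are sets).
depositors = {name for name, _ in DEPOSITOR}
borrowers = {name for name, _ in BORROW}

union = borrowers | depositors         # in BORROW or DEPOSITOR or both
intersection = borrowers & depositors  # in both BORROW and DEPOSITOR
difference = borrowers - depositors    # in BORROW but not in DEPOSITOR

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Both operands have a single attribute of the same type, so they satisfy the compatibility condition required by these operations.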

6. Cartesian product

● The Cartesian product combines each row in one table with each row in the other
table. It is also known as a cross product.
● It is denoted by × (written here as X).
Notation: E X D

Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B

DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input: EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales
1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
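
As a rough sketch, the Cartesian product can be reproduced with `itertools.product` from the Python standard library, assuming both relations are lists of tuples. The last line additionally filters the product on EMP_DEPT = DEPT_NO, which is how a join is obtained from a product followed by a selection.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: every employee tuple paired with every department tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]
for row in cross:
    print(*row)

# Selection EMP_DEPT = DEPT_NO applied to the product (columns 2 and 3).
matched = [row for row in cross if row[2] == row[3]]
```

The product has 3 × 3 = 9 tuples, while only 3 of them pair an employee with the employee's own department.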

7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.
ρ(STUDENT1, STUDENT)

Relational Calculus
Relational calculus is an alternative way of formulating queries. It is a non-procedural
query language: the user is not concerned with the details of how to obtain the end
results. Relational calculus states what to retrieve but never explains how to retrieve
it. Most commercial relational languages, including SQL, QBE, and QUEL, are based on
aspects of relational calculus.
Why is it called Relational Calculus?
It is based on predicate calculus, a branch of symbolic logic. A predicate is a
truth-valued function with arguments. On substituting values for the arguments, the
function results in an expression called a proposition, which can be either true or
false. Relational calculus is a tailored version of a subset of predicate calculus used
to communicate with a relational database.

Many calculus expressions involve the use of quantifiers. There are two types of
quantifiers:

● Universal Quantifier: The universal quantifier, denoted by ∀, is read as "for all".
It means that in a given set of tuples, all tuples satisfy a given condition.
● Existential Quantifier: The existential quantifier, denoted by ∃, is read as "there
exists". It means that in a given set of tuples, there is at least one tuple whose
values satisfy a given condition.

Before using quantifiers in formulas, we need the concept of free and bound variables.
A tuple variable t is bound if it is quantified, that is, if it appears within the scope
of a ∀ or ∃ quantifier. A variable that is not bound is said to be free.
Free and bound variables may be compared with the global and local variables of
programming languages.
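
The two quantifiers correspond closely to Python's built-in `all` and `any`. The sketch below uses a few rows of the LOAN relation. The loop variable `t` inside each generator expression plays the role of a bound variable (it is local to the expression), while `loans` is free.

```python
# A few rows of the LOAN relation, used only for illustration.
loans = [
    {"LOAN_NO": "L-17", "AMOUNT": 1000},
    {"LOAN_NO": "L-23", "AMOUNT": 2000},
    {"LOAN_NO": "L-13", "AMOUNT": 500},
]

# Universal quantifier: for all t in loans, t.AMOUNT > 0
every_amount_positive = all(t["AMOUNT"] > 0 for t in loans)

# Existential quantifier: there exists t in loans with t.AMOUNT >= 2000
some_amount_large = any(t["AMOUNT"] >= 2000 for t in loans)

# A universal statement that fails because one tuple violates it.
every_amount_large = all(t["AMOUNT"] >= 2000 for t in loans)

print(every_amount_positive, some_amount_large, every_amount_large)
```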

Types of Relational calculus:

1. Tuple Relational Calculus (TRC)

It is a non-procedural query language based on finding tuple variables (also known as
range variables) for which a predicate holds true. It describes the desired information
without giving a specific procedure for obtaining it. Tuple relational calculus is used
to select tuples from a relation: the filtering variable ranges over the tuples of a
relation, and the result can have one or more tuples.
Notation: A query in the tuple relational calculus is expressed as
{T | P (T)} or {T | Condition (T)}
Where
T is the resulting tuple
P(T) is the condition used to fetch T.
For example: { T.name | Author(T) AND T.article = 'database' }
Output: This query selects tuples from the Author relation. It returns the 'name' of
every author who has written an article on 'database'.
TRC can be quantified. In TRC, we can use existential (∃) and universal (∀)
quantifiers.
For example:
{ R | ∃T ∈ Author (T.article = 'database' AND R.name = T.name) }
Output: This query yields the same result as the previous one.
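
Both TRC queries read almost literally as Python set comprehensions. The sketch below assumes an Author relation with made-up rows (the names and articles are hypothetical, chosen only so that the query has something to return).

```python
# Hypothetical contents of the Author relation.
Author = [
    {"name": "Alice", "article": "database"},
    {"name": "Bob", "article": "networks"},
    {"name": "Carol", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
q1 = {t["name"] for t in Author if t["article"] == "database"}

# { R | there exists T in Author (T.article = 'database' AND R.name = T.name) }
# R ranges over candidate names; any() plays the role of the existential quantifier.
candidate_names = {t["name"] for t in Author}
q2 = {
    r
    for r in candidate_names
    if any(t["article"] == "database" and t["name"] == r for t in Author)
}

print(sorted(q1))
print(sorted(q2))
```

The comprehension states which tuples qualify, not how to scan the relation, which is the sense in which the calculus is non-procedural.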
2. Domain Relational Calculus (DRC)
The second form of relational calculus is known as domain relational calculus. In domain
relational calculus, the filtering variables range over the domains of attributes rather
than over whole tuples. Domain relational calculus uses the same operators as tuple
calculus. It uses the logical connectives ∧ (and), ∨ (or), and ¬ (not), and it uses
existential (∃) and universal (∀) quantifiers to bind variables. QBE (Query By Example)
is a query language related to domain relational calculus.
Notation: { a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
Where a1, a2, ..., an are attributes
P stands for a formula built from these attributes
For example: { <article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database' }
Output: This query yields the article, page, and subject from the relation javatpoint,
where the subject is 'database'.

SQL

○ SQL stands for Structured Query Language. It is used for storing and managing
data in a relational database management system (RDBMS). In an RDBMS, data is
stored in the form of tables.
○ It is the standard language for relational database systems. It enables a user to
create, read, update, and delete relational databases and tables.
○ RDBMSs such as MySQL, Informix, Oracle, MS Access, and SQL Server use
SQL as their standard database language.
○ SQL allows users to query the database in a number of ways, using English-like
statements.
○ SQL is mostly used by engineers in software development for data storage.
Nowadays, it is also used by data analysts.
SQL Statement Rules:
SQL follows these rules:

○ SQL keywords are not case sensitive. Generally, keywords of SQL are
written in uppercase.
○ Every SQL statement should end with a semicolon.
○ SQL statements are independent of text lines: a single SQL statement can be
written on one or multiple text lines.
○ Using SQL statements, you can perform most of the actions in a database.
○ SQL is based on tuple relational calculus and relational algebra.

How the SQL process works:


When an SQL command is executed on an RDBMS, the system figures out the best way to
carry out the request, and the SQL engine determines how to interpret the task. Various
components are involved in this process, such as the optimization engine, the query
engine, the query dispatcher, and the classic query engine. All non-SQL queries are
handled by the classic query engine, but the SQL query engine won't handle logical
files.

SQL Query Execution Order


The following steps are performed when a SQL query is executed:

○ Parsing: In this process, the query statement is tokenized.
○ Optimizing: In this process, the optimizer chooses the best algorithm for
executing the statement.
○ From: In a SQL statement, the FROM keyword specifies the tables from which
data is fetched.
○ Join: A JOIN is used to combine data from more than one table based
on a common field among them.
○ Where: The WHERE keyword works like a conditional statement in SQL and filters
individual rows.
○ Group by: It is used to group records from the table(s) by the given fields.
○ Having: The HAVING clause also works like a conditional statement in SQL. It is
mostly used with the GROUP BY clause to filter the grouped records.
○ Select: This statement is used to get the data from the database.
○ Order by: This clause is used to sort the data in a particular order by using "ASC"
for ascending and "DESC" for descending order.
○ Limit: It is used to specify how many rows are returned by the SQL SELECT
statement.
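
To make the roles of the clauses concrete, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The table name `loan` and its lowercase column names are assumptions for this example; the rows are those of the LOAN relation used earlier. SQLite is used only because it ships with Python, and other systems may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (branch_name TEXT, loan_no TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?)",
    [
        ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
        ("Perryride", "L-15", 1500), ("Downtown", "L-14", 1500),
        ("Mianus", "L-13", 500), ("Roundhill", "L-11", 900),
        ("Perryride", "L-16", 1300),
    ],
)

query = """
SELECT branch_name, SUM(amount) AS total   -- columns to return
FROM loan                                  -- source table
WHERE amount >= 900                        -- row filter, applied before grouping
GROUP BY branch_name                       -- one group per branch
HAVING COUNT(*) >= 2                       -- group filter, applied after grouping
ORDER BY total DESC                        -- sort the surviving groups
LIMIT 1;                                   -- keep only the first row
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

WHERE removes the Mianus loan before any grouping happens, HAVING then discards the branches with a single remaining loan, and only after that are the two surviving totals sorted and limited.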

What is an Open Source Database?

An open-source database is a database whose source code anyone can easily view and
that is open and free to download. For the community version, some small additional
and affordable costs may also be imposed. An open-source database provides limited
technical support to end users. Installation and updates are administered by the
user.
Advantages of Open Source Databases

● Cost: Open-source databases are generally free, which means they can be used
without any licensing fees.
● Customization: Since the source code is available, developers can modify and
customize the database to meet specific requirements.
● Community Support: Open-source databases have a large community of users
who contribute to documentation, bug fixes, and improvements.
● Security: With open-source databases, security vulnerabilities can be detected
and fixed quickly by the community.
● Scalability: Open-source databases are typically designed to be scalable, which
means they can handle large amounts of data and traffic.
Disadvantages of Open Source Databases

● Limited Technical Support: While there is a large community of users who can
help troubleshoot issues, there is no guarantee of professional technical support.
● Complexity: Open source databases can be more difficult to set up and
configure than commercial databases, especially for users who are not
experienced in database administration.
● Lack of Features: Open source databases may not have all the features that are
available in commercial databases, such as advanced analytics and reporting
tools.

What is a Commercial Database?


Commercial databases are those that have been created for commercial purposes
only. They are premium and are not free like open-source databases. With a commercial
database, technical support is guaranteed, and installation and updates are
administered by the software vendor. Examples: Oracle, IBM DB2, etc.
Advantages of Commercial Databases

● Technical Support: Commercial databases usually come with professional
technical support, which can be helpful for organizations that need assistance
with setup, configuration, or troubleshooting.
● Features: Commercial databases typically have more features than open-source
databases, including advanced analytics, reporting, and data visualization tools.
● Security: Commercial databases often have built-in security features and can
provide better protection against cyber threats.
● Integration: Commercial databases are often designed to work seamlessly with
other enterprise software, making integration with existing systems easier.

Disadvantages of Commercial Databases

● Cost: Commercial databases can be expensive, with licensing fees and
maintenance costs that can add up over time.
● Vendor Lock-In: Organizations that use commercial databases may become
dependent on the vendor and find it difficult to switch to another database.
● Limited Customization: Commercial databases may not be as customizable as
open source databases, which can be a disadvantage for organizations with
specific requirements.

Similarities between Open Source Database and Commercial Database

● Both can handle large amounts of data and support complex data structures.
● Both can be used to store and retrieve data in a structured manner.
● Both can be used to support mission-critical applications and services.
● Both use SQL (Structured Query Language) to perform queries and manipulate
data.
● Both can be accessed and managed remotely using a variety of tools and
interfaces.
● Both can be optimized for performance, scalability, and security.

Difference Between Open Source Database and Commercial Database

Basis of Comparison | Open Source Database | Commercial Database
Focus | Anyone can easily view the source code of an open-source database. | Commercial databases are those that have been created for commercial purposes only.
Examples | MySQL, PostgreSQL, MongoDB, etc. | Oracle, DB2, Splunk, etc.
Cost | They are free or have additional and affordable costs. | They are premium and are not free like open-source databases.
Community | The community can see, share, and modify the code of open-source DBMS software. | The community cannot see, exchange, or modify the code of commercial DBMS software.
Source Code | Because the source code is open, there is a risk of coding malfunction. | The code is not accessible to unauthorized users and has a high level of protection.
Technical support | It provides limited technical support. | It provides guaranteed technical support.
License | The software is available under free licensing. | The software is available under high licensing cost.
Support | Users need to rely on community support. | Users get dedicated support from the vendor from whom they buy.

Armstrong Axioms

The term Armstrong's Axioms refers to the sound and complete set of inference rules,
introduced by William W. Armstrong, that is used to test the logical implication of functional
dependencies. If F is a set of functional dependencies, then the closure of F, denoted as F+, is
the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of
rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
Axioms

● Axiom of Reflexivity: If A is a set of attributes and B is a subset of A, then A determines B.
If B⊆A then A→B. Such a dependency is called trivial.
● Axiom of Augmentation: If A→B holds and Y is a set of attributes, then AY→BY also
holds. That is, adding the same attributes to both sides of a dependency does not change
the basic dependency. If A→B, then AC→BC for any C.
● Axiom of Transitivity: As with the transitive rule in algebra, if A→B holds and B→C
holds, then A→C also holds. (A→B is read as "A functionally determines B".) If X→Y
and Y→Z, then X→Z.

Secondary Rules
These rules can be derived from the above axioms.

● Union: If A→B holds and A→C holds, then A→BC holds. If X→Y and X→Z then X→YZ.
● Composition: If A→B and X→Y hold, then AX→BY holds.
● Decomposition: If A→BC holds then A→B and A→C hold. If X→YZ then X→Y and
X→Z.
● Pseudo Transitivity: If A→B holds and BC→D holds, then AC→D holds. If X→Y and
YZ→W then XZ→W.
● Self Determination: It is similar to the Axiom of Reflexivity, i.e. A→A for any A.
● Extensivity: Extensivity is a case of augmentation. If AC→A, and A→B, then AC→B.
Similarly, AC→ABC and ABC→BC. This leads to AC→BC.
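
In practice, whether a dependency follows from a set F is usually tested by computing an attribute closure rather than by enumerating all of F+. The sketch below is one common way to do this, not taken from the text above. It assumes single-letter attribute names, so that a string such as "BC" stands for the attribute set {B, C}; the function names `closure` and `implies` are made up for this example.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under the FDs `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names (an assumption of this sketch).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from `fds`."""
    return set(rhs) <= closure(lhs, fds)


# Pseudo transitivity: from A->B and BC->D we can infer AC->D.
F = [("A", "B"), ("BC", "D")]
print(implies(F, "AC", "D"))   # AC -> D follows
print(implies(F, "A", "D"))    # A -> D does not follow

# Union rule: from A->B and A->C we can infer A->BC.
G = [("A", "B"), ("A", "C")]
print(implies(G, "A", "BC"))
```

The loop stops as soon as a full pass adds no new attribute, so it terminates after at most as many passes as there are attributes.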

Armstrong Relation
An Armstrong relation for a set of functional dependencies F is a relation that satisfies
all the functional dependencies in the closure F+ and no others. For a given set of
dependencies, the size of the minimum Armstrong relation can be an exponential function
of the number of attributes under consideration.
Advantages of Using Armstrong’s Axioms in Functional Dependency

● They provide a systematic and efficient method for inferring additional functional
dependencies from a given set of functional dependencies, which can help to optimize
database design.
● They can be used to identify redundant functional dependencies, which can help to
eliminate unnecessary data and improve database performance.
● They can be used to verify whether a set of functional dependencies is a minimal cover,
which is a set of dependencies that cannot be further reduced without losing information.

Disadvantages of Using Armstrong’s Axioms in Functional Dependency

● The process of using Armstrong’s axioms to infer additional functional dependencies can
be computationally expensive, especially for large databases with many tables and
relationships.
● The axioms do not take into account the semantic meaning of data, and may not always
accurately reflect the relationships between data elements.

What is Decomposition in DBMS?

Decomposition means dividing a relation R into {R1, R2, ..., Rn}. A good decomposition
should be dependency preserving and lossless.

Dependency preserving decomposition

Let R be decomposed into {R1, R2, ..., Rn} with projected FD sets {F1, F2, ..., Fn}. This
decomposition is dependency preserving if (F1 U F2 U ... U Fn)+ = F+.
Example: Let the relation R{A,B,C,D,E} have F: {AB->C, C->D, AB->D}. R is decomposed into
R1(A,B,C) and R2(C,D,E). Prove that the decomposition is dependency preserving.
Solution

F1 = {AB->C}

F2 = {C->D}

=> (F1 U F2) = {AB->C, C->D}

AB+ under (F1 U F2) = {A,B,C,D} => AB->D is implied by (F1 U F2)

=> F+ = (F1 U F2)+

=> Decomposition is dependency preserving.

Decomposition that is not dependency preserving

Now consider another example where the dependencies are not preserved. Let the relation
R{A,B,C,D,E,F,G,H,I,J} have F: {AB->C, A->DE, B->F, F->GH, D->IJ}.

R is decomposed into R1(A,B,C,D), R2(D,E), R3(B,F), R4(F,G,H), and R5(D,I,J). Check whether
the decomposition is dependency preserving.
Solution

F1 = {AB->C, A->D}

F2 = {}

F3 = {B->F}

F4 = {F->GH}

F5 = {D->IJ}

=> (F1 U F2 U F3 U F4 U F5) = {AB->C, A->D, B->F, F->GH, D->IJ}

A+ under (F1 U F2 U F3 U F4 U F5) = {A, D, I, J}

=> E is not in A+, so A->DE is not implied by (F1 U F2 U F3 U F4 U F5)

=> F+ ≠ (F1 U F2 U F3 U F4 U F5)+

=> Decomposition is not dependency preserving.
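
Both examples can be checked mechanically. The sketch below takes the projected FD sets as given rather than computing the projections, and tests whether every dependency of F is implied by their union. Attribute names are single letters, so "AB" stands for {A, B}. For the second example the union used here includes A->D, which is the projection of A->DE onto R1(A,B,C,D). The helper names are made up for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_dependency_preserving(original_fds, projected_union):
    """True if every FD of the original set follows from the union of projections."""
    return all(
        set(rhs) <= closure(lhs, projected_union) for lhs, rhs in original_fds
    )


# First example: F = {AB->C, C->D, AB->D}, F1 U F2 = {AB->C, C->D}
F_a = [("AB", "C"), ("C", "D"), ("AB", "D")]
union_a = [("AB", "C"), ("C", "D")]
print(is_dependency_preserving(F_a, union_a))

# Second example: F = {AB->C, A->DE, B->F, F->GH, D->IJ}
F_b = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
union_b = [("AB", "C"), ("A", "D"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
print(sorted(closure("A", union_b)))
print(is_dependency_preserving(F_b, union_b))
```

In the second example E never enters the closure of A, so A->DE is lost.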


Lossless-join decomposition
It is a process in which a relation is decomposed into two or more relations such that no
extra tuples are generated and no tuples are lost when the sub-relations are joined again, so
no information from the original relation is lost. It is also known as non-additive join
decomposition.
When the sub-relations are joined again, the new relation must be the same as the original
relation was before decomposition. Consider a relation R decomposed into relations R1 and R2.
The decomposition is lossless when it satisfies the following conditions −
● The union of the attributes of R1 and R2 must contain all the attributes that are
available in the original relation R before decomposition.
● The intersection of R1 and R2 cannot be empty: the sub-relations must have a common
attribute.
● The common attribute must be a super key of at least one of the sub-relations R1 or R2,
that is, it must contain unique values there.

Here,

R = (A, B, C)

R1 = (A, B)

R2 = (B, C)

The relation R has three attributes A, B, and C. It is decomposed into two relations R1 and
R2, each with two attributes. The common attribute is B.
The values in column B must be unique in at least one of the sub-relations; otherwise the
decomposition is not guaranteed to be lossless. Consider a table of relation R with raw data −

R (A, B, C)

A B C

12 25 34

10 36 09

12 42 30

It decomposes into the two sub relations as follows −

R1 (A, B)

A B

12 25

10 36

12 42

R2 (B, C)

B C

25 34
36 09

42 30

We can now check the conditions for lossless-join decomposition.

The union of the attributes of R1 and R2 is the same as the attributes of R, and the common
attribute B is unique in both sub-relations. Joining R1 and R2 on B must therefore give back R:

R1 ⋈ R2 = R

We get the following result −

A B C

12 25 34

10 36 09

12 42 30

The relation is the same as the original relation R; hence, the above decomposition is a
lossless-join decomposition.
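
The check can be reproduced in a few lines of Python by projecting R onto (A, B) and (B, C) and joining the projections on B. The second relation, `R_bad`, is a hypothetical counter-example (not from the text) in which B has a duplicate value and determines neither A nor C, so the join produces extra tuples. The value 09 from the table is written as 9 because Python does not allow leading zeros in integer literals.

```python
def decompose_and_join(relation):
    """Project R(A, B, C) onto R1(A, B) and R2(B, C), then join them on B."""
    r1 = {(a, b) for a, b, c in relation}
    r2 = {(b, c) for a, b, c in relation}
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}


# Relation R from the example.
R = {(12, 25, 34), (10, 36, 9), (12, 42, 30)}
print(decompose_and_join(R) == R)          # lossless: the join gives back R

# Hypothetical relation where B = 25 appears twice with different A and C.
R_bad = {(12, 25, 34), (10, 25, 9)}
joined_bad = decompose_and_join(R_bad)
print(sorted(joined_bad))                  # contains two spurious tuples
print(joined_bad == R_bad)
```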

Query Processing and optimization

Query processing includes the translation of high-level queries into low-level expressions
that can be used at the physical level of the file system, query optimization, and the
actual execution of the query to get the result.
It requires a basic understanding of relational algebra and file organization, and it
covers the variety of tasks involved in getting data out of the database.
The process of extracting data from a database is called query processing. Several steps
are required to retrieve the data. The actions involved are:

1. Parsing and translation
2. Optimization
3. Evaluation

[Figure: block diagram of query processing]

[Figure: detailed diagram of query processing]

It is done in the following steps:

Step-1
Parsing
First, the user's query, written in a high-level database language like SQL, is
translated into expressions (relational algebra) that can be applied further at the
file system's physical level. A language like SQL is well suited to humans, so the
computer system must convert the query into an internal representation before
processing it. Following this, the query is actually evaluated along with a number of
query-optimizing transformations.
During the parse call, the database performs the following checks (refer to the
detailed diagram): syntax check, semantic check, and shared pool check.
Syntax check
Concludes SQL syntactic validity.

Example: SELECT * FORM employee

Here, the error of the wrong spelling of FROM is given by this check.
Semantic check
Determines whether the statement is meaningful or not. Example: a query that contains a
table name that does not exist is caught by this check.
Shared Pool check
Every query is assigned a hash code during its execution. This check determines whether
that hash code already exists in the shared pool. If it does, the database does not
take additional steps for optimization and execution.

Step-2
Optimization
During the optimization stage, the database must perform a hard parse at least once for
every unique DML statement, and it performs optimization during this parse. The database
never optimizes DDL unless it includes a DML component, such as a subquery, that requires
optimization. Optimization is a process in which multiple query execution plans for
satisfying a query are examined and the most efficient plan is selected for execution.
The database catalog stores the execution plans, and the optimizer passes the
lowest-cost plan on for execution.

Row Source Generation

The row source generator is software that receives the optimal execution plan from the
optimizer and produces an iterative execution plan that is usable by the rest of the
database. The iterative plan is a binary program that, when executed by the SQL
engine, produces the result set.
Step-3
Evaluation
Finally, the database runs the query and displays the required result.

Explain the evaluation of relational algebra expression


SQL queries are decomposed into query blocks. One query block contains a single
SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses (if
any). Nested queries are split into separate query blocks.
Example
Consider the example given below −
SELECT lastname, firstname FROM employee WHERE salary > (SELECT MAX(salary) FROM
employee WHERE deptname = 'CSE');
C = (SELECT MAX(salary) FROM employee WHERE deptname = 'CSE'); // inner block
SELECT lastname, firstname FROM employee WHERE salary > C; // outer block
Where C represents the result returned from the inner block.

● The relational algebra for the inner block is 𝒢 max(salary) (σ deptname='CSE' (employee))
● The relational algebra for the outer block is Π lastname, firstname (σ salary>C (employee))

The query optimizer would then choose an execution or evaluation plan for each block.
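
The split into an inner and an outer block can be imitated with Python's standard `sqlite3` module. This is only a sketch: the employee rows are invented for illustration, and a real optimizer works on internal query blocks rather than on two separate statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (lastname TEXT, firstname TEXT, salary INTEGER, deptname TEXT)"
)
# Hypothetical rows, used only to make the example runnable.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("Rao", "Anil", 50000, "CSE"),
        ("Shah", "Bina", 70000, "CSE"),
        ("Iyer", "Chitra", 90000, "ECE"),
        ("Das", "Dev", 60000, "ME"),
    ],
)

# Inner block: C = maximum salary in the CSE department.
c = conn.execute(
    "SELECT MAX(salary) FROM employee WHERE deptname = ?", ("CSE",)
).fetchone()[0]

# Outer block: employees earning more than C.
outer = conn.execute(
    "SELECT lastname, firstname FROM employee WHERE salary > ?", (c,)
).fetchall()

# The original nested query, evaluated in one statement.
nested = conn.execute(
    "SELECT lastname, firstname FROM employee "
    "WHERE salary > (SELECT MAX(salary) FROM employee WHERE deptname = 'CSE')"
).fetchall()

print(c, outer, nested)
conn.close()
```

Because the inner block does not refer to any column of the outer block, it only needs to be evaluated once.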

Evaluation of relational algebra expressions


Materialized evaluation − Evaluate one operation at a time. The expression is evaluated
in a bottom-up manner and intermediate results are stored in temporary files. For
example, for an expression that joins (A ⋈ B) with (C ⋈ D):

Store the result of A ⋈ B in a temporary file.
Store the result of C ⋈ D in a temporary file.
Finally, join the results stored in the temporary files.
The overall cost = sum of the costs of the individual operations + cost of writing
intermediate results to disk. The cost of writing results to temporary files and
reading them back is quite high.
Pipelined evaluation − Evaluate several operations simultaneously. The result of one
operation is passed to the next operation. The expression is evaluated in a bottom-up
manner, but intermediate results are not stored in temporary files.
The result of A ⋈ B is not stored in a temporary file. Instead, it is passed directly
to the next operation, and so on.
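
The difference between the two strategies can be sketched with Python generators. Here a simple nested-loop join is written as a generator; calling `list()` on its output stands in for writing a temporary file, while passing the generator straight into the next join stands in for pipelining. The relations A, B, and C and the function `nl_join` are made up for this illustration.

```python
def nl_join(outer, inner, outer_key, inner_key):
    """Nested-loop join: stream `outer`, scan the stored list `inner` for matches."""
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                yield o + i


# Hypothetical relations: A(x, y), B(y, z), C(z, w).
A = [(1, "a"), (2, "b")]
B = [("a", 10), ("b", 20)]
C = [(10, "x"), (20, "y")]

# Materialized evaluation: the intermediate result is fully stored first.
temp = list(nl_join(A, B, 1, 0))               # plays the role of a temporary file
materialized = list(nl_join(temp, C, 3, 0))

# Pipelined evaluation: each joined tuple flows into the next join as it is produced.
pipelined = list(nl_join(nl_join(A, B, 1, 0), C, 3, 0))

print(materialized)
print(pipelined)
```

Both strategies return the same tuples; what changes is whether the intermediate result ever exists in full.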

Query Equivalence
Two relational algebra expressions are said to be equivalent if, on every legal database
instance, the two expressions generate the same relation (i.e., the same set of tuples).
Query equivalence rules are used for transforming a query into an optimized form.
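
One well-known equivalence is that a selection with a conjunctive condition can be split into a cascade of selections, applied in either order. The sketch below checks this on the LOAN relation. Agreement on one instance illustrates the rule but does not prove it, since equivalence must hold on every legal instance.

```python
LOAN = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryride", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-13", 500), ("Roundhill", "L-11", 900),
    ("Perryride", "L-16", 1300),
]


def select(relation, predicate):
    """sigma_p(r) over a list of tuples."""
    return [t for t in relation if predicate(t)]


def is_perryride(t):
    return t[0] == "Perryride"


def over_1400(t):
    return t[2] > 1400


# sigma (p1 AND p2) (LOAN)
combined = select(LOAN, lambda t: is_perryride(t) and over_1400(t))
# sigma p1 (sigma p2 (LOAN)) and sigma p2 (sigma p1 (LOAN))
cascade_a = select(select(LOAN, over_1400), is_perryride)
cascade_b = select(select(LOAN, is_perryride), over_1400)

print(combined, cascade_a, cascade_b)
```

An optimizer can use such rules to apply the more selective condition first, which shrinks the input of the second selection.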
