DBMS - Unit 2A
1. Select Operation:
● It selects the tuples that satisfy a given predicate. It is denoted by sigma (σ).
Notation: σ p(r)
Where:
σ denotes the selection operator
r is the relation
p is a propositional logic formula that may use the connectives AND, OR, and
NOT, together with relational operators such as =, ≠, ≥, <, >, ≤.
For example: LOAN Relation
σ BRANCH_NAME="perryride" (LOAN)
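The LOAN table for this example is not reproduced in these notes, so the minimal Python sketch below applies σ to the BORROW relation listed under the union operation instead. Modelling a relation as a list of dicts and the predicate p as a Python function are assumptions made only for illustration.

```python
# Sketch only: a relation is modelled as a list of dicts (attribute name -> value).
from typing import Callable

Row = dict[str, str]

# BORROW relation, copied from the union example in these notes.
BORROW: list[Row] = [
    {"CUSTOMER_NAME": "Jones", "LOAN_NO": "L-17"},
    {"CUSTOMER_NAME": "Smith", "LOAN_NO": "L-23"},
    {"CUSTOMER_NAME": "Hayes", "LOAN_NO": "L-15"},
    {"CUSTOMER_NAME": "Jackson", "LOAN_NO": "L-14"},
    {"CUSTOMER_NAME": "Curry", "LOAN_NO": "L-93"},
    {"CUSTOMER_NAME": "Smith", "LOAN_NO": "L-11"},
    {"CUSTOMER_NAME": "Williams", "LOAN_NO": "L-17"},
]


def select(predicate: Callable[[Row], bool], relation: list[Row]) -> list[Row]:
    """sigma_p(r): return the tuples of r for which the predicate p is true."""
    return [t for t in relation if predicate(t)]


# sigma CUSTOMER_NAME = "Smith" (BORROW)
smith_loans = select(lambda t: t["CUSTOMER_NAME"] == "Smith", BORROW)

# Predicates may combine conditions with AND, OR and NOT:
# sigma CUSTOMER_NAME = "Smith" AND NOT LOAN_NO = "L-11" (BORROW)
smith_not_l11 = select(
    lambda t: t["CUSTOMER_NAME"] == "Smith" and not t["LOAN_NO"] == "L-11", BORROW
)

print(smith_loans)
print(smith_not_l11)
```

Selection never changes the attributes of a tuple; it only decides which tuples survive.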
2. Project Operation:
● This operation keeps only the attributes that we wish to appear in the
result. The rest of the attributes are eliminated from the table, and duplicate tuples are removed.
● It is denoted by ∏.
Notation: ∏ A1, A2, ..., An (r)
Where
A1, A2, ..., An are attribute names of relation r.
Example: CUSTOMER RELATION
Input:
∏ NAME, CITY (CUSTOMER)
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
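The CUSTOMER input table is not reproduced here, so the following Python sketch projects the BORROW relation (from the union example) onto CUSTOMER_NAME. Representing a relation as a schema tuple plus a list of tuples is an assumption for illustration; the point is that projection drops both attributes and duplicate rows.

```python
# Sketch only: a relation is a schema (tuple of attribute names) plus a list of tuples.
BORROW_SCHEMA = ("CUSTOMER_NAME", "LOAN_NO")
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"),
]


def project(attrs, schema, relation):
    """Pi_attrs(r): keep only the listed attributes, then drop duplicate tuples."""
    positions = [schema.index(a) for a in attrs]
    result, seen = [], set()
    for t in relation:
        row = tuple(t[i] for i in positions)
        if row not in seen:  # set semantics: each tuple appears only once
            seen.add(row)
            result.append(row)
    return result


names = project(("CUSTOMER_NAME",), BORROW_SCHEMA, BORROW)
print(names)
```

Smith holds two loans in BORROW, yet appears only once in the projected result.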
3. Union Operation:
● Suppose there are two relations R and S. The union operation contains all the
tuples that are in R, in S, or in both R and S.
● It eliminates duplicate tuples. It is denoted by ∪.
Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
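The union query above can be reproduced with Python sets, which (like relations) hold each element once. This is a sketch, assuming each relation is a list of (name, number) tuples; the set comprehensions play the role of ∏ CUSTOMER_NAME.

```python
# Sketch only: relations are lists of tuples; projected results are Python sets.
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
    ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"),
]

# Pi CUSTOMER_NAME (BORROW) and Pi CUSTOMER_NAME (DEPOSITOR); a set drops duplicates
borrow_names = {name for name, _loan in BORROW}
depositor_names = {name for name, _account in DEPOSITOR}

# R ∪ S: every name that is in either relation (or both), each listed once
union = borrow_names | depositor_names
print(sorted(union))
```

The ten names match the output table above; Johnson and Smith each occur twice in the inputs but once in the result.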
4. Set Intersection:
● Suppose there are two relations R and S. The set intersection operation contains all
tuples that are in both R and S.
● It is denoted by ∩.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
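A set-based Python sketch of the intersection, under the same assumption that relations are lists of (name, number) tuples. It also checks the standard identity R ∩ S = R - (R - S), which shows intersection can be expressed with set difference alone.

```python
# Sketch only: the BORROW and DEPOSITOR relations from the union example.
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
    ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"),
]

borrow_names = {name for name, _loan in BORROW}
depositor_names = {name for name, _account in DEPOSITOR}

# R ∩ S: names that appear in both relations
common = borrow_names & depositor_names

# Same result written with difference only: R - (R - S)
via_difference = borrow_names - (borrow_names - depositor_names)
print(sorted(common), common == via_difference)
```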
5. Set Difference:
● Suppose there are two relations R and S. The set difference operation contains all
tuples that are in R but not in S.
● It is denoted by minus (-).
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
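The same set-based sketch (relations assumed to be lists of (name, number) tuples) reproduces this output and shows that set difference is not commutative: R - S and S - R give different results.

```python
# Sketch only: the BORROW and DEPOSITOR relations from the union example.
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
    ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"),
]

borrow_names = {name for name, _loan in BORROW}
depositor_names = {name for name, _account in DEPOSITOR}

only_borrowers = borrow_names - depositor_names   # BORROW - DEPOSITOR
only_depositors = depositor_names - borrow_names  # DEPOSITOR - BORROW
print(sorted(only_borrowers))
print(sorted(only_depositors))
```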
6. Cartesian product
● The Cartesian product is used to combine each row in one table with each row in
the other table. It is also known as a cross product.
● It is denoted by ×.
Notation: E × D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
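A short Python sketch of the Cartesian product above, using `itertools.product` from the standard library. Each EMPLOYEE tuple is concatenated with each DEPARTMENT tuple, so the result has 3 × 3 = 9 rows of 3 + 2 = 5 attributes, in the same order as the output table.

```python
from itertools import product

# Rows copied from the EMPLOYEE and DEPARTMENT tables above.
EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E × D: concatenate every EMPLOYEE tuple with every DEPARTMENT tuple
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]

for row in cross:
    print(*row)
```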
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.
ρ(STUDENT1, STUDENT)
Relational Calculus
There is an alternative way of formulating queries known as Relational Calculus.
Relational calculus is a non-procedural query language. In a non-procedural query
language, the user is not concerned with the details of how to obtain the end results. The
relational calculus tells what to do but never explains how to do it. Most commercial
relational languages, including SQL, QBE, and QUEL, are based on aspects of relational
calculus.
Why is it called Relational Calculus?
It is based on predicate calculus, a name derived from a branch of symbolic logic. A
predicate is a truth-valued function with arguments. On substituting values for the
arguments, the function results in an expression called a proposition, which can be either
true or false. Relational calculus is a tailored version of a subset of the predicate calculus,
used to communicate with the relational database.
Many calculus expressions involve the use of quantifiers. There are two
types of quantifiers:
● Universal quantifier (∀): "for all"; the formula must hold for every tuple.
● Existential quantifier (∃): "there exists"; the formula must hold for at least one tuple.
Before using the concept of quantifiers in formulas, we need to know the concept of
free and bound variables. A tuple variable t is bound if it is quantified, that is, if it
appears within the scope of a ∀ or ∃ quantifier. A variable that is not bound is said to be free.
Free and bound variables may be compared with the global and local variables of
programming languages.
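Python comprehensions read much like tuple relational calculus, which makes them a handy (if informal) analogy: they state which tuples belong to the result, and Python decides how to iterate. In this sketch, assuming the BORROW and DEPOSITOR relations from the union example, `any()` plays the role of ∃ and `all()` the role of ∀; t is the free variable and s is bound.

```python
# Sketch only: BORROW and DEPOSITOR as lists of (CUSTOMER_NAME, number) tuples.
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"),
]
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
    ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284"),
]
NAME = 0  # position of CUSTOMER_NAME in both relations

# {t | t ∈ BORROW ∧ ∃ s ∈ DEPOSITOR (s[CUSTOMER_NAME] = t[CUSTOMER_NAME])}
# t is free (it defines the result); s is bound by ∃, modelled with any().
borrowers_with_account = [
    t for t in BORROW if any(s[NAME] == t[NAME] for s in DEPOSITOR)
]

# {t | t ∈ DEPOSITOR ∧ ∀ s ∈ BORROW (s[CUSTOMER_NAME] ≠ t[CUSTOMER_NAME])}
# ∀ is modelled with all(): depositors who have no loan at all.
depositors_without_loan = [
    t for t in DEPOSITOR if all(s[NAME] != t[NAME] for s in BORROW)
]

print(borrowers_with_account)
print(depositors_without_loan)
```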
SQL
○ SQL stands for Structured Query Language. It is used for storing and managing
data in a relational database management system (RDBMS). In an RDBMS, data is stored
in the form of tables.
○ It is the standard language for relational database systems. It enables a user to
create, read, update, and delete relational databases and tables.
○ RDBMSs such as MySQL, Informix, Oracle, MS Access, and SQL Server use
SQL as their standard database language.
○ SQL allows users to query the database in a number of ways, using English-like
statements.
○ SQL is mostly used by engineers in software development for data storage.
Nowadays, it is also widely used by data analysts.
SQL Statement Rules:
SQL follows the following rules:
○ SQL keywords are not case-sensitive. By convention, SQL keywords are
written in uppercase.
○ Every SQL statement should end with a semicolon.
○ SQL statements are not tied to text lines: a single SQL statement can be written
on one line or spread across multiple lines.
○ Using SQL statements, you can perform most of the actions in a database.
○ SQL is based on tuple relational calculus and relational algebra.
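The rules above, and the set operations from the relational algebra section, can be tried with Python's standard-library `sqlite3` module and an in-memory database. This is a sketch: the lowercase table and column names are hypothetical versions of the BORROW and DEPOSITOR relations, and SQLite is used only because it ships with Python; other RDBMSs may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Keywords are case-insensitive: CREATE TABLE and create table mean the same thing.
conn.execute("CREATE TABLE borrow (customer_name TEXT, loan_no TEXT);")
conn.execute("create table depositor (customer_name text, account_no text);")

conn.executemany(
    "INSERT INTO borrow VALUES (?, ?)",
    [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
     ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17")],
)
conn.executemany(
    "INSERT INTO depositor VALUES (?, ?)",
    [("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"), ("Turner", "A-176"),
     ("Johnson", "A-273"), ("Jones", "A-472"), ("Lindsay", "A-284")],
)

# One statement spread over several lines and ended by a semicolon.
# INTERSECT corresponds to the relational-algebra ∩ used earlier.
both = conn.execute("""
    SELECT customer_name FROM borrow
    INTERSECT
    SELECT customer_name FROM depositor
    ORDER BY customer_name;
""").fetchall()

# EXCEPT corresponds to set difference (-).
only_borrowers = conn.execute(
    "SELECT customer_name FROM borrow EXCEPT SELECT customer_name FROM depositor ORDER BY 1"
).fetchall()

# UNION removes duplicates, just like ∪; lowercase keywords work too.
union_count = conn.execute(
    "select count(*) from (select customer_name from borrow "
    "union select customer_name from depositor)"
).fetchone()[0]

print(both, only_borrowers, union_count)
conn.close()
```

Python's `sqlite3` runs one statement per `execute` call, so the trailing semicolon is optional there; it matters more in interactive shells and scripts that contain several statements.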
An open-source database is a database whose source code anyone can view, and which is
free to download. The community version is free, although some editions may carry small,
affordable additional costs. Open-source databases provide limited technical support to
end users, and installation and updates are administered by the user.
Advantages of Open Source Databases
● Cost: Open-source databases are generally free, which means they can be used
without any licensing fees.
● Customization: Since the source code is available, developers can modify and
customize the database to meet specific requirements.
● Community Support: Open-source databases have a large community of users
who contribute to documentation, bug fixes, and improvements.
● Security: With open-source databases, security vulnerabilities can be detected
and fixed quickly by the community.
● Scalability: Open-source databases are typically designed to be scalable, which
means they can handle large amounts of data and traffic.
Disadvantages of Open Source Databases
● Limited Technical Support: While there is a large community of users who can
help troubleshoot issues, there is no guarantee of professional technical support.
● Complexity: Open source databases can be more difficult to set up and
configure than commercial databases, especially for users who are not
experienced in database administration.
● Lack of Features: Open source databases may not have all the features that are
available in commercial databases, such as advanced analytics and reporting
tools.
Similarities Between Open Source and Commercial Databases
● Both can handle large amounts of data and support complex data structures.
● Both can be used to store and retrieve data in a structured manner.
● Both can be used to support mission-critical applications and services.
● Both use SQL (Structured Query Language) to perform queries and manipulate
data.
● Both can be accessed and managed remotely using a variety of tools and
interfaces.
● Both can be optimized for performance, scalability, and security.
Parameter | Open Source Database | Commercial Database
Cost | They are free or have additional and affordable costs. | They are premium and are not free like open source databases.
Source Code | Because the source code is open, there is a risk of coding malfunction. | The code is not accessible to unauthorized users and has a high level of protection.
Armstrong Axioms
The term Armstrong Axioms refers to the sound and complete set of inference rules or axioms,
introduced by William W. Armstrong, that is used to test the logical implication of functional
dependencies. If F is a set of functional dependencies then the closure of F, denoted as F+, is
the set of all functional dependencies logically implied by F. Armstrong’s Axioms are a set of
rules that, when applied repeatedly, generate the closure of a set of functional dependencies.
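Axioms
● Reflexivity: If B is a subset of A, then A→B holds.
● Augmentation: If A→B holds, then AC→BC holds for any set of attributes C.
● Transitivity: If A→B and B→C hold, then A→C holds.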
Secondary Rules
These rules can be derived from the above axioms.
● Union: If A→B holds and A→C holds, then A→BC holds. If X→Y and X→Z then X→YZ.
● Composition: If A→B and X→Y hold, then AX→BY holds.
● Decomposition: If A→BC holds then A→B and A→C hold. If X→YZ then X→Y and
X→Z.
● Pseudo Transitivity: If A→B holds and BC→D holds, then AC→D holds. If X→Y and
YZ→W then XZ→W.
● Self Determination: It is similar to the Axiom of Reflexivity, i.e. A→A for any A.
● Extensivity: Extensivity is a case of augmentation. If A→B holds, then AC→B holds, because
AC→A (reflexivity) and A→B give AC→B by transitivity. Similarly, AC→ABC and ABC→BC,
which leads to AC→BC.
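In practice, "is X→Y implied by F?" is usually answered with the attribute-closure algorithm rather than by chaining axioms by hand; the algorithm is correct because Armstrong's axioms are sound and complete. The Python sketch below is that standard algorithm (not something defined in these notes). Writing attribute sets as strings of single-letter names, such as "AB" for {A, B}, is an assumption for brevity.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs; single-letter attribute names let a string
    such as "AB" stand for the attribute set {A, B}.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side
            # (justified by reflexivity, augmentation and transitivity).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. derivable with Armstrong's axioms."""
    return set(rhs) <= closure(lhs, fds)


# Secondary rules checked on small example sets:
print(implies([("A", "B"), ("A", "C")], "A", "BC"))     # union
print(implies([("A", "BC")], "A", "B"))                 # decomposition
print(implies([("A", "B"), ("BC", "D")], "AC", "D"))    # pseudo transitivity
print(implies([("A", "B"), ("X", "Y")], "AX", "BY"))    # composition
print(implies([("A", "B")], "B", "A"))                  # not derivable
```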
Armstrong Relation
Armstrong Relation can be stated as a relation that satisfies all the functional
dependencies in the closure F+ and no functional dependency outside F+. For a given set
of dependencies, the size of a minimum Armstrong Relation can be exponential in the
number of attributes under consideration.
Advantages of Using Armstrong’s Axioms in Functional Dependency
● They provide a systematic and efficient method for inferring additional functional
dependencies from a given set of functional dependencies, which can help to optimize
database design.
● They can be used to identify redundant functional dependencies, which can help to
eliminate unnecessary data and improve database performance.
● They can be used to verify whether a set of functional dependencies is a minimal cover,
which is a set of dependencies that cannot be further reduced without losing information.
Disadvantages of Using Armstrong’s Axioms in Functional Dependency
● The process of using Armstrong’s axioms to infer additional functional dependencies can
be computationally expensive, especially for large databases with many tables and
relationships.
● The axioms do not take into account the semantic meaning of data, and may not always
accurately reflect the relationships between data elements.
Dependency Preserving Decomposition
Let R be decomposed into {R1, R2, ..., Rn} with projected FD sets {F1, F2, ..., Fn}. This
decomposition is dependency preserving if F+ = (F1 ∪ F2 ∪ ... ∪ Fn)+.
Example: Let the relation R{A,B,C,D,E} with F: {AB->C, C->D, AB->D} be decomposed into
R1(A,B,C) and R2(C,D,E). Prove that the decomposition is dependency preserving.
Solution
F1={AB->C}
F2={C->D}
=> (F1 ∪ F2) = {AB->C, C->D}
AB+ under (F1 ∪ F2) = {A,B,C,D} => AB->D is implied by (F1 ∪ F2)
F+ = (F1 ∪ F2)+, so the decomposition is dependency preserving.
Now consider another example where dependencies are not preserved. Let the relation
R{A,B,C,D,E,F,G,H,I,J} have F: {AB->C, A->DE, B->F, F->GH, D->IJ}, and let its decomposition
have the following projected FD sets:
F1={AB->C}
F2={}
F3={B->F}
F4={F->GH}
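F5={D->IJ}
A+ under (F1 ∪ F2 ∪ F3 ∪ F4 ∪ F5) = {A}, so A->DE is not implied and the decomposition
is not dependency preserving.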
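Both examples can be checked mechanically: since every Fi is a projection of F+, the decomposition preserves dependencies exactly when each FD of F is implied by F1 ∪ ... ∪ Fn. The Python sketch below restates the attribute-closure algorithm so it stands alone, and takes the Fi sets as listed in the notes rather than computing the projections itself (an assumption for brevity).

```python
def closure(attrs, fds):
    """X+ under fds; "AB" stands for the attribute set {A, B}."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_dependency_preserving(F, projected_sets):
    """Check F+ = (F1 ∪ ... ∪ Fn)+ by testing each FD of F against the union."""
    G = [fd for Fi in projected_sets for fd in Fi]
    return all(set(rhs) <= closure(lhs, G) for lhs, rhs in F)


# Example 1: R(A,B,C,D,E), F = {AB->C, C->D, AB->D}
F_ex1 = [("AB", "C"), ("C", "D"), ("AB", "D")]
print(is_dependency_preserving(F_ex1, [[("AB", "C")], [("C", "D")]]))

# Example 2: F = {AB->C, A->DE, B->F, F->GH, D->IJ}, with F1..F5 as listed
F_ex2 = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
projected_ex2 = [[("AB", "C")], [], [("B", "F")], [("F", "GH")], [("D", "IJ")]]
print(is_dependency_preserving(F_ex2, projected_ex2))
print(closure("A", [fd for Fi in projected_ex2 for fd in Fi]))
```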
Lossless Join Decomposition
The intersection of R1 and R2 cannot be empty: the sub-relations must contain a common attribute.
The common attribute must contain unique data, that is, it must be a superkey of either R1 or R2.
Here,
R = (A, B, C)
R1 = (A, B)
R2 = (B, C)
The relation R has three attributes A, B, and C, and is decomposed into two relations
R1 and R2, each with two attributes. The common attribute is B.
The values in column B must be unique; if B contains duplicate values, the decomposition is
not guaranteed to be a lossless-join decomposition. Consider relation R with the following data:
R (A, B, C)
A B C
12 25 34
10 36 09
12 42 30
R1 (A, B)
A B
12 25
10 36
12 42
R2 (B, C)
B C
25 34
36 09
42 30
R1 ⋈ R2 = R
A B C
12 25 34
10 36 09
12 42 30
The result is the same as the original relation R; hence, the above decomposition is a
lossless-join decomposition.
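The lossless-join check can be sketched in Python: project R onto R1 and R2, natural-join them back, and compare with R. Values are kept as strings so that "09" prints as in the table; that, and the hypothetical alternative split on A, are assumptions added for illustration. Splitting on A (whose value 12 is duplicated) produces spurious tuples, which is exactly why the common attribute must be unique.

```python
R_SCHEMA = ("A", "B", "C")
R = {("12", "25", "34"), ("10", "36", "09"), ("12", "42", "30")}


def project(relation, schema, attrs):
    """Pi_attrs(relation) under set semantics."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in positions) for t in relation}


def natural_join(r1, schema1, r2, schema2):
    """r1 ⋈ r2: pair tuples that agree on every common attribute."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out_schema = tuple(schema1) + tuple(extra)
    result = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[schema1.index(a)] == t2[schema2.index(a)] for a in common):
                result.add(t1 + tuple(t2[schema2.index(a)] for a in extra))
    return out_schema, result


# Decomposition from the notes: the common attribute B has unique values.
R1 = project(R, R_SCHEMA, ("A", "B"))
R2 = project(R, R_SCHEMA, ("B", "C"))
joined_schema, joined = natural_join(R1, ("A", "B"), R2, ("B", "C"))
print(joined == R)

# Hypothetical alternative: split on A instead, whose value 12 is duplicated.
S1 = project(R, R_SCHEMA, ("A", "B"))
S2 = project(R, R_SCHEMA, ("A", "C"))
_, lossy = natural_join(S1, ("A", "B"), S2, ("A", "C"))
print(sorted(lossy - R))  # spurious tuples that were never in R
```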
Parsing
During the parse call, the database performs the following checks: a syntax check, a
semantic check, and a shared pool check. The query is then converted into relational
algebra, because query processing includes several data-retrieval activities.
First, the user queries, written in a high-level database language such as SQL, are
translated into expressions that can be applied further at the file system’s physical level.
Following this, the queries are actually evaluated along with a number of query-optimizing
transformations. SQL is well suited to humans because it is readable and understandable,
but the computer system must translate such a query into an internal form before
processing it.
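Parser performs the following checks:
Syntax check: determines whether the statement is syntactically valid SQL.
Semantic check: determines whether the statement is meaningful, for example whether the
referenced tables and columns exist.
Shared pool check: determines whether a parsed form of the same statement already exists,
so that the expensive optimization steps can be skipped.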
Step-2
Optimization
During the optimization stage, the database must perform a hard parse at least once for every
unique DML statement and performs optimization during this parse. The database never
optimizes DDL unless it includes a DML component, such as a subquery, that requires
optimization. Optimization is a process in which multiple query execution plans for satisfying
a query are examined and the most efficient plan is selected for execution. The
database catalog stores the execution plans, and the optimizer then passes the
lowest-cost plan on for execution.
The query optimizer then chooses an execution (evaluation) plan for each query block.
Query Equivalence
Two relational algebra expressions are said to be equivalent if, on every legal database
instance (i.e., every set of relation instances that satisfies the integrity constraints), the
two expressions generate the same relation (i.e., the same set of tuples). Query equivalence
rules are used to transform a query into an optimized form.
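A small Python sketch of one well-known equivalence rule: a selection whose predicate uses only one operand's attributes can be pushed below a Cartesian product, so σp(EMPLOYEE × DEPARTMENT) ≡ EMPLOYEE × σp(DEPARTMENT). It reuses the tables from the Cartesian product example; the predicate DEPT_NAME = "Sales" and the "cost = size of the intermediate result" model are hypothetical simplifications, as real optimizers estimate cost from catalog statistics.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]


def is_sales(dept):
    """Predicate DEPT_NAME = "Sales"; it only reads DEPARTMENT attributes."""
    return dept[1] == "Sales"


def select_after_product():
    """sigma_p(EMPLOYEE × DEPARTMENT): build the full product, then filter."""
    intermediate = list(product(EMPLOYEE, DEPARTMENT))
    result = {e + d for e, d in intermediate if is_sales(d)}
    return result, len(intermediate)


def select_before_product():
    """EMPLOYEE × sigma_p(DEPARTMENT): filter first, then build a smaller product."""
    intermediate = [d for d in DEPARTMENT if is_sales(d)]
    result = {e + d for e, d in product(EMPLOYEE, intermediate)}
    return result, len(intermediate)


plans = {"select_after_product": select_after_product,
         "select_before_product": select_before_product}
outcomes = {name: plan() for name, plan in plans.items()}

# Equivalent expressions: both plans return the same set of tuples ...
same_result = outcomes["select_after_product"][0] == outcomes["select_before_product"][0]
# ... so a toy optimizer can pick the plan with the smaller intermediate result.
cheapest = min(outcomes, key=lambda name: outcomes[name][1])
print(same_result, cheapest)
```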