Dbms (r23) Unit-5 Q &A

The document outlines key concepts in database management, including conflict and view serializability, file organization techniques, B+ tree file organization, indexing types, transaction anomalies, and static hashing. It describes various performance measures for file organization, details the structure and operations of B+ trees, and explains the implications of interleaved transaction execution. Additionally, it covers the definition of transactions and their properties, emphasizing the importance of maintaining data integrity.

Uploaded by

Sanjay Atmakuri

UNIT - 5

10-Marks Questions

Q. No | Question | Marks | KL | CO
1 | Explain conflict serializability and view serializability. | 10 | L2 | 5
2 | Explain the measures that are to be considered for comparing the performance of various file organization techniques. | 10 | L2 | 5
3 | Explain in detail B+ tree file organization. | 10 | L2 | 5
4 | Write short notes on i) Primary index ii) Clustered index iii) Secondary index. | 10 | L2 | 5
5 | Explain various anomalies that arise due to interleaved execution of transactions with suitable examples. | 10 | L2 | 5
6 | What is static hashing? What rules are followed for index selection? | 10 | L2 | 5
7 | Define transaction and explain properties of transactions. | 10 | L1 | 5
8 | What is database Recovery? Explain Shadow paging in detail. | 10 | L2 | 5
1. Explain conflict serializability and view serializability

Conflict serializable:

A schedule is called conflict serializable if it can be transformed into a serial
schedule by swapping adjacent non-conflicting operations. A pair of operations
is conflicting if all of the following conditions are satisfied:

1. They belong to different transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.

View Serializable:

A schedule is called view serializable if it is view equivalent to a serial
schedule (one with no overlapping transactions). Every conflict serializable
schedule is also view serializable, but a view serializable schedule that
contains blind writes may not be conflict serializable.
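The swap-based definition above is commonly tested with a precedence (serialization) graph: draw an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable iff the graph has no cycle. A minimal Python sketch (function and variable names are illustrative):

```python
from collections import defaultdict

def is_conflict_serializable(schedule):
    """Build a precedence graph from a schedule and test it for cycles.

    `schedule` is a list of (transaction, operation, data_item) tuples,
    e.g. ("T1", "R", "A"). Two operations conflict when they come from
    different transactions, touch the same item, and at least one writes.
    """
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges[ti].add(tj)  # ti must precede tj in any serial order

    # Detect a cycle with depth-first search (gray = on current path).
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def has_cycle(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True
        color[node] = BLACK
        return False

    nodes = {t for t, _, _ in schedule}
    return not any(color[n] == WHITE and has_cycle(n) for n in nodes)

# R1(A) W2(A) W1(A): edges T1->T2 and T2->T1 form a cycle -> not serializable
print(is_conflict_serializable([("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]))  # False
print(is_conflict_serializable([("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A")]))  # True
```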
2. Explain the measures that are to be considered for comparing
the performance of various file organization techniques.

What is a File?

A file is a named collection of related information that is recorded on
secondary storage such as magnetic disks, magnetic tapes, and optical disks.

What is File Organization?

File Organization refers to the logical relationships among various records.


File Structure refers to the format of the label and data blocks and of any
logical control record.

Types of File Organizations

Various methods have been introduced to organize files. Each method has
advantages and disadvantages with respect to access and selection, so it is
up to the programmer to decide the best-suited file organization method for
the requirements at hand.

Some types of File Organizations are:

 Sequential File Organization


 Direct File Organization
 Indexed Sequential File Organization.

When comparing file organization techniques, each method (sequential, indexed,
hash, etc.) offers trade-offs in speed, storage efficiency, and ease of
implementation, making the best choice dependent on the specific
application's needs.

Points of Comparison on File organization, indexing, and


performance tuning

1. Granularity:

 File organization is about how data is stored at the file level.


 Indexing is about improving data access at the table or even column
level.
 Performance tuning is a broad set of activities that can encompass
both file organization and indexing among many other techniques.
2. Resource Usage:

 File organization techniques aim to use disk space efficiently.


 Indexing aims to use both disk space and memory for fast data
retrieval.
 Performance tuning aims to optimize all system resources including
CPU, memory, disk, and network bandwidth.

3. Query Efficiency:

 File organization generally impacts how efficiently data can be read or


written to disk.
 Indexing significantly impacts how efficiently queries can retrieve data.
 Performance tuning seeks to optimize both read and write operations
through a variety of methods.

4. Complexity:

 File organization is relatively straightforward.


 Indexing can become complex depending on the types of indexes and
the nature of the queries.
 Performance tuning is usually the most complex as it involves a holistic
understanding of hardware, software, data, and queries.

3. Explain in detail B+ tree file organization

The B+ tree file organization is an advanced form of the indexed sequential
access mechanism. Records are stored in a tree-like structure, employing a
key-index idea in which the primary key is utilised to sort the records. An
index value is generated for each primary key and mapped to the record.

Unlike a binary search tree (BST), a B+ tree node can have more than two
children. All records are stored solely at the leaf nodes in this method. The
intermediate nodes only point towards the leaf nodes; they contain no
records.
Root Node

The root node is the topmost node of a B+ tree. It functions as the doorway
for searching records in the tree. The root node contains references to child
nodes, which allow traversal of the tree structure.

Internal Node

The internal nodes in a B+ tree are non-leaf nodes that do not store actual
data records. Instead, they hold search keys and pointers to child nodes.
Internal nodes allow fast traversal through the tree during search operations.

Leaf Node

Leaf nodes are at the deepest level of a B+ tree. Unlike internal nodes, leaf
nodes store data records together with pointers to adjacent leaf nodes. Each
leaf node covers a contiguous range of key values, so the minimum and maximum
keys in a leaf bound the search within it.

Search Key

A search key is a single attribute, or a combination of attributes, used to
search for specific records within a B+ tree. It is helpful for finding the
desired data promptly. Typically, the search key is the primary key or an
indexed field in the database.

Construction of B+ Tree

 Begin with the tree empty.

 Insert records sequentially, adjusting the tree as necessary to ensure


balance.

 Internal nodes contain search keys and pointers to child nodes and leaf
nodes keep actual data records.

Searching in B+ Tree
 Start from the root node and compare the search key.

 Traverse the tree using comparison until reaching the required leaf
node.

 Carry out a sequential search within the leaf node to get the required
record.
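The three search steps above can be sketched on a tiny, hand-built two-level tree (an illustrative structure, not a full B+ tree implementation):

```python
import bisect

# Leaves hold sorted (key, record) pairs; the root holds separator keys
# and pointers to its child leaves.
leaves = [
    [(5, "rec5"), (10, "rec10")],
    [(15, "rec15"), (20, "rec20")],
    [(25, "rec25"), (30, "rec30")],
]
root = {"keys": [15, 25], "children": leaves}

def bplus_search(root, key):
    # Step 1: compare the search key against the root's separator keys.
    i = bisect.bisect_right(root["keys"], key)
    leaf = root["children"][i]      # Step 2: traverse down to the leaf node
    for k, rec in leaf:             # Step 3: sequential search within the leaf
        if k == key:
            return rec
    return None

print(bplus_search(root, 20))  # rec20
print(bplus_search(root, 7))   # None (no record with key 7)
```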

Insertion in B+ Tree

 Insert the record at the appropriate leaf node.

 If the insertion causes the leaf node to overflow, split the node and
update the parent node.

 Propagate splits up to the root when necessary.

Deletion in B+ Tree

 Locate and remove the required record from the leaf node.

 If the removal causes a node to underflow, borrow from a neighbouring
node or merge two nodes.

 Propagate up to the root if needed.

4. Write short notes on i) Primary index ii) Clustered index iii)


Secondary index.

Indexed sequential file organization is an advanced sequential file
organization method in which records are stored in the file with the help of
the primary key.

What is Primary Indexing?

A Primary Index is an ordered file whose records are of fixed length with two
fields. The first field of the index is the primary key of the data file in an
ordered manner, and the second field of the ordered file contains a block
pointer that points to the data block where a record containing the key is
available.
Working of Primary Indexing

 In primary indexing, the data file is sorted or clustered based on the


primary key as shown in the below figure.

 An index file (also known as the index table) is created alongside the
data file.

 The index file contains pairs of primary key values and pointers to the
corresponding data records.

 Each entry in the index file corresponds to a block or page in the data
file.

Types of Primary Indexing

 Dense Indexing: A dense index has an index entry for every search
key value in the data file. This approach ensures efficient data retrieval
but requires more storage space.
Number of index entries = number of records in the data file

 Sparse Indexing: A sparse index has fewer index entries than records.
The index entries point to blocks of records rather than individual
records. While it reduces storage overhead, it may require additional
disk accesses during retrieval.
Number of index entries = number of blocks
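The sparse-index entry count can be illustrated with a small sketch (hypothetical data; the index holds one entry per block, and a lookup scans only the one block it lands on):

```python
import bisect

# Hypothetical data file: each block holds sorted (key, record) pairs.
blocks = [
    [(1, "Ann"), (4, "Bob")],
    [(7, "Cam"), (9, "Dee")],
    [(12, "Eve"), (15, "Fay")],
]
# Sparse index: one (anchor key, block number) entry per block.
sparse_index = [(blk[0][0], i) for i, blk in enumerate(blocks)]

def sparse_lookup(key):
    # Find the last index entry whose anchor key <= search key,
    # then sequentially scan that single block.
    pos = bisect.bisect_right([k for k, _ in sparse_index], key) - 1
    if pos < 0:
        return None
    for k, rec in blocks[sparse_index[pos][1]]:
        if k == key:
            return rec
    return None

print(len(sparse_index))  # 3 entries: one per block, not one per record
print(sparse_lookup(9))   # Dee
```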

clustered index:

A clustered index determines the physical order of data in a table. In other


words, the order in which the rows are stored on disk is the same as the
order of the index key values. There can be only one clustered index on a
table, but the table can have multiple non-clustered (or secondary) indexes.
Characteristics of a Clustered Index:

1. It dictates the physical storage order of the data in the table.


2. There can only be one clustered index per table.
3. It can be created on columns with non-unique values, but if it's on a
non-unique column, the DBMS will usually add a uniqueifier to make
each entry unique.
4. Lookup based on the clustered index is fast because the desired data
is directly located without the need for additional lookups (unlike non-
clustered indexes which require a second lookup to fetch the data).
Clustered Index Example

Imagine a `Books` table with the following records:

BookID | Title | Genre
3 | A Tale of Two Cities | Fiction
1 | Database Systems | Academic
4 | Python Programming | Technical
2 | The Great Gatsby | Fiction

If we create a clustered index on `BookID`, the physical order of records


would be rearranged based on the ascending order of `BookID`.

The table would then look like this:

BookID | Title | Genre
1 | Database Systems | Academic
2 | The Great Gatsby | Fiction
3 | A Tale of Two Cities | Fiction
4 | Python Programming | Technical

Now, when you want to find a book based on its ID, the DBMS can quickly
locate the data because the data is stored in the order of the BookID.

Secondary Index

Secondary indexing is a database management technique used to create


additional indexes on data stored in a database. The main purpose of
secondary indexing is to improve the performance of queries and to simplify
the search for specific records within a database. A secondary index provides
an alternate means of accessing data in a database, in addition to the
primary index. The primary index is typically created when the database is
created and is used as the primary means of accessing data in the database.
Secondary indexes, on the other hand, can be created and dropped at any
time, allowing for greater flexibility in managing the database.
5. Explain various anomalies that arise due to interleaved
execution of transactions with suitable examples.

Anomalies are inconsistencies or errors that can arise when working with
databases. At the schema-design level, they arise in the context of data
insertion, deletion, and modification, and can be categorized into three
types:

 Insertion Anomalies

 Deletion Anomalies

 Update Anomalies

Those are design-level anomalies. Interleaved transaction execution, where
multiple transactions operate concurrently, can additionally lead to several
anomalies that compromise data integrity, including lost updates,
inconsistent reads, and incorrect summaries, which require careful
concurrency control mechanisms to prevent.
1. Lost Updates:

 Explanation:

When two or more transactions update the same data concurrently,


one update might overwrite the other, resulting in the loss of the first
transaction's changes.

 Example:

Two transactions, T1 and T2, both read the value of a variable 'A' as
10. T1 then updates 'A' to 15, but before T1 commits, T2 also updates
'A' to 20. If T2 commits before T1, T1's update of 'A' to 15 is lost, and
the final value of 'A' will be 20 instead of 15.
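The schedule above can be replayed step by step in a few lines of Python (a simulation of the interleaving, not real concurrency):

```python
# Lost update: both transactions read A = 10; T2's later write overwrites
# T1's write, so T1's update to 15 is lost.
db = {"A": 10}

t1_read = db["A"]       # T1: read(A) -> 10
t2_read = db["A"]       # T2: read(A) -> 10, interleaved before T1 writes
db["A"] = t1_read + 5   # T1: write(A = 15)
db["A"] = t2_read + 10  # T2: write(A = 20), overwriting T1's update

print(db["A"])  # 20: T1's change to 15 has been lost
```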

2. Inconsistent Reads:

 Explanation:

A transaction might read the same data multiple times during its
execution, and the data might change between reads due to another
transaction, leading to inconsistencies.

 Example:

Transaction T1 reads the value of 'A' as 10. While T1 is still running,


transaction T2 updates 'A' to 20. If T1 reads 'A' again, it might get 20,
even though it initially read 10, leading to an inconsistent view of the
data.

3. Incorrect Summaries (or Aggregate Errors):

 Explanation:

This occurs when a transaction calculates a summary (e.g., a total or


average) based on data that is being updated by another transaction
concurrently, resulting in an incorrect summary.

 Example:
Transaction T1 calculates the total of a column 'Amount'. While T1 is
calculating, transaction T2 updates some values in the 'Amount'
column. T1's total will be incorrect because it is based on a partially
updated dataset.

6. What is static hashing? What rules are followed for index


selection?

Hashing is an effective technique to calculate the direct location of a data
record on the disk without using an index structure.

Hashing uses hash functions with search keys as parameters to generate the
address of a data record.

Hash Organization

 Bucket − A hash file stores data in bucket format. Bucket is


considered a unit of storage. A bucket typically stores one complete
disk block, which in turn can store one or more records.
 Hash Function − A hash function, h, is a mapping function that maps
all the set of search-keys K to the address where actual records are
placed. It is a function from search keys to bucket addresses.

Static Hashing
In static hashing, when a search-key value is provided, the hash function
always computes the same address. For example, if a mod-4 hash function is
used, it generates only 4 values (0 through 3). The output address is always
the same for a given key, and the number of buckets provided remains
unchanged at all times.
Operation

 Insertion − When a record is required to be entered using static hash,


the hash function h computes the bucket address for search key K,
where the record will be stored.

Bucket address = h(K)

 Search − When a record needs to be retrieved, the same hash


function can be used to retrieve the address of the bucket where the
data is stored.
 Delete − This is simply a search followed by a deletion operation.
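The three operations can be sketched with a mod-4 hash function and a list of in-memory buckets (an illustrative model, not real disk buckets):

```python
NUM_BUCKETS = 4  # fixed at all times in static hashing

def h(key):
    # mod-4 hash function: always maps the same key to the same bucket
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    buckets[h(key)].append((key, record))   # Bucket address = h(K)

def search(key):
    # Same hash function locates the bucket; scan it for the key.
    return next((r for k, r in buckets[h(key)] if k == key), None)

def delete(key):
    b = buckets[h(key)]                     # search, then remove
    b[:] = [(k, r) for k, r in b if k != key]

insert(10, "rec10"); insert(14, "rec14"); insert(7, "rec7")
print(h(10), h(14))   # 2 2 -> keys 10 and 14 collide in the same bucket
print(search(14))     # rec14
delete(14)
print(search(14))     # None
```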

Bucket Overflow

The condition of bucket-overflow is known as collision. This is a fatal state


for any static hash function. In this case, overflow chaining can be used.

 Overflow Chaining − When buckets are full, a new bucket is


allocated for the same hash result and is linked after the previous one.
This mechanism is called Closed Hashing.

 Linear Probing − When a hash function generates an address at


which data is already stored, the next free bucket is allocated to it.
This mechanism is called Open Hashing.
Dynamic Hashing

The problem with static hashing is that it does not expand or shrink
dynamically as the size of the database grows or shrinks. Dynamic hashing
provides a mechanism in which data buckets are added and removed
dynamically and on demand. Dynamic hashing is also known as extendible
hashing.

Hash function, in dynamic hashing, is made to produce a large number of


values and only a few are used initially.

Organization
A prefix of the entire hash value is taken as the hash index. Only a portion
of the hash value is used for computing bucket addresses. Every hash index
has a depth value to signify how many bits are used for computing bucket
addresses; with depth n, these bits can address 2^n buckets. When all these
buckets are full, the depth value is incremented and twice the number of
buckets are allocated.
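The role of the depth value can be illustrated with a small sketch (assuming, for illustration, an 8-bit hash value whose leading bits form the bucket address):

```python
def bucket_address(hash_value, depth, hash_bits=8):
    # With depth d, the leading d bits of the hash value address 2**d
    # buckets; increasing the depth doubles the addressable buckets.
    return hash_value >> (hash_bits - depth)

h_val = 0b10110111
print(bucket_address(h_val, 2))  # prefix 0b10  = 2 (4 buckets addressable)
print(bucket_address(h_val, 3))  # prefix 0b101 = 5 (after doubling: 8 buckets)
```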

Operation

 Querying − Look at the depth value of the hash index and use those
bits to compute the bucket address.
 Update − Perform a query as above and update the data.
 Deletion − Perform a query to locate the desired data and delete the
same.
 Insertion − Compute the address of the bucket and place the record
there; if the bucket is full, split it and, when necessary, double the
number of buckets.

7. Define transaction and explain properties of transactions.


A transaction is a sequence of operations performed as a single logical unit
of work. A transaction is considered complete only if all its operations are
successfully executed; otherwise the transaction must be rolled back,
ensuring the database remains in a consistent state.

Operations of Transaction

1) Read(X)

A read operation is used to read the value of a particular database element X
and store it in a temporary buffer in main memory for further actions, such
as displaying that value.

Example:

For a banking system, when a user checks their balance, a Read operation
is performed on their account balance:

SELECT balance FROM accounts WHERE account_id = 'A123';

This reads the user's current balance; it does not modify the database.

2) Write(X)

A write operation is used to write the value to the database from the buffer
in the main memory. For a write operation to be performed, first a read
operation is performed to bring its value in buffer, and then some changes
are made to it, e.g. some set of arithmetic operations are performed on it
according to the user's request, then to store the modified value back in the
database, a write operation is performed.

Example:

For the banking system, if a user withdraws money, a Write operation is


performed after the balance is updated:

UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A123';

This updates the balance of the user’s account after withdrawal.

3) Commit

The commit operation makes all the changes performed by a transaction
permanent, maintaining the integrity of the database. Due to a failure of
power, hardware, or software, etc., a transaction might get interrupted
before all its operations are completed, which could otherwise leave the
database in an inconsistent state.

Example:

After a successful money transfer in a banking system, a Commit operation


finalizes the transaction:

COMMIT;

Once the transaction is committed, the changes to the database are


permanent, and the transaction is considered successful.

4) Rollback

If an error occurs, the rollback operation undoes all the changes made by
the transaction, reverting the database to its last consistent state. In
simple words, a rollback operation undoes the operations the transaction
performed before its interruption, to achieve a safe state of the database
and avoid any kind of ambiguity or inconsistency.

Example:

Suppose during the money transfer process, the system encounters an issue,
like insufficient funds in the sender’s account. In that case, the transaction is
rolled back:

ROLLBACK;
Atomicity: Each transaction is treated as one unit and either runs to
completion or is not executed at all.

Atomicity involves the following operations:

Abort: If a transaction aborts, changes made to the database are not
visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the 'all or nothing rule'.

Consider the following transaction T consisting of T1 and T2 : Transfer of


100 from account X to account Y .

Example

If the transaction fails after completion of T1 but before completion of T2


( say, after write(X) but before write(Y) ), then the amount has been
deducted from X but not added to Y . This results in an inconsistent database
state. Therefore, the transaction must be executed in its entirety in order to
ensure the correctness of the database state.
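The transfer example can be sketched as follows (a simplified in-memory model; snapshot-and-restore stands in for the DBMS's rollback machinery):

```python
# All-or-nothing transfer: if the failure occurs after write(X) but before
# write(Y), roll back to the pre-transaction snapshot so no partial update
# survives.
accounts = {"X": 500, "Y": 200}

def transfer(amount, fail_after_debit=False):
    snapshot = dict(accounts)        # saved state for rollback
    try:
        accounts["X"] -= amount      # write(X): debit
        if fail_after_debit:
            raise RuntimeError("crash before write(Y)")
        accounts["Y"] += amount      # write(Y): credit
    except RuntimeError:
        accounts.clear()
        accounts.update(snapshot)    # rollback: undo the partial debit

transfer(100, fail_after_debit=True)
print(accounts)  # {'X': 500, 'Y': 200} -- unchanged, atomicity preserved
transfer(100)
print(accounts)  # {'X': 400, 'Y': 300} -- both writes applied together
```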
Consistency:

Consistency ensures that a database remains in a valid state before and


after a transaction. It guarantees that any transaction will take the database
from one consistent state to another, maintaining the rules and constraints
defined for the data.
Referring to the example above,
The total amount before and after the transaction must be maintained.

Total before T occurs = 500 + 200 = 700 .


Total after T occurs = 400 + 300 = 700 .
Therefore, the database is consistent . Inconsistency occurs in case T1
completes but T2 fails.

Isolation:

Changes occurring in a particular transaction will not be visible to any other


transaction until that particular change in that transaction is written to
memory or has been committed. This property ensures that when multiple
transactions run at the same time, the result will be the same as if they were
run one after another in a specific order.
For example, let X = 500 and Y = 500, and consider two concurrent
transactions T and T′′ that both operate on X and Y; isolation guarantees
that the final result matches some serial execution of T and T′′.
Durability:

This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored on and written to disk,
and they persist even if a system failure occurs. These updates become
permanent and are stored in non-volatile memory. The effects of the
transaction, thus, are never lost.

8. What is database Recovery? Explain Shadow paging in detail

Database Backup: A database backup is an exact copy of your database kept in
a separate location.

Database Recovery: Database recovery refers to the process of restoring a
database to a consistent state after a failure or corruption.

Need of Recovery:

Logical errors, system errors, system crash/computer failure, disk
failure/media failure, physical problems, and environmental disasters.

Recovery facilities:

1. Backup mechanism: This makes periodic backup copies of the
database.
2. Logging facilities: These keep track of the current state of
transactions and database changes.
3. Checkpoint facilities: These enable in-progress updates to the
database to be made permanent.
4. Recovery manager: This allows the system to restore the database to a
consistent state following a failure.

Recovery Techniques:

1. Backup and Restore


2. Transaction Log
3. Shadow Paging
4. Check pointing

Shadow Paging:

Shadow paging in DBMS is a recovery technique that uses a shadow copy (or
snapshot) of the database to ensure data consistency and enable crash
recovery by maintaining two versions of the database state: the shadow
page table and the current page table.

All changes are made to the current pages, while the shadow pages remain
unchanged until the transaction commits.

If a failure occurs, the shadow page table can be used to restore the
database to its last consistent state.
Start of Transaction:

 The shadow page table is created by copying the current page table.

 The shadow page table represents the original, unmodified state of the
database.

 This table is saved to disk and remains unchanged throughout the


transaction.

Logical Page | Shadow Page Table (Disk) | Current Page Table
P1 | Address_1 | Address_1
P2 | Address_2 | Address_2
P3 | Address_3 | Address_3

Transaction Execution:

 Updates are made to the database by creating new pages.


 The current page table reflects these changes, while the shadow page
table remains unchanged.

Page Modification:

If a logical page (e.g. P2) needs to be updated:

 A new version of the page (P2’) is created in memory and written to a


new physical storage block.

 The current page table entry for P2 is updated to point to P2’.

 The shadow page table still points to the original page P2, ensuring it is
unaffected by the changes.

Logical Page | Shadow Page Table (Disk) | Current Page Table
P1 | Address_1 | Address_1
P2 | Address_2 | Address_4 (P2′)
P3 | Address_3 | Address_3

Commit:

 If the transaction is successful, the shadow page table is replaced by


the current page table.

 This replacement makes the changes permanent.

Logical Page | Shadow Page Table (Disk) | Current Page Table
P1 | Address_1 | Address_1
P2 | Address_4 (P2′) | Address_4 (P2′)
P3 | Address_3 | Address_3
Abort:

 If the transaction is aborted, the current page table is discarded,


leaving the shadow page table intact.

 Since the shadow page table still points to the original pages, no
changes are reflected in the database.
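The commit and abort cases above can be sketched with dictionaries standing in for the page tables and disk blocks (an illustrative model of the pointer swap, not a real implementation):

```python
# Disk blocks and the two page tables from the example above.
disk = {"Address_1": "P1 data", "Address_2": "P2 data", "Address_3": "P3 data"}
shadow = {"P1": "Address_1", "P2": "Address_2", "P3": "Address_3"}
current = dict(shadow)           # start of transaction: copy the page table

# Modify logical page P2: write the new version P2' to a fresh block and
# repoint only the current table; the shadow table still sees the original.
disk["Address_4"] = "P2' data"
current["P2"] = "Address_4"

def commit():
    global shadow
    shadow = dict(current)       # replacing the table makes changes permanent

def abort():
    global current
    current = dict(shadow)       # discard changes; original pages untouched

commit()
print(disk[shadow["P2"]])  # P2' data
```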
2-Marks Questions

Q. No | Question | Marks | KL | CO
1 | Define a transaction? List the properties of a transaction. | 2 | L1 | 5
2 | Define serializability. | 2 | L2 | 5
3 | Explain about conflict serializability. | 2 | L2 | 5
4 | Explain about the ARIES recovery algorithm. | 2 | L1 | 5
5 | Explain about ACID properties. | 2 | L2 | 5
6 | What is two-phase locking (2PL)? | 2 | L2 | 5

1. Define a transaction? List the properties of a transaction.
A transaction is a sequence of operations performed as a single logical unit
of work. A transaction is considered complete only if all its operations are
successfully executed; otherwise the transaction must be rolled back,
ensuring the database remains in a consistent state.
The properties are:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
2. Define Serializability?

In database management systems (DBMS), serializability ensures that the


outcome of executing multiple transactions concurrently is the same as if
they were executed sequentially, one after the other, preventing data
inconsistencies.

3. Explain about Conflict Serializability.

A schedule is called conflict serializable if it can be transformed into a
serial schedule by swapping adjacent non-conflicting operations. A pair of
operations is conflicting if all of the following conditions are satisfied:

1. They belong to different transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.
4. Explain about the ARIES Recovery algorithm.

The Algorithm for Recovery and Isolation Exploiting Semantics (ARIES) is
based on the Write Ahead Log (WAL) protocol. Every update operation writes a
log record, which is one of the following:

1. Undo-only log record:


Only the before image is logged. Thus, an undo operation can be done
to retrieve the old data.
2. Redo-only log record:
Only the after image is logged. Thus, a redo operation can be
attempted.
3. Undo-redo log record:
Both before images and after images are logged.

5. Explain about ACID properties.

Atomicity: Each transaction is treated as one unit and either runs to
completion or is not executed at all.
Consistency:

Consistency ensures that a database remains in a valid state before and


after a transaction.

Isolation:

Changes occurring in a particular transaction will not be visible to any other


transaction until that particular change in that transaction is written to
memory or has been committed.

Durability:

This property ensures that once the transaction has completed execution,
the updates and modifications to the database are stored in and written to
disk and they persist even if a system failure occurs.

6. What is two-phase locking (2PL)?

Two Phase Locking

The Two-Phase Locking (2PL) Protocol is a key technique used in database
management systems to manage how multiple transactions access and modify
data at the same time.

The Two-Phase Locking Protocol defines clear rules for managing data locks.
It divides a transaction into two phases:

1. Growing Phase: In this step, the transaction gathers all the locks it
needs to access the required data. During this phase, it cannot release
any locks.

2. Shrinking Phase: Once a transaction starts releasing locks, it cannot


acquire any new ones. This ensures that no other transaction interferes
with the ongoing process.
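The two phases can be sketched as a small lock-tracking class that rejects any acquisition after the first release (an illustrative model; a real lock manager also handles shared/exclusive modes and blocking):

```python
class TwoPhaseTxn:
    """Tracks one transaction's locks and enforces the two-phase rule."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False   # False while still in the growing phase

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after first unlock")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # the first release starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A"); t.acquire("B")   # growing phase: gather all needed locks
t.release("A")                   # shrinking phase begins
try:
    t.acquire("C")               # illegal under 2PL
except RuntimeError as e:
    print(e)                     # 2PL violation: cannot lock after first unlock
```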
