14-PhysicalAccess

Uploaded by

chamarilk
Copyright
© © All Rights Reserved
Database Management Systems

Physical Access to Data

DB
MG
1
DBMS Architecture

[Figure: DBMS architecture; components shown]
SQL INSTRUCTION
OPTIMIZER
MANAGEMENT OF ACCESS METHODS
BUFFER MANAGER
CONCURRENCY CONTROL
RELIABILITY MANAGEMENT
DATABASE: Index Files, System Catalog, Data Files

Physical Access Structures

Data may be stored on disk in different formats to provide efficient query execution
Different formats are appropriate for different query needs
Physical access structures describe how data is stored on disk

Access Method Manager

Transforms an access plan generated by the optimizer into a sequence of physical access requests to (database) disk pages
It exploits access methods
An access method is a software module
It is specialized for a single physical data structure
It provides primitives for
reading data
writing data

Access method

Selects the appropriate blocks of a file to be loaded in memory
Requests them from the Buffer Manager
Knows the organization of data within a page
It can find specific tuples and values inside a page

Organization of a disk page

Different for different access methods
Divided into
Space available for data
Space reserved for access method control information
Space reserved for file system control information

Remarks

Tuples may have varying size, due to
Varchar types
Presence of Null values
A single tuple may span several pages
When its size is larger than a single page
e.g., for BLOB or CLOB data types

Database Management Systems

Physical Access Structures

Physical Access Structures

Physical access structures describe how data is stored on disk to provide efficient query execution
SQL select, update, …
In relational systems
Physical data storage
Sequential structures
Hash structures
Indexing to increase access efficiency
Tree structures (B-Tree, B+-Tree)
Unclustered hash index
Bitmap index
Sequential Structures

Tuples are stored in a given sequential order
Different types of structures implement different ordering criteria
Available sequential structures
Heap file (entry sequenced)
Ordered sequential structure

Heap file

Tuples are sequenced in insertion order
Insert is typically an append at the end of the file
All the space in a block is used before a new block is started
Delete or update may cause wasted space
Tuple deletion may leave unused space
An updated tuple may not fit in place if its new values are larger
Sequential reading/writing is very efficient
Frequently used in relational DBMSs, jointly with unclustered (secondary) indices to support search and sort operations
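
The heap file behaviour described above can be sketched as follows. This is a minimal toy model, not how any specific DBMS implements it: the class name `HeapFile`, the fixed `block_capacity`, and the use of `None` to mark a deleted slot are all assumptions made for illustration.

```python
class HeapFile:
    """Toy heap file: blocks are filled in insertion order (illustrative only)."""

    def __init__(self, block_capacity=3):
        # Assumption: fixed-size tuples, so a block holds a fixed number of them.
        self.block_capacity = block_capacity
        self.blocks = [[]]

    def insert(self, tup):
        # Append at the end of the file; start a new block only when the last is full.
        if len(self.blocks[-1]) >= self.block_capacity:
            self.blocks.append([])
        self.blocks[-1].append(tup)
        return (len(self.blocks) - 1, len(self.blocks[-1]) - 1)  # (block, slot)

    def delete(self, rid):
        # Deletion only marks the slot as unused; the space is wasted in this sketch.
        block, slot = rid
        self.blocks[block][slot] = None

    def scan(self):
        # Sequential scan in insertion order, skipping deleted slots.
        for block in self.blocks:
            for tup in block:
                if tup is not None:
                    yield tup

    def wasted_slots(self):
        return sum(tup is None for block in self.blocks for tup in block)


heap = HeapFile(block_capacity=2)
rids = [heap.insert(t) for t in ["T1", "T2", "T3"]]
heap.delete(rids[1])
remaining = list(heap.scan())
```

After the deletion the scan still visits the tuples in insertion order, and the freed slot is counted as wasted space.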

Ordered sequential structures

The order in which tuples are written depends on the value of a given key, called the sort key
A sort key may contain one or more attributes
The sort key may be the primary key
Appropriate for
Sort and group by operations on the sort key
Search operations on the sort key
Join operations on the sort key, when sorting is used for the join

Ordered sequential structures

Problem
Preserving the sort order when inserting new tuples
The same problem may arise for updates
Solution
Leaving a percentage of free space in each block during table creation
On insertion, dynamic (re)sorting in main memory of the tuples in a block
Alternative solution
Overflow file containing tuples which do not fit into the correct block
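
The two solutions can be combined in a small sketch: blocks are created half full, new keys are sorted into the covering block in memory, and keys that find their block full go to an overflow file. The constants `BLOCK_CAPACITY` and `FILL_AT_LOAD`, the function names, and the choice to identify the covering block by its first key are assumptions made for illustration.

```python
import bisect

BLOCK_CAPACITY = 4  # hypothetical number of tuples a block can hold
FILL_AT_LOAD = 2    # tuples stored per block at creation time (50% free space)


def build_blocks(sorted_keys):
    """Split already sorted keys into blocks, leaving free space in each block."""
    return [sorted_keys[i:i + FILL_AT_LOAD]
            for i in range(0, len(sorted_keys), FILL_AT_LOAD)]


def insert(blocks, overflow, key):
    """Insert key into the block that covers it, or into the overflow file."""
    # Covering block: the last block whose first key is <= key (else the first block).
    firsts = [block[0] for block in blocks]
    idx = max(bisect.bisect_right(firsts, key) - 1, 0)
    if len(blocks[idx]) < BLOCK_CAPACITY:
        bisect.insort(blocks[idx], key)  # keep the block sorted in main memory
    else:
        overflow.append(key)             # block is full: tuple goes to the overflow file


blocks = build_blocks([10, 20, 30, 40])
overflow = []
for k in (15, 12, 17, 35):
    insert(blocks, overflow, k)
```

Here the first block absorbs 15 and 12, is then full, and 17 ends up in the overflow file, while 35 still fits in the second block.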
Ordered sequential structures

Typically used with B+-Tree clustered (primary) indices
The index key is the sort key
Used by the DBMS to store intermediate operation results

Tree structures

Provide “direct” access to data based on the value of a key field
The key includes one or more attributes
It does not constrain the physical position of tuples
The most widespread in relational DBMSs

General characteristics

One root node
Many intermediate nodes
Nodes have a large fan-out
Each node has many children
Leaf nodes provide access to data
Clustered
Unclustered

Tree structure

[Figure: tree structure whose leaf nodes give access to DATA]

B-Tree and B+-Tree

Two different tree structures for indexing
B-Tree
Data pages are reached only through key values, by visiting the tree
B+-Tree
Provides a link structure allowing sequential access in the sort order of key values

B-Tree structure

[Figure: B-Tree structure over DATA]

B+-Tree structure

[Figure: B+-Tree structure over DATA]

B-Tree and B+-Tree

B stands for balanced
Leaves are all at the same distance from the root
Access time is the same for every searched value, since every search visits the same number of levels
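
A rough sketch of why a balanced tree with large fan-out keeps this access cost low. It assumes, hypothetically, that every node is completely full and has exactly `fan_out` children, so the number of levels to visit grows only logarithmically with the number of keys. The function name and the figures are illustrative and not taken from the slides.

```python
def levels_needed(num_keys, fan_out):
    """Levels a balanced tree needs to address num_keys entries.

    Assumption: every node is full and has exactly fan_out children.
    """
    levels, reachable = 1, fan_out
    while reachable < num_keys:
        reachable *= fan_out  # one more level multiplies the reachable entries
        levels += 1
    return levels


small_fan_out = levels_needed(1_000_000, 2)
large_fan_out = levels_needed(1_000_000, 100)
```

With these assumed figures, a fan-out of 100 reaches one million entries in 3 levels, where a binary tree would need 20.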
Clustered

The tuple is contained in the leaf node
It constrains the physical position of tuples to a given leaf node
The position may be modified by splitting the node, when it is full
Also called key sequenced
Typically used for primary key indexing

Clustered B+-Tree index

[Figure: clustered B+-Tree index; the leaf nodes contain the Data]

Unclustered

The leaf contains physical pointers to the actual data
The position of tuples in the file is totally unconstrained
Also called indirect
Used for secondary indices

Unclustered B+-Tree index

[Figure: unclustered B+-Tree index; the leaf nodes point to the Data]

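Example: Unclustered B+-Tree index
STUDENT (StudentId, Name, Grade)

[Figure: unclustered B+-Tree index on Grade]
Root node, keys 12 and 78
Grade < 12
12 <= Grade <= 78
Grade > 78
Intermediate node for 12 <= Grade <= 78, keys 19 and 56
12 <= Grade < 19
19 <= Grade <= 56
56 < Grade <= 78
Intermediate node for 19 <= Grade <= 56, keys 33 and 44
19 <= Grade < 33
33 <= Grade <= 44
44 < Grade <= 56
LEAF (key values with pointers to tuples)
Grade    19  22  30  30  33  34  34   34  40  50
Pointer  T1  T2  T3  T4  T5  T6  T10  T7  T8  T9
DATA FILE FOR STUDENT TABLE (tuples in physical order)
Tuple    T1  T6  T10  T2  T3  T5  T4  T7  T8  T9
Grade    19  34  34   22  30  33  30  34  40  50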
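
A range query on this example can be sketched as below. The sketch models only the leaf level and the data file: the binary search from `bisect` stands in for the descent from the root, which is a simplification, and the names `leaf_entries`, `range_lookup`, and `position` are hypothetical.

```python
import bisect

# Leaf level of the example index: (Grade, tuple id) pairs in key order.
leaf_entries = [(19, "T1"), (22, "T2"), (30, "T3"), (30, "T4"), (33, "T5"),
                (34, "T6"), (34, "T10"), (34, "T7"), (40, "T8"), (50, "T9")]
leaf_keys = [grade for grade, _ in leaf_entries]

# Data file in its own physical order, as in the example.
data_file = ["T1", "T6", "T10", "T2", "T3", "T5", "T4", "T7", "T8", "T9"]
position = {tid: pos for pos, tid in enumerate(data_file)}


def range_lookup(low, high):
    """Return the tuple ids with low <= Grade <= high, in key order."""
    start = bisect.bisect_left(leaf_keys, low)    # first leaf entry >= low
    end = bisect.bisect_right(leaf_keys, high)    # one past the last entry <= high
    return [tid for _, tid in leaf_entries[start:end]]


hits = range_lookup(30, 34)
file_positions = [position[tid] for tid in hits]
```

The leaf entries are read sequentially, but the matching tuples sit at scattered positions in the data file, which is why sequential access in key order is not guaranteed for an unclustered index.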
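Example: Clustered B+-Tree index
STUDENT (StudentId, Name, Grade)

[Figure: clustered B+-Tree index on Grade]
Root node, keys 12 and 78
Grade < 12
12 <= Grade <= 78
Grade > 78
Intermediate node for 12 <= Grade <= 78, keys 19 and 56
12 <= Grade < 19
19 <= Grade <= 56
56 < Grade <= 78
Intermediate node for 19 <= Grade <= 56, keys 33 and 44
19 <= Grade < 33
33 <= Grade <= 44
44 < Grade <= 56
LEAF (contains the tuples: DATA FILE FOR STUDENT TABLE)
Tuple    T1  T2  T3  T4  T5  T6  T10  T7  T8  T9
Grade    19  22  30  30  33  34  34   34  40  50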

Advantages and disadvantages

Advantages
Very efficient for range queries
Appropriate for sequential scan in the order of the key field
Always for clustered indices, not guaranteed otherwise
Disadvantages
Insertions may require a split of a leaf
Possibly, also of intermediate nodes
Computationally intensive
Deletions may require merging underfull nodes and re-balancing
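
The leaf split mentioned among the disadvantages can be sketched as follows. This is a simplified illustration that handles a single leaf only: `LEAF_CAPACITY`, the function name, and the policy of splitting in the middle and passing the first key of the right half up to the parent are assumptions, and propagation of the split to intermediate nodes is not modelled.

```python
LEAF_CAPACITY = 4  # hypothetical maximum number of keys in a leaf


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf; split the leaf if it overflows.

    Returns (left, right, separator). right and separator are None
    when no split was needed.
    """
    keys = sorted(leaf + [key])  # new sorted copy, the input list is not modified
    if len(keys) <= LEAF_CAPACITY:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The separator would have to be inserted into the parent node,
    # which may in turn be full and split as well.
    return left, right, right[0]


no_split = insert_into_leaf([19, 22, 30], 33)
with_split = insert_into_leaf([19, 22, 30, 33], 34)
```

The second call overflows the leaf, so the keys are redistributed over two leaves and one extra key has to be added to the parent.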
Hash structure

It guarantees direct and efficient access to data based on the value of a key field
The hash key may include one or more attributes
Suppose the hash structure has B blocks
The hash function is applied to the key field value of a record
It returns a value between 0 and B-1, which defines the position of the record
Blocks should never be completely filled
To allow new data insertion
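
A minimal sketch of the mechanism, under stated assumptions: the hash function is a simple modulo on an integer key, `B` and `BLOCK_CAPACITY` are arbitrary, and tuples that hash to a full block are placed in a separate overflow list. None of these choices comes from the slides, and the keys used are made up.

```python
B = 4               # number of blocks in the hash structure (hypothetical)
BLOCK_CAPACITY = 2  # tuples per block (hypothetical)

blocks = [[] for _ in range(B)]
overflow = []


def h(key):
    """Assumed hash function: maps an integer key to a block number in 0..B-1."""
    return key % B


def insert(key, tup):
    block = blocks[h(key)]
    if len(block) < BLOCK_CAPACITY:
        block.append((key, tup))
    else:
        # The target block is full: this is why blocks should keep free space.
        overflow.append((key, tup))


def lookup(key):
    # Direct access: only the block selected by h (plus the overflow) is inspected.
    return [tup for k, tup in blocks[h(key)] + overflow if k == key]


for key, tup in [(5, "T1"), (9, "T2"), (13, "T3")]:
    insert(key, tup)
```

All three keys hash to block 1, so the third one no longer fits and has to be stored elsewhere, which makes its retrieval more expensive than a single block access.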

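Example: hash index
STUDENT (StudentId, Name, Grade)

[Figure: hash structure on StudentId]
TUPLE T1, StudentId = 50: H(StudentId = 50) = 1
TUPLE T4, StudentId = 75: H(StudentId = 75) = 1
DATA FILE FOR STUDENT TABLE
BLOCK 0: (no tuples shown)
BLOCK 1: T1 (StudentId 50), T4 (StudentId 75)
BLOCK 2: (no tuples shown)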

Hash index

Advantages
Very efficient for queries with an equality predicate on the key
No sorting of disk blocks is required
Disadvantages
Inefficient for range queries
Collisions may occur (different key values mapped to the same block)

Unclustered hash index

As in the hash index, it guarantees direct and efficient access to data based on the value of a key field
Unlike the hash index, blocks contain pointers to data
Actual data is stored in a separate structure
Position of tuples is not constrained to a block

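Example: Unclustered hash index
STUDENT (StudentId, Name, Grade)

[Figure: unclustered hash index on Grade]
TUPLE T1, GRADE = 30: H(GRADE = 30) = 1
TUPLE T2, GRADE = 40: H(GRADE = 40) = 1
INDEX BLOCKS
BLOCK 0: (no entries shown)
BLOCK 1: 30 → T1, 40 → T2
BLOCK 2: (no entries shown)
DATA FILE FOR STUDENT TABLE
T1 (Grade 30), T2 (Grade 40)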

Bitmap index

It guarantees direct and efficient access to data based on the value of a key field
It is based on a bit matrix
The bit matrix references data rows by means of RIDs (Row IDentifiers)
Actual data is stored in a separate structure
Position of tuples is not constrained

Bitmap index

The bit matrix has
One column for each different value of the indexed attribute
One row for each tuple
Position (i, j) of the matrix is
1 if tuple i takes value j
0 otherwise

RID  Val1  Val2  …  Valn
1    0     0     …  1
2    0     0     …  0
3    0     0     …  1
4    1     0     …  0
5    0     1     …  0

Example: Bitmap index
EMPLOYEE (EmployeeId, Name, Job)

Domain of Job attribute = {Engineer, Consultant, Manager, Programmer, Secretary, Accountant}

RID  Eng.  Cons.  Man.  Prog.  Secr.  Acc.
1    0     0      1     0      0      0
2    0     0      0     1      0      0
3    0     0      0     0      1      0
4    0     0      0     1      0      0
5    1     0      0     0      0      0

The Prog. column (0 1 0 1 0) selects tuples T2 and T4 in the DATA FILE FOR EMPLOYEE TABLE
Bitmap index

Advantages
Very efficient for boolean expressions of predicates
Reduced to bit operations on bitmaps
Appropriate for attributes with limited domain cardinality
Disadvantages
Not used for continuous attributes
Required space grows significantly with domain cardinality
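
The reduction of boolean predicates to bit operations can be sketched on the EMPLOYEE example. Each column of the bit matrix is stored as one Python integer in which bit `rid - 1` is set when the row with that RID takes the value. This encoding and the helper names `build_bitmaps` and `rids` are assumptions made for illustration.

```python
# Job value of each row of the EMPLOYEE example, keyed by RID.
jobs = {1: "Manager", 2: "Programmer", 3: "Secretary", 4: "Programmer", 5: "Engineer"}


def build_bitmaps(rows):
    """Build one bitmap (stored as an int) per distinct value of the attribute."""
    bitmaps = {}
    for rid, value in rows.items():
        # Set bit (rid - 1) in the bitmap of the value taken by this row.
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (rid - 1))
    return bitmaps


def rids(bitmap):
    """Decode a bitmap back into the sorted list of RIDs it selects."""
    return [i + 1 for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


bitmaps = build_bitmaps(jobs)
programmers = rids(bitmaps["Programmer"])
# Job = 'Programmer' OR Job = 'Engineer' becomes a single bitwise OR.
prog_or_eng = rids(bitmaps["Programmer"] | bitmaps["Engineer"])
```

Only values that actually occur get a bitmap in this sketch, so Consultant and Accountant have none. A full bit matrix would keep one column per domain value, which is where the space cost for large domains comes from.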
