Data Storage: Disks and Memory Hierarchy

chapter 9 notes

Uploaded by

chingcheongkit

Storing Data: Disks and Files

October 2024

1 Introduction
This chapter initiates a study of the internals of an RDBMS. It covers the disk space manager, the buffer manager, and the implementation-oriented aspects of the files and access methods layer.

2 Memory Hierarchy
From top to bottom, the memory hierarchy is: CPU, cache, main memory, magnetic disk, and tape. Cache and main memory serve as primary storage, where requested data is processed. The requested data comes from secondary storage (magnetic disk) and tertiary storage (tape). The main difference between the layers is access time: the closer a layer is to the CPU, the faster the access. Conversely, the slower the storage, the cheaper it is per byte stored. There is thus a trade-off, and databases are usually stored on tapes and disks; data is retrieved from the lower layers into main memory to be processed by the CPU.

There are reasons beyond the cost trade-off: on systems with 32-bit addressing, only 2^32 bytes can be directly referenced in main memory, and the number of data objects can exceed this limit. Concurrency and recovery issues arise as well, so data must persist independently of any one program's execution.

Primary storage is volatile, so non-volatile storage is needed to hold all the information about the computer and the database persistently.

Tapes are a good choice for archival storage. Their major drawback is that they support only sequential access, which makes them a poor fit for operational or frequently accessed data, but well suited to backing up operational data.

2.1 Magnetic Disks
Magnetic disks support direct random access to data. A DBMS provides seamless access to data on disk, so applications need not worry about whether data is currently in main memory or on disk.

Data is stored on disk in units called disk blocks: a disk block is a contiguous sequence of bytes and is the unit in which data is written to and read from disk. A disk block is a fixed-size unit of storage (typically between 4 KB and 64 KB, depending on the DBMS configuration and hardware) used by the DBMS to store records, indexes, and other database information on disk. When a DBMS reads or writes data, it does so in terms of these blocks; even if only a small piece of data is needed (e.g., a single record), the system reads or writes an entire block. When a query retrieves a record, the DBMS first reads the block that contains that record into memory and then extracts the specific record from that block.

Platters are the circular, disk-shaped components of a hard drive where data
is physically stored. Each platter is coated with a magnetic material that allows
data to be written and read by the drive’s read/write heads. A typical hard
drive contains multiple platters stacked on top of each other. Each platter is di-
vided into concentric circles called tracks. In double-sided platters, both the top
and bottom surfaces of the platter are used to store data, effectively doubling
the storage capacity of each platter compared to single-sided platters. Each side
of a platter has its own read/write head. So, if a drive has double-sided plat-
ters, there will be two heads (one for the top and one for the bottom) per platter.

A track is essentially a circular path on the surface of the platter where data is
magnetically recorded. Tracks are further subdivided into smaller units called
sectors, which are the smallest addressable units on a disk.

A sector is typically 512 bytes or 4 KB in size, and it is the smallest physical storage unit on the disk. Each sector on a track holds a fixed amount of data.

When reading or writing data, the hard drive reads or writes whole sectors
at a time. A disk block (or page) in the context of a DBMS is a logical
unit of data storage that maps onto one or more physical sectors on the disk.
Typically, a disk block can be composed of multiple sectors (for example, 8 sec-
tors of 512 bytes might make up a 4 KB block). The DBMS uses these blocks
as the fundamental unit for managing and retrieving data, but the hard drive
reads and writes the underlying sectors that make up these blocks. The size of
a disk block can be set when the disk is initialized, as a multiple of the sector size.

Their relationship: Disk blocks are logical groupings of sectors, and sec-
tors reside on tracks. A track is part of a platter, and each platter has multiple

tracks and sectors that store data physically.

Cylinder: in a hard disk drive (HDD), all the tracks on different platters
that are vertically aligned form what is known as a "cylinder." Each platter in
a hard disk drive is divided into multiple tracks. These are concentric circles
where data is stored magnetically. A hard disk typically has multiple platters
stacked on top of each other. A cylinder is formed by grouping all the tracks
that are directly aligned across multiple platters. In other words, if you imagine
a vertical slice of the hard drive where each platter’s track (at a specific posi-
tion) is stacked, this set of aligned tracks across platters is called a cylinder.
Each track in a cylinder is at the same radius on its respective platter.

Disk heads: The number of read/write heads in an HDD is directly related to the number of platters and whether those platters are single-sided or double-sided. These heads are all mounted on a single actuator arm assembly, which
means they move together as a group. When the arm moves to access data
on one platter, all heads are positioned over the corresponding tracks (which
form a cylinder), but only one head is active at a time. Although all heads
are positioned over corresponding tracks, the HDD controller only activates one
read/write head at a time to perform data access. The heads are switched elec-
tronically by the drive controller, so while switching between heads is fast, the
system doesn’t allow simultaneous reading or writing from multiple heads.

A disk controller is a critical hardware component that acts as the interface between a disk drive (such as a hard disk drive, HDD, or solid-state drive,
SSD) and the computer system’s motherboard. It manages the communication
between the computer’s central processing unit (CPU) and the disk drive, en-
suring that data is efficiently read from and written to the disk.
The disk controller manages the flow of data between the CPU and the disk
drive. It handles requests to read or write data, converts those requests into
signals that the disk drive can understand, and controls how data is physically
stored and retrieved from the disk. The disk controller translates high-level
commands from the operating system (like file system operations) into low-level
operations that the disk drive can process. For example, if the CPU requests
to read a specific file, the disk controller translates that request into finding the
correct blocks on the disk and reading the data. Disk controllers may also perform basic error detection and correction; checksums are a common practical implementation.

Determining the access time to a location on a hard disk involves calculating three main components: seek time, rotational delay (latency), and transfer time. These components are particularly important in traditional hard disk drives (HDDs) that use spinning platters and moving read/write heads.

Seek time is the time taken for the read/write head to move to the track
where the data is located on the disk. When data is stored on a disk, the

read/write head must first be positioned over the correct track where the data
resides. Seek time is typically the largest component of access time for HDDs
and can vary significantly depending on how far the head has to move. Average
seek time: The average seek time is often quoted in HDD specifications and
refers to the time it takes for the head to move to a random track. It is usually
measured in milliseconds (ms).
Formula:

Average Seek Time = Total time to move between tracks / Number of possible track movements

The actual seek time also depends on how far the head must move from its current position.

Rotational Delay is the time taken for the disk platter to rotate and position
the correct sector of the track under the read/write head.
Since the disk is spinning at a constant speed (measured in revolutions per
minute, or RPM), the rotational delay depends on how far the required sector
is from the current position of the head when it reaches the track.
The average rotational delay is half the time of a full rotation, assuming
that the head will on average have to wait for half a rotation to find the desired
sector.
Formula:

Rotational Delay = 1 / (2 × Rotational Speed (in revolutions per second))

Example:
For a disk spinning at 7,200 RPM, the time for a full rotation is

Time for one full rotation = 60 seconds / 7,200 RPM = 8.33 milliseconds

Therefore the average rotational delay is:

Average Rotational Delay = 8.33 / 2 = 4.17 milliseconds
Transfer time is the time required to actually transfer the data from the
disk to the memory once the correct sector is under the read/write head. The
transfer time depends on the size of the data being transferred and the data
transfer rate (measured in MB/s or GB/s), which is determined by the speed
of the disk and the interface (such as SATA, SCSI, or NVMe).

Summary:
Seek Time: time to move the read/write head to the correct track (largest factor).
Rotational Delay: time to rotate the disk so the desired sector is under the head (depends on disk RPM).
Transfer Time: time to transfer the data once the correct sector is positioned (depends on the transfer rate).
Together, these components determine the total time it takes to access data on a traditional hard disk drive. Note that solid-state drives (SSDs) have no seek time or rotational delay, so their access time is much lower, as data is accessed electronically.
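The three components above combine into a worked estimate. A minimal sketch in Python, assuming a 9 ms average seek, a 7,200 RPM spindle, and a 150 MB/s transfer rate (all illustrative figures, not from the text):

```python
def access_time_ms(avg_seek_ms, rpm, transfer_mb_s, block_kb):
    """Estimate the average time to access one block on an HDD."""
    full_rotation_ms = 60_000 / rpm              # e.g. 8.33 ms at 7,200 RPM
    rotational_delay_ms = full_rotation_ms / 2   # wait half a turn on average
    transfer_ms = (block_kb / 1024) / transfer_mb_s * 1000
    return avg_seek_ms + rotational_delay_ms + transfer_ms

# 4 KB block on a 7,200 RPM drive: roughly 9 + 4.17 + 0.03 = 13.19 ms.
print(round(access_time_ms(9, 7200, 150, 4), 2))
```

Note how seek time and rotational delay dominate; the transfer of a single 4 KB block contributes only a few hundredths of a millisecond.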

2.2 Performance Implications


Data must be in memory for the DBMS to operate on it. The unit of data transfer between disk and main memory is a block; if a single item on a block is needed, the entire block is transferred. Reading or writing a disk block is called an I/O operation. The time to read or write a block varies, depending on the location of the data:

access time = seek time + rotational delay + transfer time

The time taken for a database operation is affected significantly by how data is stored on disk. To minimize this time, it is necessary to place data records strategically on disk, because of the geometry and mechanics of disks. If two records are frequently used together, we should place them close together. The 'closest' that two records can be on a disk is on the same block. In decreasing order of closeness, they could then be on the same track, the same cylinder, or an adjacent cylinder. As the platter spins, other blocks on the track being read or written rotate under the active head. In current disk designs, all the data on a track can be read or written in one revolution. After a track is read or written, another disk head becomes active, and another track in the same cylinder is read or written. This process continues until all tracks in the current cylinder are read or written, and then the arm assembly moves (in or out) to an adjacent cylinder. Thus we have a natural notion of 'closeness' for blocks, which we can extend to a notion of next and previous blocks.
Exploiting this notion of next by arranging records so they are read or written
sequentially is very important in reducing the time spent in disk I/Os. Sequen-
tial access minimizes seek time and rotational delay and is much faster than
random access.

3 Redundant Arrays of Independent Disks


Disks are potential bottlenecks for system performance and storage-system reliability. Since disks contain mechanical elements, they have a much higher failure rate than the electronic parts of a computer system. A disk array is an arrangement of several disks, organized to increase the performance and improve the reliability of the resulting storage system. Performance is increased through data striping. Reliability is improved through redundancy. The redundant information is carefully organized so that, in case of a disk failure, it can be used to reconstruct the contents of the failed disk. Disk arrays that implement a combination of data striping and redundancy are called redundant arrays of independent disks, or RAID. Several RAID organizations, referred to as RAID levels, have been proposed.

3.1 Data Striping


Data striping distributes data over several disks to give the impression of having
a single large, very fast disk. A disk array gives the user the abstraction of hav-
ing a single, very large disk. If the user issues an I/O request, we first identify
the set of physical disk blocks that store the data requested. The disk blocks
may reside on a single disk in the array or may be distributed over several disks
in the array.

In data striping, the data is segmented into equal-size partitions distributed


over multiple disks. The size of the partition is called the striping unit.
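Round-robin striping with a striping unit of one block can be sketched as follows; the four-disk array and the mapping function are illustrative assumptions, not a prescribed layout:

```python
NUM_DISKS = 4  # assumed array size

def locate(logical_block):
    """Map a logical block number to (disk, block-on-that-disk) under
    round-robin striping with a striping unit of one block."""
    return (logical_block % NUM_DISKS, logical_block // NUM_DISKS)

# Blocks 0..3 land on disks 0..3; block 4 wraps back to disk 0, and so on,
# so a sequential scan of logical blocks keeps all four disks busy.
print(locate(0), locate(4), locate(5))
```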

3.2 Redundancy
Reliability of a disk array can be increased by storing redundant information. If a disk fails, the redundant information is used to reconstruct the data on the failed disk.

3.3 RAID
Level 0: Nonredundant, while it has the best write performance, it does not
have the best read performance.
Level 1: Mirroring. Write on the disk then write on the mirror disk. We can
distribute reads between the two disks and allow parallel reads of different disk
blocks that conceptually reside on the same disk. We then have smaller access
time.
Level 2: Error-correcting codes. Striping unit is a single bit. Using Hamming
code.
Level 3: Bit-Interleaved Parity. Single check disk with parity information. The
lowest overhead possible.
Level 4: Block-Interleaved Parity. Striping unit of a disk block.
Level 5: Block-Interleaved Distributed Parity. Several write requests could be
processed in parallel, read requests have a higher level of parallelism. Overall
best performance.
Level 6: P+Q redundancy.
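The parity used by the parity-based levels is a bitwise XOR across the data disks; a small sketch of how a lost block is reconstructed (the function and sample blocks are illustrative):

```python
def parity(blocks):
    """XOR the corresponding bytes of the given blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"\x0f\x0f", b"\xf0\x00", b"\xaa\x55"  # three data blocks
p = parity([d0, d1, d2])                             # the check block

# If the disk holding d1 fails, XORing the surviving data blocks with
# the parity block reconstructs the lost contents.
assert parity([d0, d2, p]) == d1
```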

4 Disk Space Management


The lowest level of software in the DBMS architecture is the disk space manager. It supports the concept of a page as a unit of data and provides commands to allocate or deallocate a page and to read or write a page. The page size is set to the size of a disk block, so reading or writing a page takes one disk I/O.

We often allocate a sequence of pages as a contiguous sequence of blocks to hold data that is frequently accessed in sequential order. The disk space manager hides details of the underlying hardware and allows higher levels of the software to think of the data as a collection of pages.

4.1 Keeping Track of Free Blocks


The disk space manager keeps track of which disk blocks are in use, in addition to keeping track of which pages are on which disk blocks. Block usage can be tracked by maintaining a list of free blocks, or a bitmap with one bit per disk block that indicates whether the block is available. A bitmap also allows very fast identification and allocation of contiguous areas on disk.
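A free-block bitmap and contiguous allocation can be sketched like this; the class and its methods are illustrative, not a real disk space manager API:

```python
class FreeBlockBitmap:
    """One bit per disk block: 1 = free, 0 = in use."""

    def __init__(self, num_blocks):
        self.bits = [1] * num_blocks

    def allocate_contiguous(self, n):
        """Claim n adjacent free blocks and return the first block number."""
        run = 0
        for i, bit in enumerate(self.bits):
            run = run + 1 if bit else 0
            if run == n:                      # found a long-enough run
                start = i - n + 1
                self.bits[start:i + 1] = [0] * n
                return start
        raise RuntimeError(f"no run of {n} contiguous free blocks")

    def free(self, start, n):
        self.bits[start:start + n] = [1] * n
```

For example, on an 8-block disk the first request for 3 blocks returns block 0, the next request for 2 blocks returns block 3, and freeing blocks 0-2 makes that run available again.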

4.2 Using OS File Systems to manage Disk Space


A database disk manager can be built using OS files. However, DBMS vendors typically want their database systems to be portable across multiple platforms, so database systems usually do not rely on the OS file system and do their own disk management.

5 Buffer Manager
When scanning a whole database that consists of more pages than memory can hold, the DBMS must bring pages into main memory as they are needed. Strategies are needed to control which pages are brought into and evicted from main memory.

The buffer manager is the software layer responsible for bringing pages from
disk to main memory as needed. The buffer manager manages the available
main memory by partitioning it into a collection of pages, which we collectively
refer to as the buffer pool. The main memory pages in the buffer pool are called
frames.

Besides the buffer pool itself, two variables are recorded for each frame: the pin count, which is the number of times the page currently in the frame has been requested but not released (i.e., the number of current users of the page), and the Boolean variable dirty, which indicates whether the page has been modified since it was brought in.

Initially, the pin count for every frame is set to 0 and the dirty bits are turned off. When a page is requested, the buffer manager does the following:
1. Checks whether the page is in the buffer pool. If not, it brings the page in:
(a) Chooses a frame for replacement using the replacement policy and increments its pin count.
(b) If the dirty bit for the chosen frame is on, writes the page it contains to disk.
(c) Reads the requested page into the chosen frame.
2. Returns the address of the frame containing the requested page to the requestor.
Incrementing the pin count is often called pinning the requested page in its frame.
The locking protocol ensures that each transaction obtains a shared or exclu-
sive lock before requesting a page to read or modify. Two different transactions
cannot hold an exclusive lock on the same page at the same time. The buffer
manager simply assumes that the appropriate lock has been obtained before a
page is requested.
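The request-handling steps above can be sketched as follows; the class, its method names, and the dictionary-backed "disk" are illustrative assumptions, and the replacement policy is simplified to "any frame with pin count 0":

```python
class BufferManager:
    """Sketch of the pin/unpin protocol; not a real DBMS interface."""

    def __init__(self, num_frames, disk):
        self.disk = disk            # page_id -> page contents (stand-in for real I/O)
        self.num_frames = num_frames
        self.frames = {}            # page_id -> [data, pin_count, dirty]

    def pin(self, page_id):
        if page_id not in self.frames:           # step 1: not in the pool
            if len(self.frames) >= self.num_frames:
                # (a) choose a replacement frame (policy simplified: any unpinned frame)
                victim = next(p for p, f in self.frames.items() if f[1] == 0)
                if self.frames[victim][2]:       # (b) dirty: write back first
                    self.disk[victim] = self.frames[victim][0]
                del self.frames[victim]
            self.frames[page_id] = [self.disk[page_id], 0, False]  # (c) read in
        self.frames[page_id][1] += 1             # pin the page
        return self.frames[page_id][0]           # step 2: return the frame

    def unpin(self, page_id, dirty=False):
        self.frames[page_id][1] -= 1
        self.frames[page_id][2] = self.frames[page_id][2] or dirty
```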

5.1 Buffer Replacement Policies


LRU (least recently used): can be implemented in the buffer manager using a queue of pointers to frames with pin count 0. A frame is added to the end of the queue when it becomes a candidate for replacement; the page chosen for replacement is at the head of the queue.
Clock replacement is a variant of LRU with lower overhead.
However, these policies can perform poorly for some access patterns, such as repeated sequential scans of a file larger than the buffer pool.
Other policies include FIFO and MRU.
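The LRU queue described above can be sketched with an ordered map; the class name and methods are illustrative:

```python
from collections import OrderedDict

class LRUReplacer:
    """Queue of frames with pin count 0; the head is the replacement victim."""

    def __init__(self):
        self.queue = OrderedDict()        # frame_id -> None, in unpin order

    def unpin(self, frame_id):
        self.queue.pop(frame_id, None)
        self.queue[frame_id] = None       # append at the tail of the queue

    def pin(self, frame_id):
        self.queue.pop(frame_id, None)    # pinned frames are not candidates

    def victim(self):
        frame_id, _ = self.queue.popitem(last=False)   # head = least recently used
        return frame_id
```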

5.2 Buffer Management in DBMS versus OS


A DBMS can often predict reference patterns because most page references are
generated by higher-level operations (sequential scans) with a known pattern of
page accesses. Being able to predict reference patterns enables prefetching of
pages. I/O can typically be done concurrently with CPU computation.

A DBMS also needs the ability to explicitly force a page to disk, to ensure that the copy of the page on disk is updated with the copy in memory. A DBMS must be able to ensure that certain pages in the buffer pool are written to disk before certain other pages, in order to implement the WAL protocol for crash recovery. Virtual memory implementations in operating systems cannot be relied on to provide such control over when pages are written to disk; if the system crashes in the interim, the effects can be catastrophic for a DBMS.

6 Files of Records
Higher levels of the DBMS code treat a page as effectively being a collection of
records, ignoring the representation and storage details.
Heap files support file scans and search by record id. Indexed files additionally support the file as a collection of logical records, with value-based search.

6.1 Implementing Heap Files
The data on a heap file's pages is not ordered in any way. Every record in the file has a unique rid, and every page in a file is of the same size.
Supported operations on a heap file: create and destroy files; insert a record; delete a record with a given rid; get a record with a given rid (we must be able to find the id of the page containing the record, given the id of the record).
We must keep track of the pages in each heap file to support scans, and we must keep track of pages that contain free space to implement insertion efficiently. There are two ways of doing so; in both alternatives, pages must hold two pointers for file-level bookkeeping in addition to the data.

Linked List of Pages: A doubly linked list is used to maintain the heap file structure. The DBMS just remembers where the first page is, in a table of 2-tuples of the form ⟨heap file name, header page address⟩ stored in a known location. The first page of the file is called the header page. One important task is to manage the free space created when records are deleted. This task has two parts: keeping track of free space within a page, and keeping track of pages that have some free space.
When a new page is needed, the disk manager allocates it and adds it to the list of pages in the file. If a page is deleted from the heap file, it is removed from the list and deallocated by the disk manager.
One disadvantage: all pages in a file may end up on the free list, since pages may not be fully used when records have variable length.

Directory of Pages: An alternative to a linked list of pages is to maintain a directory of pages. In this approach, a central directory (or table) is maintained, which stores the locations of all the pages that belong to
table) is maintained, which stores the locations of all the pages that belong to
the heap file. The directory acts as an index to quickly find the appropriate
page when performing inserts, deletes, or updates. Each entry in the directory
corresponds to a page that holds the actual records. This is similar to how an
inode in a file system works: instead of chaining pages together, the directory
provides direct reference to the pages.
Free space can be managed by maintaining a bit per entry, indicating whether
the corresponding page has any free space, or a count per entry, indicating the
amount of free space on the page.
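A minimal sketch of the count-per-entry variant of directory-based free-space tracking; the entry layout and function name are assumptions for illustration:

```python
# Each directory entry records a page's id and how many free bytes it has left.
directory = [
    {"page_id": 0, "free_bytes": 120},
    {"page_id": 1, "free_bytes": 900},
    {"page_id": 2, "free_bytes": 0},
]

def find_page_with_space(record_size):
    """Scan only the small directory, not the data pages themselves."""
    for entry in directory:
        if entry["free_bytes"] >= record_size:
            return entry["page_id"]
    return None   # caller must allocate a new page and append an entry

print(find_page_with_space(500))   # page 1 is the first with enough room
```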

6.2 Indexed Files


To support a file as a collection of logical records, value-based search as in SQL is needed:
find all students in the "CS" department (equality search); find all students with a gpa > 3 (range search).
For this, a search key is needed: the fields on which the file is sorted or hashed; the search key need not uniquely identify records.
Sorted files: records are sorted by the search key. Good for equality and range search; supported by a sorted index.

Hashed files: records are grouped by the hash value of the search key. Good for equality search; supported by a hashed index.

Sorted ≠ stored sequentially on disk.

7 Page Formats
If a table has a clustered index, then records in the table are identified using
the key value for the clustered index. This has the advantage that secondary
indexes need not be reorganized if records are moved across pages.
Think of a page as a collection of slots, each of which contains a record. A record is identified by the pair ⟨page id, slot number⟩; this pair is the rid.

Disk pages are uniquely identified by a page id: (b, s, c, d), i.e., block b of surface s of cylinder c of disk d.
A table of ⟨frame#, page id⟩ pairs is maintained for pages in memory.

7.1 Fixed-length Records


In this case, record slots are uniform and can be arranged consecutively within a page. At any instant, some slots are occupied by records and others are unoccupied. The main issues are how we keep track of empty slots and how we locate all records on a page.

First alternative: store records in the first N slots. Whenever a record is deleted, we move the last record on the page into the vacated slot. With a simple offset calculation we can locate the i-th slot. However, this does not work when external references to records are needed, because the slot number of the moved record changes.
The second alternative is to keep a bit array marking the availability of the slots on the page. When a record is deleted, its bit is turned off. Normally, a page also contains additional file-level information.

7.2 Variable-Length Record


When a record is inserted, we must allocate just the right amount of space for it; when a record is deleted, we must move records to fill the hole created by the deletion, to ensure that all the free space on the page is contiguous. We therefore need the ability to move records on a page.
The most flexible way of doing so is to maintain a directory of slots for each page, with entries of the form ⟨record offset, record length⟩. The first component is a pointer to the record: the offset in bytes from the start of the data area on the page to the start of the record. Deletion is done by setting the record offset to -1. A record can be moved around on the page, because the rid (the page number and slot number) does not change; only the offset stored in the slot changes.
A special pointer indicates where the remaining free space starts. We move records on the page to reclaim the space freed by records deleted earlier. After reorganization, all records are stored contiguously, followed by the available free space.
The only way to remove slots from the slot directory is to remove the last slot, if the record it points to is deleted. When a record is inserted, the slot directory should be scanned for an entry that does not currently point to any record, and that slot should be used for the new record. A new slot is added to the slot directory only if all existing slots point to records. If we do not care about preserving rids, we can move slot entries around instead of moving the actual records.
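Putting the pieces together, a slotted page with a ⟨record offset, record length⟩ directory might look like this sketch (free-space compaction is omitted; the class and its methods are illustrative):

```python
class SlottedPage:
    """Variable-length records addressed via a slot directory."""

    def __init__(self):
        self.data = b""          # the data area of the page
        self.slots = []          # slot # -> (offset, length); offset -1 = deleted

    def insert(self, record):
        offset = len(self.data)
        self.data += record
        for i, (off, _) in enumerate(self.slots):
            if off == -1:                         # reuse a deleted slot first
                self.slots[i] = (offset, len(record))
                return i
        self.slots.append((offset, len(record)))  # else grow the directory
        return len(self.slots) - 1

    def delete(self, slot):
        self.slots[slot] = (-1, 0)                # rids of other records unchanged

    def get(self, slot):
        off, length = self.slots[slot]
        return self.data[off:off + length]
```

Deleting a record and inserting a new one reuses the freed slot number, so rids handed out for other records on the page stay valid.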

8 Record Format
In addition to the individual records, information common to all records of a given record type is stored in the system catalog, which is the description of the contents of the database and is maintained by the DBMS.

8.1 Fixed-length Records


Information about field types is the same for all records in a file and is stored in the system catalogs. Finding the i-th field does not require a scan of the record. Records are typically smaller than a page. The rid is (page id, slot #). However, in the first alternative, moving records for free-space management changes the rid, which may not be acceptable.

8.2 Variable-length Records


Two alternative formats (assuming the number of fields is fixed): either each field is delimited by a special symbol, or the record begins with an array of field offsets giving the location (and hence the length) of each field.
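The two alternatives can be sketched side by side; the delimiter byte, function names, and sample field values are illustrative:

```python
DELIM = b"$"   # assumed field separator (must not occur inside field data)

def encode_delimited(fields):
    """Alternative 1: separate consecutive fields with a special symbol."""
    return DELIM.join(fields)

def encode_offsets(fields):
    """Alternative 2: an array of field offsets in front of the field data."""
    offsets, body = [], b""
    for f in fields:
        offsets.append(len(body))
        body += f
    offsets.append(len(body))   # end offset, so the last field's length is known
    return offsets, body

offs, body = encode_offsets([b"53666", b"Jones", b"CS"])
# Field i is body[offs[i]:offs[i+1]]; finding it needs no scan of the record.
print(body[offs[1]:offs[2]])
```

The offset-array format gives direct access to any field at the cost of the small header; the delimited format is more compact but requires scanning past earlier fields.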

9 System Catalogs
For each file (relation): name, structure, attributes and types, indexes, integrity
constraints, etc.
For each index: name, structure and search key.
For each view: view name and definition.
Plus statistics, authorization, buffer pool size, page size, etc. Catalogs are
themselves stored as relations.
