
L12-Externalsorting Indexfiles PDF

This document discusses data structures and algorithms for external storage. It begins by introducing external storage, including secondary memory organized into blocks and the costs of block accesses and I/O. It then covers several topics for organizing and accessing large datasets stored externally like files, including external sorting algorithms like merge sort and polyphase merge sort. It also discusses indexing files through structures like B-trees to enable faster retrieval of records.


Data Structures and Algorithms for External Storage

External sorting. Index files.

DSA - lecture 12 - T.U.Cluj-Napoca - M. Joldos

External Storage

Secondary memory

Typically organized in blocks


Basic operations involve buffers

Cost measure

Disk: seek time, latency time


Block accesses

Data typically stored in files


Files

Sequential access
Direct access


Files

All algorithms so far assumed that all elements of a (large) array can be accessed randomly.
If the array is too large to fit in main memory, it has to be kept on a secondary storage device.
Typically, the data is then organized as sequential files, which guarantee (on average) constant access time only for strictly sequential read and write operations.

Storing Information in Files

Typical operations on files:
insert a particular record into a particular file.
delete from a particular file all records having a designated key value in each of a designated set of fields.
modify all records in a particular file by setting to designated values certain fields in those records that have a designated value in each of another set of fields.
retrieve all records having designated values in each of a designated set of fields.

External Sorting
External sorting: sorting data stored on secondary memory (typically as files)
Cost measures:
Number of block accesses
(The number of steps required to sort n records)
(The number of comparisons between keys needed to sort n records, if the comparison is expensive)
(The number of times the records must be moved)
Note that the items in parentheses refer to main memory

Merge Sort

Idea: organize file into progressively larger runs
run: sequence of records $r_1, \ldots, r_k$, where $key(r_1) \le key(r_2) \le \cdots \le key(r_k)$
length of run: the number k of records in the run
tail: a final run of fewer than k records at the end of a file organized into runs of length k
Example
Begin with two files, say f1 and f2, organized into runs of length k
Assume that:
The numbers of runs, including tails, on f1 and f2 differ by at most one,
At most one of f1 and f2 has a tail, and
The one with a tail has at least as many runs as the other.
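The setup above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: Python lists stand in for the files f1, f2 and the output files g1, g2, and the names `merge_two_runs`, `merge_pass` and `external_merge_sort` are hypothetical. Each pass merges the r-th run of f1 with the r-th run of f2 and writes the result alternately to the two output files, so the run length doubles.

```python
from typing import List, Tuple


def merge_two_runs(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted runs into a single sorted run."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]


def merge_pass(f1: List[int], f2: List[int], k: int) -> Tuple[List[int], List[int]]:
    """Merge runs of length k from f1 and f2 into runs of length 2k.

    Merged runs are written alternately to the two output lists, which
    keeps the three assumptions about run counts and tails true.
    """
    g: Tuple[List[int], List[int]] = ([], [])
    n_runs = max(-(-len(f1) // k), -(-len(f2) // k))  # ceiling division
    for r in range(n_runs):
        run1 = f1[r * k:(r + 1) * k]  # may be a tail or empty
        run2 = f2[r * k:(r + 1) * k]
        g[r % 2].extend(merge_two_runs(run1, run2))
    return g


def external_merge_sort(records: List[int]) -> List[int]:
    """Sort by repeated merge passes, starting from runs of length 1."""
    f1, f2 = records[0::2], records[1::2]
    k = 1
    while f2:  # once f2 is empty, f1 holds one sorted run
        f1, f2 = merge_pass(f1, f2, k)
        k *= 2
    return f1
```

A real implementation would read and write blocks through buffers instead of slicing lists; the pass structure is the part this sketch is meant to show.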
Merge Sort for Files

Mergesort Example

Speed up Mergesort

Begin with a pass that:
reads k records into memory,
sorts them (e.g., with quicksort),
writes them back as a run of length k,
then merge
Use more channels to secondary memory to make efficient use of processor speed
Carefully select the run to replenish if runs are much larger than the block size
Base the choice on the last keys compared
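The initial pass described above might look like the following sketch. The function name `initial_runs` is hypothetical, an iterable stands in for the input file, and Python's built-in `list.sort` (Timsort) is used in place of quicksort.

```python
from typing import Iterable, Iterator, List


def initial_runs(records: Iterable[int], k: int) -> Iterator[List[int]]:
    """Yield sorted runs of length k (the last run may be a shorter tail)."""
    chunk: List[int] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == k:   # memory holds k records: sort and write out
            chunk.sort()      # any in-memory sort works here
            yield chunk
            chunk = []
    if chunk:                 # leftover records form the tail
        chunk.sort()
        yield chunk
```

Starting the merge phase from runs of length k instead of length 1 saves about log2 k merge passes.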

Speed up Mergesort Example

Multiway Merge

If reading and writing between main and secondary memory is the bottleneck, perhaps we could save time if we had more than one data channel. Suppose that
We have 2m disk units, each with its own channel. We could place m files, f1, f2, ..., fm, on m of the disk units, say organized as runs of length k.
We can read m runs, one from each file, and merge them into one run of length mk. This run is placed on one of m output files g1, g2, ..., gm, each getting a run in turn.
The merging process in main memory can be carried out in O(log m) steps per record if we organize the candidate records into a heap.
If we have n records, and the length of runs is multiplied by m with each pass, then after i passes (starting from runs of length 1) runs will be of length $m^i$.
If $m^i \ge n$, that is, after $i = \lceil \log_m n \rceil$ passes, the entire list will be sorted. As $\log_m n = \log_2 n / \log_2 m$, we save a factor of $\log_2 m$ in the number of times we read each record.
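The heap-based merge of m runs can be sketched with Python's standard `heapq` module. The function name `multiway_merge` is hypothetical and lists stand in for the runs read from the m input files; the heap holds one candidate record per run, so each record costs O(log m) heap work.

```python
import heapq
from typing import List, Tuple


def multiway_merge(runs: List[List[int]]) -> List[int]:
    """Merge m sorted runs into one sorted run using a heap of candidates."""
    # Heap entries are (key, run index, position in run); one per non-empty run.
    heap: List[Tuple[int, int, int]] = [
        (run[0], idx, 0) for idx, run in enumerate(runs) if run
    ]
    heapq.heapify(heap)
    merged: List[int] = []
    while heap:
        key, idx, pos = heapq.heappop(heap)   # smallest candidate
        merged.append(key)
        if pos + 1 < len(runs[idx]):          # replenish from the same run
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return merged
```

The standard library also offers `heapq.merge`, which performs the same kind of merge lazily over iterables.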

Polyphase Sort

We can perform an m-way merge sort with only m+1 files, as an alternative to the 2m-file strategy:
In one pass, when runs from each of m files are merged into runs of the (m+1)st file, we need not use all the runs on each of the m input files. Rather, each file, when it becomes the output file, is filled with runs of a certain length. It uses some of these runs to help fill each of the other m files when it is their turn to be the output file.
Each pass produces runs of a different length. Since each of the files loaded with runs on the previous m passes contributes to the runs of the current pass, the length on one pass is the sum of the lengths of the runs produced on the previous m passes. (If fewer than m passes have taken place, regard passes prior to the first as having produced runs of length 1.)
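The run-length rule above is a generalized Fibonacci recurrence, which a few lines of Python can make concrete. The function name `polyphase_run_lengths` is hypothetical; the sketch only computes run lengths and does not perform the polyphase merge itself.

```python
from typing import List


def polyphase_run_lengths(m: int, passes: int) -> List[int]:
    """Run length produced on each pass of an m-way polyphase sort.

    Passes before the first are treated as having produced runs of
    length 1, as stated in the text.
    """
    history = [1] * m              # the m "virtual" passes before pass 1
    lengths: List[int] = []
    for _ in range(passes):
        nxt = sum(history[-m:])    # sum of the previous m run lengths
        lengths.append(nxt)
        history.append(nxt)
    return lengths
```

For m = 2 the lengths follow the Fibonacci numbers (2, 3, 5, 8, ...), so runs grow more slowly per pass than in a true 2-way merge with four files, in exchange for using only three files.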
Polyphase Sort Example

Alternative File Organizations

Many alternatives exist, each ideal for some situations and not so good in others:
Heap files: Suitable when typical access is a file scan retrieving all records.
Sorted files: Best if records must be retrieved in some order, or only a range of records is needed.
Hashed files: Good for equality selections.
File is a collection of buckets. Bucket = primary page plus zero or more overflow pages.
Hashing function h: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.
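A hashed file as described above can be sketched in memory as follows. The class name `HashedFile`, its parameters, and the dictionary record layout are all assumptions made for illustration; pages are plain lists, with the first page of each bucket acting as the primary page and later ones as overflow pages.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


class HashedFile:
    """Toy hashed file: buckets of fixed-size pages, hashed on search fields."""

    def __init__(self, n_buckets: int, search_fields: Tuple[str, ...],
                 page_size: int = 2) -> None:
        self.search_fields = search_fields
        self.page_size = page_size
        # Each bucket is a list of pages; page 0 is the primary page.
        self.buckets: List[List[List[Record]]] = [[[]] for _ in range(n_buckets)]

    def _h(self, values: Tuple[object, ...]) -> int:
        """Hash function h: looks only at the search-field values."""
        return hash(values) % len(self.buckets)

    def _key(self, rec: Record) -> Tuple[object, ...]:
        return tuple(rec[f] for f in self.search_fields)

    def insert(self, rec: Record) -> None:
        pages = self.buckets[self._h(self._key(rec))]
        if len(pages[-1]) == self.page_size:   # last page full: add overflow page
            pages.append([])
        pages[-1].append(rec)

    def lookup(self, *values: object) -> List[Record]:
        """Equality selection: scan only the pages of one bucket."""
        pages = self.buckets[self._h(tuple(values))]
        return [r for page in pages for r in page if self._key(r) == tuple(values)]
```

An equality selection touches a single bucket, which is why this organization suits such queries; a range query would still have to scan every bucket.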

Indexes

An index on a file speeds up selections on the search key field(s)
Search key = any subset of the fields of a record
Search key is not the same as key (minimal set of fields that uniquely identify a record).
Entries in an index: (k, r), where:
k = the search key value
r = the record OR record id OR record ids
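As a small illustration of index entries (k, r) where r is a list of record ids, the sketch below builds an index over one search-key field. The function name `build_index` and the use of list positions as record ids are assumptions for the example only.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(records: List[dict], field: str) -> Dict[object, List[int]]:
    """Map each search-key value k to the ids of the records having it."""
    index: Dict[object, List[int]] = defaultdict(list)
    for rid, rec in enumerate(records):   # rid: position in the file
        index[rec[field]].append(rid)
    return dict(index)
```

Because the search key need not be a key of the file, several record ids can share one entry, as with a hypothetical "city" field that repeats across records.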

Index Classification

Clustered/unclustered
Clustered = records sorted in the key order
Unclustered = records not sorted in the key order
Dense/sparse
Dense = each record has an entry in the index
Sparse = only some records have an entry in the index
Primary/secondary
Primary = on the primary key
Secondary = on any key
Some books interpret these differently
B+ tree / Hash table / ...


Clustered vs. Unclustered Index

Multiway Search Trees

Multiway Search Trees (MWSTs) are a generalization of BSTs
MWST of order n:
Each node has n or fewer sub-trees: $S_1, S_2, \ldots, S_m$, $m \le n$
Each node has n - 1 or fewer keys
$k_1, k_2, \ldots, k_{m-1}$: m - 1 keys in ascending order, with $k(S_i) \le k_i \le k(S_{i+1})$ and $k(S_{m-1}) < k(S_m)$
Suitable for disks:
Nodes correspond to disk pages
Pros:
tree height is low for large n
fewer disk accesses
Cons:
low space utilization if non-full
MWSTs are non-balanced in general!

MWST Example

Example: 4000 keys, n = 5
At least 4000/(5 - 1) = 1000 nodes (pages)
1st level (root): 1 node, 4 keys, 5 sub-trees
+ 2nd level: 5 nodes, 20 keys, 25 sub-trees
+ 3rd level: 25 nodes, 100 keys, 125 sub-trees
+ 4th level: 125 nodes, 500 keys, 625 sub-trees
+ 5th level: 625 nodes, 2500 keys, 3125 sub-trees
+ 6th level: 3125 nodes, 12500 keys, ...
The first five levels hold only 3124 keys, so tree height = 6 (including root)
If n = 11: at least 4000/(11 - 1) = 400 nodes
The first three levels hold only 10 + 110 + 1210 = 1330 keys, so tree height = 4 including root (3 levels below the root)
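The level-by-level count in this example can be reproduced with a short helper. The name `min_levels` is hypothetical; it assumes a completely full tree in which every node holds order - 1 keys and has order sub-trees.

```python
def min_levels(num_keys: int, order: int) -> int:
    """Fewest levels (root included) of a full MWST that holds num_keys keys."""
    levels = 0
    capacity = 0      # keys held by all levels counted so far
    nodes = 1         # number of nodes on the next level
    while capacity < num_keys:
        capacity += nodes * (order - 1)   # each node holds order - 1 keys
        nodes *= order                    # each node has order sub-trees
        levels += 1
    return levels
```

With 4000 keys this gives 6 levels for order 5 and 4 levels for order 11, which shows how quickly a larger order flattens the tree.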

Operations and Issues on MWSTs

Operations
Search: returns a pointer to the node containing the key and the position of the key in the node
Insert: a new key, if it is not already in the tree
Delete: an existing key
Important Issues
Keep the MWST balanced after insertions or deletions
Balanced MWSTs: B-trees, B+-trees
Reduce the number of disk accesses
Data storage: two alternatives
1. inside the nodes: fewer sub-trees per node
2. pointers from the nodes to data pages

B Trees

So far search trees were limited to main memory structures
Assumption: the dataset organized in a search tree fits in main memory (including the tree overhead)
Counter-example: transaction data of a bank > 1 GB per day
use secondary storage media (punch cards, hard disks, magnetic tapes, etc.)
Consequence: make a search tree structure secondary-storage-enabled
B-trees: proposed by R. Bayer and E. M. McCreight in 1972.

B-tree Definitions

Node x has fields
n[x]: the number of keys stored in node x
$key_1[x] \le \cdots \le key_{n[x]}[x]$: the keys in ascending order
leaf[x]: true if leaf node, false if internal node
if internal node, then $c_1[x], \ldots, c_{n[x]+1}[x]$: pointers to children
Keys separate the ranges of keys in the subtrees. If $k_i$ is an arbitrary key in the subtree $c_i[x]$, then $k_i \le key_i[x] \le k_{i+1}$
Every leaf has the same depth
In a B-tree of degree t, all nodes except the root node have between t and 2t children (i.e., between t - 1 and 2t - 1 keys).
The root node has between 0 and 2t children (i.e., between 0 and 2t - 1 keys)

B Tree Examples

n is the number of keys stored in a node
Figure: example nodes with n = 3, n = 5 and n = 7

Binary-trees vs. B-trees

Size of B-tree nodes: determined by the page size. One page = one node.
A B-tree of height 2 may contain > 1 billion keys!
Heights of binary tree and B-tree are logarithmic
Binary tree: logarithm of base 2
B-tree: logarithm of base, e.g., 1000
Example with 1000 keys (1001 children) per node:
root: 1 node, 1000 keys
level 1: 1001 nodes, 1,001,000 keys
level 2: 1,002,001 nodes, 1,002,001,000 keys

Height of a B-tree
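For a B-tree T of height h, containing $n \ge 1$ keys and with minimum degree $t \ge 2$, the following restriction on the height holds:

$$h \le \log_t \frac{n+1}{2}$$

The root contains at least one key and every other node at least $t - 1$ keys, so there is 1 node at depth 0, at least 2 nodes at depth 1, at least $2t$ nodes at depth 2, and in general at least $2t^{i-1}$ nodes at depth $i$:

$$n \ge 1 + (t-1) \sum_{i=1}^{h} 2t^{i-1} = 2t^h - 1$$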

B-tree Operations

An implementation needs to support the


following B-tree operations

Searching (simple)
Creating an empty tree (trivial)
Insertion (complex)
Deletion (complex)

Creating an Empty Tree. Searching

Creating:
Empty B-tree = create a root & write it to disk!

BTreeCreate(T)
    x ← AllocateNode()
    leaf[x] ← TRUE
    n[x] ← 0
    DiskWrite(x)
    root[T] ← x

Searching
Straightforward generalization of a binary tree search

BTreeSearch(x, k)
    i ← 1
    while i ≤ n[x] and k > key_i[x]
        i ← i + 1
    if i ≤ n[x] and k = key_i[x] then
        return (x, i)
    if leaf[x] then
        return NIL
    else DiskRead(c_i[x])
        return BTreeSearch(c_i[x], k)
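The search procedure translates almost line for line into Python. The sketch below is an in-memory illustration under assumptions: the `Node` class is hypothetical, indices are 0-based instead of 1-based, and following a child reference stands in for DiskRead.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search(x: Node, k: int) -> Optional[Tuple[Node, int]]:
    """Return (node, 0-based position) of key k, or None if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:   # find first key >= k
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    return btree_search(x.children[i], k)      # a DiskRead would happen here
```

For large nodes the linear scan inside a node could be replaced by a binary search; the number of disk reads, one per level, stays the same.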

Splitting Nodes (1)

Nodes fill up and reach their maximum capacity 2t - 1
Before we can insert a new key, we have to make room, i.e., split nodes
Result: one key of y moves up to the parent x, leaving 2 nodes with t - 1 keys each
x: parent node
y: node to be split and child of x
i: index in x
z: new node
Example (t = 4), with subtrees T1 ... T8 below y:
Before: x = ... N W ..., y = c_i[x] = P Q R S T V W
After: x = ... N S W ..., y = c_i[x] = P Q R, z = c_{i+1}[x] = T V W

Splitting Nodes (2)


BTreeSplitChild(x, i, y)
    z ← AllocateNode()
    leaf[z] ← leaf[y]
    n[z] ← t - 1
    for j ← 1 to t - 1
        key_j[z] ← key_{j+t}[y]
    if not leaf[y] then
        for j ← 1 to t
            c_j[z] ← c_{j+t}[y]
    n[y] ← t - 1
    for j ← n[x] + 1 downto i + 1
        c_{j+1}[x] ← c_j[x]
    c_{i+1}[x] ← z
    for j ← n[x] downto i
        key_{j+1}[x] ← key_j[x]
    key_i[x] ← key_t[y]
    n[x] ← n[x] + 1
    DiskWrite(y)
    DiskWrite(z)
    DiskWrite(x)

x: parent node
y: node to be split and child of x
i: index in x
z: new node

Running Time:
A local operation that does not traverse the tree
Θ(t) CPU time, since two loops run t times
3 I/Os
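The split can be written compactly with list slicing. This is a sketch under assumptions: the `Node` class and the name `split_child` are hypothetical, indices are 0-based, nodes live in memory, and the three DiskWrite calls are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def split_child(x: Node, i: int, t: int) -> None:
    """Split the full child y = x.children[i] (2t - 1 keys) around its median."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "child must be full"
    # z takes the upper t - 1 keys and, for internal nodes, the upper t children.
    z = Node(keys=y.keys[t:], children=y.children[t:])
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]          # y keeps the lower t - 1 keys
    y.children = y.children[:t]      # stays empty for a leaf
    x.keys.insert(i, median)         # median moves up into the parent
    x.children.insert(i + 1, z)      # z becomes y's right neighbour
```

List insertion shifts the later keys and children of x, which is what the two `downto` loops in the pseudocode do explicitly.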

Inserting Keys

Done recursively, by starting from the root and recursively traversing down the tree to the leaf level
Before descending to a lower level in the tree, make sure that the node contains fewer than 2t - 1 keys:
so that if we split a node in a lower level we will have space to include a new key

Inserting Keys (2)

Special case: root is full (BTreeInsert)

BTreeInsert(T, k)
    r ← root[T]
    if n[r] = 2t - 1 then
        s ← AllocateNode()
        root[T] ← s
        leaf[s] ← FALSE
        n[s] ← 0
        c_1[s] ← r
        BTreeSplitChild(s, 1, r)
        BTreeInsertNonFull(s, k)
    else BTreeInsertNonFull(r, k)

Splitting the Root

Splitting the root requires the creation of a new root
Before: root[T] = A D F H L N P (subtrees T1 ... T8)
After: root[T] = H, with children A D F and L N P
The tree grows at the top instead of the bottom

Inserting Keys

BTreeInsertNonFull tries to insert a key k into a node x, which is assumed to be non-full when the procedure is called
BTreeInsert and the recursion in BTreeInsertNonFull guarantee that this assumption is true!

Inserting Keys
BTreeInsertNonFull(x, k)
    i ← n[x]
    if leaf[x] then                          (leaf insertion)
        while i ≥ 1 and k < key_i[x]
            key_{i+1}[x] ← key_i[x]
            i ← i - 1
        key_{i+1}[x] ← k
        n[x] ← n[x] + 1
        DiskWrite(x)
    else while i ≥ 1 and k < key_i[x]        (internal node: traversing tree)
            i ← i - 1
        i ← i + 1
        DiskRead(c_i[x])
        if n[c_i[x]] = 2t - 1 then
            BTreeSplitChild(x, i, c_i[x])
            if k > key_i[x] then
                i ← i + 1
        BTreeInsertNonFull(c_i[x], k)
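Putting the pieces together, single-pass insertion might be sketched in Python as below. The `Node` and `BTree` classes and their method names are hypothetical, indices are 0-based, and all nodes are kept in memory, so the DiskRead and DiskWrite calls of the pseudocode have no counterpart here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    def __init__(self, t: int) -> None:
        self.t = t              # minimum degree
        self.root = Node()      # empty tree: a single empty leaf

    def _split_child(self, x: Node, i: int) -> None:
        """Split the full child x.children[i]; its median moves up into x."""
        t, y = self.t, x.children[i]
        z = Node(y.keys[t:], y.children[t:])
        x.keys.insert(i, y.keys[t - 1])
        x.children.insert(i + 1, z)
        y.keys, y.children = y.keys[:t - 1], y.children[:t]

    def insert(self, k: str) -> None:
        r = self.root
        if len(r.keys) == 2 * self.t - 1:    # special case: full root
            s = Node(children=[r])           # new root, tree grows at the top
            self.root = s
            self._split_child(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _insert_nonfull(self, x: Node, k: str) -> None:
        i = len(x.keys)
        while i > 0 and k < x.keys[i - 1]:   # position of k among x's keys
            i -= 1
        if x.leaf:
            x.keys.insert(i, k)              # leaf insertion
            return
        if len(x.children[i].keys) == 2 * self.t - 1:
            self._split_child(x, i)          # never descend into a full node
            if k > x.keys[i]:
                i += 1
        self._insert_nonfull(x.children[i], k)
```

Splitting full nodes on the way down is what keeps the procedure to a single downward pass: when a child must be split, its parent is guaranteed to have room for the median key.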

Insertion: Example
initial tree (t = 3)
root: G M P X
leaves: A C D E | J K | N O | R S T U V | Y Z
B inserted
root: G M P X
leaves: A B C D E | J K | N O | R S T U V | Y Z
Q inserted
root: G M P T X
leaves: A B C D E | J K | N O | Q R S | U V | Y Z

Insertion: Example (2)


L inserted
root: P
internal nodes: G M | T X
leaves: A B C D E | J K L | N O | Q R S | U V | Y Z
F inserted
root: P
internal nodes: C G M | T X
leaves: A B | D E F | J K L | N O | Q R S | U V | Y Z

Insertion: Running Time

Disk I/O: O(h), since only O(1) disk accesses are performed during recursive calls of BTreeInsertNonFull
CPU: $O(th) = O(t \log_t n)$
At any given time there are O(1) disk pages in main memory

Deleting Keys

Done recursively, by starting from the root and recursively traversing down the tree to the leaf level
Before descending to a lower level in the tree, make sure that the node contains at least t keys (cf. insertion: fewer than 2t - 1 keys)
BTreeDelete distinguishes three different stages/scenarios for deletion
Case 1: key k found in leaf node
Case 2: key k found in internal node
Case 3: key k suspected in lower level node

Deleting Keys (2)


initial tree
root: P
internal nodes: C G M | T X
leaves: A B | D E F | J K L | N O | Q R S | U V | Y Z
F deleted: case 1
root: P
internal nodes: C G M | T X
leaves: A B | D E | J K L | N O | Q R S | U V | Y Z
Case 1: If the key k is in node x, and x is a leaf, delete k from x

Deleting Keys (3)


Case 2: If the key k is in node x, and x is not a leaf, delete k from x
a) If the child y that precedes k in node x has at least t keys, then find the predecessor k' of k in the sub-tree rooted at y. Recursively delete k', and replace k with k' in x.
b) Symmetrically for the successor child z
M deleted: case 2a
root: P
internal nodes: C G L (x) | T X
leaves: A B | D E | J K (y) | N O | Q R S | U V | Y Z

Deleting Keys (4)

c) If both y and z have only t - 1 keys, merge k and the contents of z into y, so that x loses both k and the pointer to z, and y now contains 2t - 1 keys. Free z and recursively delete k from y.
G deleted: case 2c
root: P
internal nodes: C L (x - k) | T X
leaves: A B | D E J K (y = y + k + z - k) | N O | Q R S | U V | Y Z

Deleting Keys - Distribution


Descending down the tree: if k is not found in the current node x, find the sub-tree c_i[x] that has to contain k.
If c_i[x] has only t - 1 keys, take action to ensure that we descend to a node of size at least t.
Case 3, sub-case 1 (two sub-cases exist): if c_i[x] has only t - 1 keys, but a sibling with at least t keys, give c_i[x] an extra key by:
moving a key from x down to c_i[x],
moving a key from c_i[x]'s immediate left or right sibling up into x, and
moving the appropriate child from the sibling into c_i[x] (distribution)

Deleting Keys - Distribution (2)

delete B
Before:
root (x): C L P T X
leaves: A B (c_i[x]) | E J K (sibling) | N O | Q R S | U V | Y Z
B deleted:
root: E L P T X
leaves: A C | J K | N O | Q R S | U V | Y Z
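The distribution step on its own can be sketched as two small helpers, one per sibling side. The `Node` class and the names `borrow_from_right` and `borrow_from_left` are hypothetical, indices are 0-based, and the sketch assumes the chosen sibling has at least t keys; it covers only the rotation, not the full deletion procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def borrow_from_right(x: Node, i: int) -> None:
    """Give x.children[i] one extra key via its right sibling."""
    child, sib = x.children[i], x.children[i + 1]
    child.keys.append(x.keys[i])         # separator moves down from x
    x.keys[i] = sib.keys.pop(0)          # sibling's smallest key moves up
    if sib.children:                     # internal nodes: move a child too
        child.children.append(sib.children.pop(0))


def borrow_from_left(x: Node, i: int) -> None:
    """Give x.children[i] one extra key via its left sibling."""
    child, sib = x.children[i], x.children[i - 1]
    child.keys.insert(0, x.keys[i - 1])  # separator moves down from x
    x.keys[i - 1] = sib.keys.pop()       # sibling's largest key moves up
    if sib.children:
        child.children.insert(0, sib.children.pop())
```

On the example tree, borrowing from the right sibling E J K turns the root into E L P T X and the leaf into A B C, after which B can be removed from a leaf that still keeps t - 1 keys.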

Deleting Keys - Merging

If c_i[x] and both of c_i[x]'s immediate siblings have t - 1 keys, merge c_i[x] with one sibling:
moving a key from x down into the new merged node to become the median key for that node
Before: x = ... l k m ..., with c_i[x] = ... A and its sibling B ... on either side of k
After: x = ... l m ..., with a single merged child ... A k B ...

Deleting Keys - Merging (2)

delete D
Before:
root: P
internal nodes: C L (c_i[x]) | T X (sibling)
leaves: A B | D E J K | N O | Q R S | U V | Y Z
D deleted:
root: C L P T X
leaves: A B | E J K | N O | Q R S | U V | Y Z
tree shrinks in height
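The merge step can be sketched in the same style. The `Node` class and the name `merge_children` are hypothetical, indices are 0-based, and the sketch assumes both children hold t - 1 keys; only the merge itself is shown, not the surrounding deletion logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def merge_children(x: Node, i: int) -> Node:
    """Merge x.children[i], the key x.keys[i] and x.children[i + 1] into one node."""
    left, right = x.children[i], x.children[i + 1]
    left.keys.append(x.keys.pop(i))      # separator moves down as the median
    left.keys.extend(right.keys)         # merged node now has 2t - 1 keys
    left.children.extend(right.children)
    del x.children[i + 1]                # the right node is freed
    return left
```

If x was the root and is left without keys, the merged node becomes the new root, which is how the tree shrinks in height in the example.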

Deletion: Running Time

Most of the keys are in the leaves, thus deletion most often occurs there!
In this case deletion happens in one downward pass to the leaf level of the tree
Case 2: Deletion from an internal node might require backing up
Running time:
Disk I/O: O(h), since only O(1) disk operations are performed during recursive calls
CPU: $O(th) = O(t \log_t n)$

Two-pass Operations

Simpler, practical versions of the algorithms use two passes (down and up the tree):
Down: find the node where deletion or insertion should occur
Up: if needed, split, merge, or distribute; propagate splits, merges, or distributions up the tree
To avoid reading the same nodes twice, use a buffer of nodes

B-Tree / B+Tree animations

B-Tree
http://slady.net/java/bt/view.php
http://www.youtube.com/watch?v=coRJrcIYbF4
http://ats.oka.nu/b-tree.en.html
http://www.cs.auckland.ac.nz/software/AlgAnim/n_ary_trees.html
B+tree
http://www.seanster.com/BplusTree/BplusTree.html

Reading

AHU, chapter 11
CLR, chapter 19, CLRS chapter 18
Notes
