Hashing - Data Structures and Algorithms
In a worst-case sense?
No, but we can do this:
Let U denote the universal set from which the keys are drawn.
Then, we just maintain an array of size |U| in which each element holds a
binary value: 1 if the corresponding key is present, 0 otherwise.
But the space is O(|U|), which may be much larger than n, the number of keys.
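A minimal sketch of this direct-address table in Python, assuming the keys are integers in range(universe_size). The class and method names are hypothetical. The table stores one bit per possible key, so every operation is O(1) in the worst case, while memory grows with |U| rather than with the number of keys.

```python
class DirectAddressSet:
    """Hypothetical direct-address table over U = {0, 1, ..., universe_size - 1}.

    Bit k is 1 if and only if key k is present.
    """

    def __init__(self, universe_size: int) -> None:
        self.universe_size = universe_size
        # |U| bits, rounded up to whole bytes: O(|U|) space regardless of n.
        self.bits = bytearray((universe_size + 7) // 8)

    def _locate(self, k: int) -> tuple[int, int]:
        if not 0 <= k < self.universe_size:
            raise KeyError(k)
        return k >> 3, 1 << (k & 7)  # (byte index, bit mask within that byte)

    def insert(self, k: int) -> None:
        byte, mask = self._locate(k)
        self.bits[byte] |= mask

    def delete(self, k: int) -> None:
        byte, mask = self._locate(k)
        self.bits[byte] &= ~mask & 0xFF

    def search(self, k: int) -> bool:
        byte, mask = self._locate(k)
        return bool(self.bits[byte] & mask)
```

For |U| = 2^32 the bit vector alone needs 2^32 bits = 512 MiB, however few keys are actually stored.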
In an expected sense?
Points to note
The hash function, h, must be computationally simple
Since the address space is much smaller than the universal set U, two or more
keys can hash to the same slot in the hash table (a collision).
Hash Function
The division method:
$h: U \to \{0, 1, 2, \ldots, n-1\}$, $h(k) = k \bmod n$
Choice of n:
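A sketch of the division method; the function name is hypothetical. The demo illustrates one commonly cited reason to avoid n = 2^p: k mod 2^p keeps only the low p bits of k, so keys that differ only in their high bits all collide. A prime n that is not too close to a power of 2 is the usual recommendation.

```python
def division_hash(k: int, n: int) -> int:
    """Division method: h(k) = k mod n, mapping U = N onto {0, 1, ..., n-1}."""
    if n <= 0:
        raise ValueError("table size n must be positive")
    return k % n


# Keys that differ only above bit 5: multiples of 64.
keys = [i * 64 for i in range(8)]
print(sorted({division_hash(k, 64) for k in keys}))  # [0]  (n = 2^6 keeps only the low 6 bits)
print(sorted({division_hash(k, 61) for k in keys}))  # [0, 3, 6, 9, 12, 15, 18, 21]  (n = 61, prime)
```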
Universal Set U
We will assume that the keys in the universal set U are natural numbers,
N = {0, 1, 2, …}.
If they are not, then we can suitably interpret them to be natural numbers.
Mapping:
For example, a string over the ASCII character set can be interpreted as an
integer in base 128.
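This interpretation is Horner's rule: each character's ASCII code becomes one base-128 digit. Below is a small sketch that rejects non-ASCII input. The function name is hypothetical.

```python
def string_to_natural(s: str) -> int:
    """Interpret an ASCII string as a natural number written in base 128."""
    value = 0
    for ch in s:
        code = ord(ch)
        if code >= 128:
            raise ValueError(f"non-ASCII character {ch!r}")
        value = value * 128 + code  # shift one base-128 digit left, then add the new digit
    return value
```

For example, "pt" maps to 112 · 128 + 116 = 14452. For long strings the integer grows quickly. A common variant therefore reduces modulo n after every step. This gives the same result as reducing the full integer once, because (a · 128 + c) mod n depends on a only through a mod n.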
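The multiplication method:
First, we multiply the key k by a constant A, 0 < A < 1, and extract the
fractional part of kA.
Then, we multiply this fraction by n and take the floor: $h(k) = \lfloor n\,(kA \bmod 1) \rfloor$.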
Knuth’s A
Here, the choice of n is not critical.
We take n to be a power of 2.
D. Knuth suggests the following value for A:
$A = (\sqrt{5} - 1)/2 \approx 0.6180339887$
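Below is a floating-point sketch of the multiplication method with n = 2^p and Knuth's A. The function and constant names are hypothetical. With k = 123456 and p = 14 it returns 67.

```python
import math

# Knuth's suggested constant (sqrt(5) - 1) / 2, about 0.6180339887.
KNUTH_A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k: int, p: int, a: float = KNUTH_A) -> int:
    """Multiplication method with table size n = 2**p: h(k) = floor(n * frac(k * A))."""
    n = 1 << p
    frac = (k * a) % 1.0  # fractional part of kA, in [0, 1)
    return math.floor(n * frac)  # scaling by a power of 2 is exact, so the result is < n


print(multiplication_hash(123456, 14))  # 67
```

Double-precision floats lose the low-order digits of kA once k is large, so this version is only illustrative. Integer implementations typically multiply k by the w-bit integer ⌊A · 2^w⌋ and take the most significant p bits of the low-order w bits of the product.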
Insertion:
To insert a key, successively examine, or probe, the hash table
until we find an empty slot in which to place the key.
Open-addressing Hashing
Searching:
The algorithm for searching for key k probes the same sequence of slots that
the insertion algorithm examined when key k was inserted.
Why?
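If k was inserted, it occupies a slot on that probe sequence that comes before any empty slot, so a search can stop as soon as it reaches an empty slot.
This assumes that keys are not deleted from the hash table.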
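Insertion and search can be sketched for an arbitrary probe sequence. In this sketch the probe sequence is passed in as a function probe(k, i) that returns the slot examined at step i for key k. That interface, the class name and the use of None for an empty slot are all assumptions. Deletion is deliberately left out.

```python
from typing import Callable, Optional

Probe = Callable[[int, int], int]  # probe(k, i) -> index of the i-th slot examined for key k


class OpenAddressTable:
    """Hypothetical open-addressing table without deletion."""

    def __init__(self, n: int, probe: Probe) -> None:
        self.n = n
        self.probe = probe
        self.slots: list[Optional[int]] = [None] * n  # None marks an empty slot

    def insert(self, k: int) -> int:
        """Probe until an empty slot is found; return the slot used."""
        for i in range(self.n):
            j = self.probe(k, i)
            if self.slots[j] is None:
                self.slots[j] = k
                return j
        raise OverflowError("hash table is full")

    def search(self, k: int) -> Optional[int]:
        """Follow the same probe sequence as insert; stop at k or at an empty slot."""
        for i in range(self.n):
            j = self.probe(k, i)
            if self.slots[j] is None:
                return None  # k would have been placed here, so it is absent
            if self.slots[j] == k:
                return j
        return None
```

With n = 7 and probe(k, i) = (k mod 7 + i) mod 7, inserting 10 and then 17 places them in slots 3 and 4. A search for 24 probes slots 3, 4 and 5 and stops at the empty slot 5.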
Deletion:
A solution:
Mark the slot with a special "deleted" value instead of emptying it, so that later searches keep probing past it.
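Here is a sketch of the marker approach, using linear probing as the probe sequence. Linear probing is an assumption here: the idea applies to any probe sequence. The sentinel DELETED and the class name are hypothetical. Search treats a deleted slot as occupied and keeps probing. Insertion may reuse a deleted slot.

```python
from typing import Optional

EMPTY = None
DELETED = object()  # hypothetical sentinel: "a key used to be stored here"


class TombstoneTable:
    """Hypothetical linear-probing table with deletion by marking."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.slots: list = [EMPTY] * n

    def _probe(self, k: int, i: int) -> int:
        return (k % self.n + i) % self.n  # linear probing with h(k) = k mod n

    def insert(self, k: int) -> int:
        # Assumes k is not already in the table.
        for i in range(self.n):
            j = self._probe(k, i)
            if self.slots[j] is EMPTY or self.slots[j] is DELETED:
                self.slots[j] = k  # deleted slots may be reused
                return j
        raise OverflowError("hash table is full")

    def search(self, k: int) -> Optional[int]:
        for i in range(self.n):
            j = self._probe(k, i)
            if self.slots[j] is EMPTY:
                return None  # only a truly empty slot ends the search
            if self.slots[j] is not DELETED and self.slots[j] == k:
                return j
        return None

    def delete(self, k: int) -> bool:
        j = self.search(k)
        if j is None:
            return False
        self.slots[j] = DELETED  # mark the slot; do not empty it
        return True
```

For instance, with n = 7, inserting 10 and then 17 puts them in slots 3 and 4. After delete(10), search(17) still finds slot 4. If delete stored EMPTY instead, that search would stop at slot 3 and wrongly report 17 as missing.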
Assumption:
Linear probing
In this case, the probe sequence is:
(h(k)+0) mod n
(h(k)+1) mod n
⋮
(h(k)+n-1) mod n
Clusters (long runs of occupied slots) build up, increasing the average search time.
Example: Primary Clustering
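Below is a small simulation of linear probing with h(k) = k mod n and n = 10. The keys and table size are chosen only for illustration, and the function names are hypothetical. Keys 11 and 12 have their own home slots, 1 and 2. Both still land at the end of the run created by 10, 20 and 30, and each needs three probes.

```python
from typing import Optional


def linear_probe_sequence(hk: int, n: int) -> list[int]:
    """Slots (h(k) + i) mod n for i = 0, 1, ..., n-1."""
    return [(hk + i) % n for i in range(n)]


def insert_linear(table: list[Optional[int]], k: int) -> int:
    """Insert k using h(k) = k mod n and linear probing; return the number of probes used."""
    n = len(table)
    for probes, j in enumerate(linear_probe_sequence(k % n, n), start=1):
        if table[j] is None:
            table[j] = k
            return probes
    raise OverflowError("hash table is full")


table: list[Optional[int]] = [None] * 10
probe_counts = [insert_linear(table, k) for k in (10, 20, 30, 11, 12)]
print(probe_counts)  # [1, 2, 3, 3, 3]
print(table[:6])     # [10, 20, 30, 11, 12, None]
```

Slots 0 to 4 now form one cluster. Any later key whose home slot lies anywhere in 0 to 4 ends up in slot 5, which extends the cluster further.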
Analysis of Linear Probing
Let $\alpha = m/n$ (the load factor), where m of the n slots in the hash table are occupied.
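For reference, Knuth's standard approximations for the expected number of probes under linear probing are given below. They assume a large table and α < 1, and are worked out at two load factors.

```latex
\text{successful search: } \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)
\qquad
\text{unsuccessful search: } \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)

\alpha = 0.5:\quad \tfrac{1}{2}(1 + 2) = 1.5, \qquad \tfrac{1}{2}(1 + 4) = 2.5
\alpha = 0.9:\quad \tfrac{1}{2}(1 + 10) = 5.5, \qquad \tfrac{1}{2}(1 + 100) = 50.5
```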
Quadratic probing
In this case, the probe sequence is
$(h(k) + c_1 i + c_2 i^2) \bmod n$
for $i = 0, 1, \ldots, n-1$,
where $c_1$ and $c_2$ are auxiliary constants.
Example: Quadratic Probing
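This sketch lists a quadratic probe sequence; the function name is hypothetical. The chosen values are c1 = 0, c2 = 1, n = 11 and h(k) = 3. They show that the sequence need not visit every slot: only 6 of the 11 slots ever appear.

```python
def quadratic_probe_sequence(hk: int, n: int, c1: int, c2: int) -> list[int]:
    """Slots (h(k) + c1*i + c2*i^2) mod n for i = 0, 1, ..., n-1."""
    return [(hk + c1 * i + c2 * i * i) % n for i in range(n)]


seq = quadratic_probe_sequence(3, 11, 0, 1)
print(seq)            # [3, 4, 7, 1, 8, 6, 6, 8, 1, 7, 4]
print(len(set(seq)))  # 6
```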
Analysis of Quadratic Probing
Crucial questions:
Ease of computation?
Viability
Theorem:
If quadratic probing is used and the table size is prime,
then a new element can always be inserted
if the table is at least half empty.
Proof: By contradiction.
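Here is one way the contradiction can go, assuming the probe sequence (h(k) + i^2) mod n with n prime. Suppose two of the first ⌊n/2⌋ + 1 probes coincide, say for 0 ≤ i < j ≤ ⌊n/2⌋. Then n divides j^2 − i^2 = (j − i)(j + i). Since n is prime, it must divide j − i or j + i. Both of these lie strictly between 0 and n, which is impossible. So those probes are distinct, and if at most ⌊n/2⌋ slots are occupied, one of them must be empty.

The sketch below checks the distinctness claim numerically and shows it failing for the composite n = 16. The helper names are hypothetical.

```python
def is_prime(m: int) -> bool:
    return m >= 2 and all(m % d for d in range(2, int(m ** 0.5) + 1))


def first_probes_distinct(n: int) -> bool:
    """Are (h + i^2) mod n distinct for i = 0, 1, ..., floor(n/2)?

    Taking h = 0 loses nothing: adding h mod n only shifts every probe.
    """
    count = n // 2 + 1
    return len({(i * i) % n for i in range(count)}) == count


print(all(first_probes_distinct(p) for p in range(2, 500) if is_prime(p)))  # True
print(first_probes_distinct(16))  # False: 4^2 = 16 ≡ 0 = 0^2 (mod 16)
```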
Growth
What do we do when the load factor gets too high?
Rehash!
Rehash:
Scan the entries in the current table and insert them into a new, larger hash table.
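Below is a sketch of rehashing for a quadratic-probing table that stores integer keys, with None marking an empty slot. The new size is the smallest prime at least twice the old size; this is an assumption, one common policy. It keeps the new table at most half full, so the viability theorem guarantees that every reinsertion succeeds. Every key must be hashed again, because h(k) = k mod n changes with n. The function names are hypothetical.

```python
from typing import Optional


def is_prime(m: int) -> bool:
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def next_prime(m: int) -> int:
    """Smallest prime >= m."""
    while not is_prime(m):
        m += 1
    return m


def rehash(old: list[Optional[int]]) -> list[Optional[int]]:
    """Reinsert every key of `old` into a new table of size next_prime(2 * len(old))."""
    n = next_prime(2 * len(old))
    new: list[Optional[int]] = [None] * n
    for k in old:
        if k is None:
            continue  # skip empty slots
        home = k % n  # h(k) must be recomputed for the new table size
        for i in range(n):
            j = (home + i * i) % n  # quadratic probing in the new table
            if new[j] is None:
                new[j] = k
                break
        else:
            raise OverflowError("no free slot found")  # not reached while at most half full
    return new
```

For example, rehashing a 7-slot table that holds 14, 9, 3 and 20 produces a 17-slot table. In it, 20 (home slot 3, already taken by 3) lands in slot 4.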
Efficiency
Theorem: Quadratic probing can be implemented without expensive
multiplications and divisions.
Proof (Sketch):
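Let $H_{i-1}$ be the most recent probe.
For the probe sequence $(h(k) + i^2) \bmod n$ (that is, $c_1 = 0$, $c_2 = 1$), the ith probe is given by
$H_i = (H_{i-1} + 2i - 1) \bmod n$, since $i^2 - (i-1)^2 = 2i - 1$.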
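This sketch computes the probes incrementally, using only a shift, additions and subtractions inside the loop. The function name is hypothetical. While 2i − 1 < n, the sum H_{i−1} + 2i − 1 stays below 2n, so a single subtraction of n replaces the mod. This covers the first half of the probes, which are the ones the viability theorem relies on. The while loop also handles later probes.

```python
def quadratic_probes_incremental(hk: int, n: int, count: int) -> list[int]:
    """Return H_0, ..., H_{count-1}, where H_i = (hk + i^2) mod n, without * or % in the loop."""
    h = hk % n  # H_0: a single reduction before the loop
    probes = [h]
    for i in range(1, count):
        h += (i << 1) - 1  # H_i = H_{i-1} + 2i - 1; the shift computes 2i
        while h >= n:      # reduce mod n by subtraction; one pass suffices while 2i - 1 < n
            h -= n
        probes.append(h)
    return probes


print(quadratic_probes_incremental(3, 11, 6))  # [3, 4, 7, 1, 8, 6]
```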
Analysis of Double Hashing
Expected number of probes assuming each key is equally likely to be
searched for (Knuth):
Comparison: Linear versus Uniform
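Knuth's approximations for uniform hashing, which double hashing closely approximates, are (1/α) ln(1/(1 − α)) expected probes for a successful search and 1/(1 − α) for an unsuccessful one. The sketch below tabulates them next to his linear-probing approximations, (1/2)(1 + 1/(1 − α)) and (1/2)(1 + 1/(1 − α)^2). The function names are hypothetical.

```python
import math


def linear_probing_probes(alpha: float) -> tuple[float, float]:
    """Knuth's approximations for linear probing: (successful, unsuccessful)."""
    return 0.5 * (1 + 1 / (1 - alpha)), 0.5 * (1 + 1 / (1 - alpha) ** 2)


def uniform_probes(alpha: float) -> tuple[float, float]:
    """Approximations for uniform hashing: (successful, unsuccessful)."""
    return (1 / alpha) * math.log(1 / (1 - alpha)), 1 / (1 - alpha)


if __name__ == "__main__":
    print(f"{'alpha':>6} {'lin hit':>8} {'lin miss':>9} {'uni hit':>8} {'uni miss':>9}")
    for alpha in (0.25, 0.5, 0.75, 0.9):
        lh, lm = linear_probing_probes(alpha)
        uh, um = uniform_probes(alpha)
        print(f"{alpha:>6.2f} {lh:>8.2f} {lm:>9.2f} {uh:>8.2f} {um:>9.2f}")
```

At α = 0.9, an unsuccessful search costs about 50.5 probes with linear probing but about 10 under uniform hashing.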
Final Note on Clustering
Linear Probing – Primary Clustering
– all keys share one probe sequence;
– only the starting point depends on h(k)
Quadratic Probing – Secondary Clustering
– each value of h(k) starts a different probe sequence, but keys with equal h(k) share it
Double Hashing
– each k starts a different probe sequence
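This sketch compares the three kinds of probe sequence for n = 13, using keys 14 and 27, which both have h(k) = 1. The double-hashing form (h1(k) + i·h2(k)) mod n uses h1(k) = k mod n and h2(k) = 1 + (k mod (n − 1)); this is an assumed, commonly used choice. With n prime, h2(k) is never 0 and is coprime to n, so each double-hashing sequence visits every slot. All names are hypothetical.

```python
N = 13


def linear_seq(hk: int, n: int = N) -> list[int]:
    return [(hk + i) % n for i in range(n)]


def quadratic_seq(hk: int, n: int = N) -> list[int]:
    return [(hk + i * i) % n for i in range(n)]


def double_seq(k: int, n: int = N) -> list[int]:
    h1 = k % n
    h2 = 1 + k % (n - 1)  # hypothetical second hash function; never 0
    return [(h1 + i * h2) % n for i in range(n)]


# Keys 14 and 27 collide: 14 mod 13 == 27 mod 13 == 1.
print(quadratic_seq(14 % N) == quadratic_seq(27 % N))  # True: same h(k), same sequence
print(double_seq(14)[:4], double_seq(27)[:4])          # [1, 4, 7, 10] [1, 5, 9, 0]
# Linear probing: a key with h(k) = 2 follows the h(k) = 1 sequence, shifted by one.
print(linear_seq(1)[1:] == linear_seq(2)[:-1])         # True
```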