Hashing - Data Structures and Algorithms

Hashing allows performing operations like find, insert, and delete in O(1) expected time by mapping keys to array indices via a hash function. Collisions occur when different keys hash to the same index, requiring collision resolution strategies. Linear probing resolves collisions by probing successive indices, but causes clustering. Quadratic probing improves on this by using a quadratic function to determine probe sequences. Double hashing further reduces clustering by using a secondary hash function to space out probes. These open addressing schemes allow hash tables to provide fast access while using less space than balanced search trees.


Hashing

Data Structures and Algorithms (60-254)


Hashing
We have seen that we can maintain a set of n keys
in a balanced binary search tree (AVL or red-black tree).

In these data structures, FIND, INSERT, and DELETE operations
take O(log n) time in the worst case.

The question is the following:

Can we do these operations in O(1) time?
In a worst-case sense?
No, but we can do this:

Let U denote the universal set from which the keys are drawn.
Then, we just maintain an array of size |U| where each element contains a
binary value:

1 denotes the presence of the element, and 0 denotes its absence.

But space is O(|U|), which may be much larger than n, the number of keys.
In an expected sense?

Points to note
The hash function, h, must be computationally simple.

It must distribute keys evenly in the address space.

Since the address space is much smaller than the size of the universal
set U, two or more keys can hash to the same spot in the hash table.

This is called a collision, and…
we must have a collision-handling strategy.
Hash Function
The division method:

h: U → {0, 1, 2, …, n−1}

is defined as h(k) = k mod n

Choice of n:

Good values of n are primes not too close to exact powers of 2.
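As a small illustration (a Python sketch; the function name and the sample values are my own, not from the slides), the division method is a single modulo:

```python
def h_division(k: int, n: int) -> int:
    """Division-method hash: h(k) = k mod n, a slot in {0, ..., n-1}."""
    return k % n

# n = 701 is a prime not too close to an exact power of 2 (512 or 1024)
print(h_division(123456, 701))  # → 80
```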
Universal Set U
We will assume that the keys in the universal set U are the set of natural numbers
N={0, 1, 2, ……}.

If they are not, then we can suitably interpret them to be natural numbers.

Mapping:
For example, a string over the set of ASCII characters can be interpreted as an
integer in base 128.

Thus the string “pt” can be interpreted as the integer:

128 × 112 + 116 = 14,452
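The base-128 interpretation above can be sketched in Python (illustrative; the helper name is my own):

```python
def string_to_key(s: str) -> int:
    """Interpret an ASCII string as an integer written in base 128."""
    key = 0
    for ch in s:
        key = key * 128 + ord(ch)  # each character is one base-128 digit
    return key

print(string_to_key("pt"))  # 128 * 112 + 116 = 14452
```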
Hash Function
The multiplication method:

First, we multiply the key k by a constant A, 0 < A < 1, and extract the
fractional part of kA.

We multiply this value by n and take the floor of the result.

That is: h(k) = ⌊n (kA mod 1)⌋
Knuth’s A
Here, the choice of n is not critical.
We take it to be a power of 2.
D. Knuth suggests the following value for A:

A = 0.618033… ≈ (√5 − 1) / 2

If k = 123456, n = 10000, then with A as above, we get:

h(k) = ⌊10000 × (123456 × 0.618033 − ⌊123456 × 0.618033⌋)⌋
     = ⌊10000 × 0.882048⌋ = ⌊8820.48⌋ = 8820
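The multiplication method with the slide's values can be sketched as follows (Python; the names are my own, and A is truncated to six decimals as on the slide):

```python
import math

A = 0.618033  # Knuth's suggestion, close to (sqrt(5) - 1) / 2

def h_multiplication(k: int, n: int) -> int:
    """Multiplication-method hash: h(k) = floor(n * frac(k * A))."""
    frac = (k * A) % 1.0  # fractional part of k * A
    return math.floor(n * frac)

print(h_multiplication(123456, 10000))  # → 8820
```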
Open-addressing Hashing
In open addressing all elements are stored in the hash table itself.

Here, we discuss only open-addressing hashing.

Insertion:

To perform insertion:
Successively examine, or probe, the hash table…
until we find an empty slot in which to insert the key.

The sequence of positions probed depends on the key being inserted.
Open-addressing Hashing
Searching:

The algorithm for searching for key k probes the same sequence of slots that
the insertion algorithm examined when key k was inserted.

The search can terminate (unsuccessfully) when it finds an empty slot…

Why?
If k had been inserted, it would occupy some slot along that probe sequence
before the first empty one…

…assuming that keys are not deleted from the hash table.
Open-addressing Hashing
Deletion:

When deleting a key from slot i,
we should not physically remove that key.

Doing so may make it impossible to retrieve a key k during whose insertion
we probed slot i and found it occupied.

A solution:
Mark the slot with a special value (do not empty it).
Open-addressing Hashing
Assumption:

“Uniform hashing” for the analysis below.

This means that each key is equally likely…
to generate any of the n! permutations of
{0, 1, 2, …, n−1} as its probe sequence.
Linear probing
In this case, the probe sequence is:

(h(k)+0) mod n
(h(k)+1) mod n
⋮
(h(k)+n−1) mod n

Linear probing is easy to implement… but it suffers from a problem:
primary clustering.

Groups of occupied slots build up… increasing the average search time.
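Putting the pieces together, a minimal open-addressing table with linear probing and tombstone deletion might look like this (an illustrative Python sketch, not the course's code; here h(k) = k mod n):

```python
DELETED = object()  # tombstone left behind by delete()

class LinearProbingTable:
    """Open addressing with linear probing; a sketch, not production code."""

    def __init__(self, n: int = 11):  # n prime
        self.n = n
        self.slots = [None] * n

    def _probe(self, k: int):
        """Yield the probe sequence (h(k) + i) mod n for i = 0..n-1."""
        h = k % self.n
        for i in range(self.n):
            yield (h + i) % self.n

    def insert(self, k: int) -> None:
        for pos in self._probe(k):
            if self.slots[pos] is None or self.slots[pos] is DELETED:
                self.slots[pos] = k
                return
        raise RuntimeError("table full")

    def find(self, k: int) -> bool:
        for pos in self._probe(k):
            if self.slots[pos] is None:  # truly empty: k was never inserted
                return False
            if self.slots[pos] == k:     # a tombstone never equals a key
                return True
        return False

    def delete(self, k: int) -> None:
        for pos in self._probe(k):
            if self.slots[pos] is None:
                return
            if self.slots[pos] == k:
                self.slots[pos] = DELETED  # mark; do not empty the slot
                return

t = LinearProbingTable()
for key in (5, 16, 27):  # all hash to slot 5 mod 11: a primary cluster
    t.insert(key)
t.delete(16)
print(t.find(27))  # → True: the tombstone keeps 27 reachable
```

If slot 6 were emptied instead of marked, the search for 27 would stop early at the hole and wrongly report it absent.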
Example: Primary Clustering

(figures not reproduced)
Analysis of Linear Probing
Let α = m/n, where m of the n slots in the hash table are occupied.

α is called the load factor and is clearly < 1.

Expected number of probes, assuming each key is equally likely to be
searched for (Knuth):

Successful Find: (1 + 1/(1−α)) / 2

Insertion/Unsuccessful Find: (1 + (1/(1−α))²) / 2
Quadratic probing
In this case, the probe sequence is

(h(k) + c1·i + c2·i²) mod n

for i = 0, 1, …, n−1,
where c1 and c2 are auxiliary constants.

Works much better than linear probing.
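A quadratic probe sequence can be generated as follows (Python sketch with the common choice c1 = 0, c2 = 1 as defaults; the names and sample values are my own):

```python
def quadratic_probes(k: int, n: int, c1: int = 0, c2: int = 1) -> list:
    """Probe sequence (h(k) + c1*i + c2*i*i) mod n for i = 0..n-1."""
    h = k % n
    return [(h + c1 * i + c2 * i * i) % n for i in range(n)]

print(quadratic_probes(5, 11)[:5])  # → [5, 6, 9, 3, 10]
```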
Example: Quadratic Probing

(figures not reproduced)
Analysis of Quadratic Probing
Crucial questions:

Will we always be able to insert an element x if the table is not full?

Ease of computation?

What happens when the load factor gets too high?
(this applies to linear probing as well)
Viability
Theorem:
If quadratic probing is used and the table size is prime,
then a new element can be inserted
if the table is at least half empty.

Also, no cell is probed twice in the course of insertion.

Proof: By contradiction.

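The theorem can be checked empirically for small primes (an illustrative Python sketch with c1 = 0, c2 = 1, so the ith probe offset is i² mod n; i ranges over the first ⌈n/2⌉ offsets):

```python
def half_table_probes_distinct(n: int) -> bool:
    """For a prime table size n, the offsets i*i mod n are pairwise
    distinct for i = 0..ceil(n/2)-1, so a half-empty table has room."""
    offsets = [(i * i) % n for i in range((n + 1) // 2)]
    return len(offsets) == len(set(offsets))

print(all(half_table_probes_distinct(p) for p in (3, 5, 7, 11, 13, 101)))  # → True
```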
Growth
What do we do when the load factor gets too high?

Rehash!

Double the size of the hash table

Rehash:

Scan the entries in the current table, and… insert them in a new hash table

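Rehashing might be sketched as follows (Python; for brevity the new size is 2n + 1 rather than a true next prime, and collisions in the new table use linear probing — both simplifying assumptions of mine):

```python
def rehash(old_slots: list) -> list:
    """Move every key into a roughly twice-as-large table."""
    new_n = 2 * len(old_slots) + 1      # stand-in for "next prime >= 2n"
    new_slots = [None] * new_n
    for key in old_slots:
        if key is None:
            continue                    # skip empty (and any tombstone) slots
        pos = key % new_n
        while new_slots[pos] is not None:
            pos = (pos + 1) % new_n     # linear probing in the new table
        new_slots[pos] = key
    return new_slots

bigger = rehash([None, 12, 23, None, 4])
print(len(bigger))  # → 11
```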
Efficiency
Theorem: Quadratic probing can be implemented without expensive
multiplications and divisions.
Proof (Sketch):
Let Hᵢ₋₁ be the most recent probe.
With the common choice c1 = 0, c2 = 1, the ith probe is given by

Hᵢ = (Hᵢ₋₁ + 2i − 1) mod n

since (h(k) + i²) − (h(k) + (i−1)²) = 2i − 1.

Thus, we compute the ith probe without squaring i.
Also, the mod can be computed by just subtracting n.
⇒ Quadratic Probing is efficient!!
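The incremental computation can be checked against the direct formula (Python sketch, again assuming c1 = 0, c2 = 1; names are my own):

```python
def incremental_probes(h0: int, n: int, count: int) -> list:
    """Probes (h0 + i*i) mod n computed via H_i = H_{i-1} + 2i - 1."""
    H = h0 % n
    probes = [H]
    for i in range(1, count):
        H += 2 * i - 1     # difference between consecutive squares
        while H >= n:      # reduce mod n by subtraction alone
            H -= n
        probes.append(H)
    return probes

print(incremental_probes(5, 11, 6) == [(5 + i * i) % 11 for i in range(6)])  # → True
```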
Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions
by placing an item in the first available cell of the series

(i + j·d(k)) mod N

for j = 0, 1, …, N−1, where i = h(k).
The secondary hash function d(k) cannot have zero values.
The table size N must be a prime to allow probing of all the cells.
Common choice of compression function for the secondary hash function:

d(k) = q − (k mod q)

where
q < N,
q is a prime.
The possible values for d(k) are
1, 2, …, q.
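Double hashing with the compression function above can be sketched as follows (Python; the sample values N = 13, q = 7 are my own):

```python
def double_hash_probes(k: int, N: int, q: int) -> list:
    """Probe sequence (h(k) + j*d(k)) mod N with d(k) = q - (k mod q);
    N and q prime, q < N, so d(k) is in {1, ..., q} and never zero."""
    h = k % N
    d = q - (k % q)
    return [(h + j * d) % N for j in range(N)]

seq = double_hash_probes(14, 13, 7)   # h = 1, d = 7
print(seq[:4])                        # → [1, 8, 2, 9]
print(len(set(seq)))                  # → 13: every cell gets probed
```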
Example of Double Hashing

(figure not reproduced)
Analysis of Double Hashing
Expected number of probes, assuming each key is equally likely to be
searched for (Knuth):

Successful Find: (1/α) ln(1/(1−α))

Insertion/Unsuccessful Find: 1/(1−α)
Comparison: Linear versus Uniform

(figure not reproduced)
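The comparison figure is not reproduced here, but Knuth's unsuccessful-search formulas can be tabulated (illustrative Python; linear probing versus double/uniform hashing):

```python
def linear_probe_cost(alpha: float) -> float:
    """Expected probes, unsuccessful search, linear probing."""
    return (1 + (1 / (1 - alpha)) ** 2) / 2

def double_hash_cost(alpha: float) -> float:
    """Expected probes, unsuccessful search, double hashing."""
    return 1 / (1 - alpha)

for alpha in (0.5, 0.75, 0.9):
    print(f"alpha={alpha}: linear={linear_probe_cost(alpha):.1f}, "
          f"double={double_hash_cost(alpha):.1f}")
```

At α = 0.9, linear probing expects about 50 probes versus 10 for double hashing, which is why the load factor is kept well below 1.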
Final Note on Clustering
Linear Probing – Primary Clustering
– all keys share one probe sequence;
– only the starting point depends on h(k)
Quadratic Probing – Secondary Clustering
– each h(k) value starts a different probe sequence
Double Hashing
– each key k starts a different probe sequence
