Four of those steps are identical to those used to insert a new node at the
front of a singly linked list. The additional step changes the head's previous
pointer.
2. Adding a new node after a given one.
We have a pointer to a node called prev_node, and the new node is to be
inserted after this.
Below are the steps needed to do this:
/* Given a node as prev_node, insert
a new node after the given node */
void insertAfter(Node* prev_node, int new_data)
{
    /* 1. Check whether the given prev_node is NULL */
    if (prev_node == NULL)
    {
        cout << "The given previous node cannot be NULL";
        return;
    }
    /* 2. Allocate the new node */
    Node* new_node = new Node();
    /* 3. Put in the data */
    new_node->data = new_data;
    /* 4. Make the next of the new node point to the next of prev_node */
    new_node->next = prev_node->next;
    /* 5. Make the next of prev_node point to the new node */
    prev_node->next = new_node;
    /* 6. Make prev_node the previous node of the new node */
    new_node->prev = prev_node;
    /* 7. Change the previous pointer of the new node's next node */
    if (new_node->next != NULL)
        new_node->next->prev = new_node;
}
Six of these steps are identical to those to insert a new node after a specified
node in a singly linked list, while the additional step changes the new node's
previous pointer.
4. Adding a node before a given one
Let's assume that the pointer to the given node is called next_node and
the new node's data is new_data. The steps required are:
1. Determine whether next_node is NULL. If it is, the function must
return, because new nodes cannot be added before NULL.
2. Allocate the new node some memory.
3. Set new_node->data = new_data.
4. The new node's previous pointer should be set to the next_node's
previous node – new_node->prev = next_node->prev.
5. The next_node's previous pointer should be set to new_node –
next_node->prev = new_node.
6. The new node's next pointer should be set to next_node –
new_node->next = next_node.
7. If new_node's previous node is not NULL, that node's next pointer
should be set to new_node – new_node->prev->next = new_node. If it is
NULL, new_node becomes the head node, so (*head_ref) = new_node.
This is how this approach is implemented.
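The original code block was not preserved in this copy, so the
following is a minimal sketch built directly from the seven steps
above; the function name insertBefore and the head_ref parameter are
assumptions:
/* Insert a new node before the given next_node; head_ref is
needed because the head can change */
void insertBefore(Node** head_ref, Node* next_node, int new_data)
{
    /* 1. Check whether the given next_node is NULL */
    if (next_node == NULL)
    {
        cout << "The given next node cannot be NULL";
        return;
    }
    /* 2. Allocate the new node */
    Node* new_node = new Node();
    /* 3. Put in the data */
    new_node->data = new_data;
    /* 4. Set the new node's previous pointer to next_node's previous */
    new_node->prev = next_node->prev;
    /* 5. Set next_node's previous pointer to the new node */
    next_node->prev = new_node;
    /* 6. Set the new node's next pointer to next_node */
    new_node->next = next_node;
    /* 7. Fix the previous node's next pointer, or move the head */
    if (new_node->prev != NULL)
        new_node->prev->next = new_node;
    else
        (*head_ref) = new_node;
}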
Output:
Created DLL is:
Traversal in the forward direction
9 1 5 7 6
Traversal in the reverse direction
6 7 5 1 9
The program below tests the functions; only its skeleton survives in
this copy, with the elided parts marked:
// A complete working C++ program to
// demonstrate all insertion methods
#include <bits/stdc++.h>
using namespace std;
// ... the Node structure and the push(), insertAfter(), append(),
// insertBefore(), and printList() functions from this chapter ...
int main()
{
    // build the list with the insertion functions above, then
    // print it in the forward and reverse directions
    return 0;
}
The output should be:
Created DLL is:
Traversal in the forward direction
1 7 5 8 6 4
Traversal in the reverse direction
4 6 8 5 7 1
XOR Linked Lists
Standard doubly linked lists need space for two address fields in each
node, one storing the previous node's address and one the next node's.
In the memory-efficient variant below, the previously visited node's
address is carried along during traversal so the next address can be
computed from it.
The XOR linked list is a memory-efficient doubly linked list, and these are
created using a single space for each node's address field using a bitwise
XOR operation. In these lists, the nodes each store the previous and next
nodes' XOR of addresses.
The XOR representation differs from the ordinary representation as
follows.
1. Ordinary Representation
Node A:
prev = NULL, next = add(B) // previous is NULL while next is address
of B
Node B:
prev = add(A), next = add(C) // previous is address of A while next is
address of C
Node C:
prev = add(B), next = add(D) // previous is address of B while next is
address of D
Node D:
prev = add(C), next = NULL // previous is address of C while next is
NULL
2. XOR List Representation
We'll call the XOR address variable npx. We can traverse an XOR linked
list forward and backward, but, as we do, we must remember each
previously accessed node's address so we can calculate the address for the
next node.
For example, at node C we need node B's address: an XOR of add(B) and
the npx of C provides add(D).
Here’s an illustration of that:
Node A:
npx = 0 XOR add(B) // bitwise XOR of zero and address of B
Node B:
npx = add(A) XOR add(C) // bitwise XOR of address of A and
address of C
Node C:
npx = add(B) XOR add(D) // bitwise XOR of address of B and
address of D
Node D:
npx = add(C) XOR 0 // bitwise XOR of address of C and 0
For example, standing at node C and knowing add(B):
npx(C) XOR add(B)
=> (add(B) XOR add(D)) XOR add(B) // npx(C) = add(B) XOR add(D)
=> add(B) XOR add(B) XOR add(D) // a^b = b^a and (a^b)^c = a^(b^c)
=> 0 XOR add(D) // a^a = 0
=> add(D) // a^0 = a
In the same way, the list can be traversed backward.
Here's an example of how to implement this:
// C++ Implementation of Memory
// efficient Doubly Linked List
// Importing libraries
#include <bits/stdc++.h>
#include <cinttypes>
using namespace std;
// Class 1
// Helper class (Node structure)
class Node {
public:
    int data;
    // XOR of next node and previous node
    Node* xnode;
};
// Method 1
// The Xored value of the node addresses is returned
Node* Xor(Node* x, Node* y)
{
return reinterpret_cast<Node*>(
reinterpret_cast<uintptr_t>(x)
^ reinterpret_cast<uintptr_t>(y));
}
// Method 2
// Insert a node at the beginning of the Xored LinkedList and
// mark it as head
void insert(Node** head_ref, int data)
{
    // Allocate memory for the new node
    Node* new_node = new Node();
    new_node->data = data;
    // The new node's xnode is the XOR of NULL and the current head
    new_node->xnode = Xor(*head_ref, NULL);
    // Fold the new node into the current head's xnode
    if (*head_ref != NULL)
        (*head_ref)->xnode = Xor(new_node,
                                 (*head_ref)->xnode);
    // Change head
    *head_ref = new_node;
}
// Method 3
// This prints the contents of the doubly linked
// list in a forward direction
void printList(Node* head)
{
    Node *curr = head, *prev = NULL, *next;
    while (curr != NULL) {
        cout << curr->data << " ";
        next = Xor(prev, curr->xnode); // next = prev XOR npx
        prev = curr;
        curr = next;
    }
}
// Method 4
// main driver method
int main()
{
Node* head = NULL;
insert(&head, 10);
insert(&head, 100);
insert(&head, 1000);
insert(&head, 10000);
printList(head);
return (0);
}
Output:
10000 1000 100 10
So, we now know how to create a doubly linked list where each node
keeps its address field in a single space. Now we can discuss how to
implement these lists. We'll be talking about two functions:
1. One that inserts a new node at the start
2. One that traverses the list forward.
In the next code example, we insert a new node at the start using the
insert() function. The head pointer needs to be changed, which is why we
use a double-pointer. Here are some things to remember:
The XOR of the next and previous nodes are stored with each node
and called npx.
npx is the only address member with each node.
When a new node is inserted at the beginning, the new node's npx is
always the XOR of NULL and the current head.
The current head's npx must be changed to the XOR of the new node
and the node beside the current head.
We use printList for forward traversal, printing the data values in each
node. The pointer to the next node must be obtained at each point.
The next node address can be obtained by tracking the current and
previous nodes. An XOR of curr->npx and prev gives us the next
node's address.
Here’s a code example:
/* C++ Implementation of Memory
efficient Doubly Linked List */
#include <bits/stdc++.h>
#include <cinttypes>
using namespace std;
// ... the Node class and the Xor(), insert(), and printList()
// functions are the same as in the previous listing ...
// Driver code
int main ()
{
/* Create the following Doubly Linked List
head-->40<-->30<-->20<-->10 */
Node *head = NULL;
insert(&head, 10);
insert(&head, 20);
insert(&head, 30);
insert(&head, 40);
return (0);
}
Output:
The Following are the nodes of Linked List:
40 30 20 10
Note – the C and C++ standards do not define XOR on pointers, so this
approach may not work on every platform.
Self-Organizing Lists
Self-organizing lists are lists that reorder their elements based on a heuristic that
helps improve access times. The aim is to improve linear search efficiency
by moving those items we access more frequently towards the head or front
of the list. In terms of element access, the best-case scenario for a self-
organizing list is near-constant time.
There are two search possibilities – online, where we have no idea of the
search sequence, and offline, where we have knowledge of the entire search
sequence upfront. In the latter, the nodes may be placed per the decreasing
search frequencies, where the element searched for the most frequently is
placed first, and the least searched element goes last. In terms of real-world
applications, knowing the search sequence upfront is not always easy to do.
Self-organizing lists reorder themselves based on each search, so the
aim is to exploit locality of reference: in most databases, roughly 80%
of the searches target just 20% of the items.
Below, we take a quick look at three strategies a self-organizing list uses.
1. Move-To-Front Method
This method moves the most recent item searched for to the front of the list,
making it easy to implement. However, because it focuses on the recent
search, even infrequently searched-for items will move to the head of the
list, which is a disadvantage in terms of access time.
Here are some examples:
Input : list : 1, 2, 3, 4, 5, 6
searched: 4
Output : list : 4, 1, 2, 3, 5, 6
Input : list : 4, 1, 2, 3, 5, 6
searched : 2
Output : list : 2, 4, 1, 3, 5, 6
// CPP Program to implement self-organizing list
// using the move to front method
#include <iostream>
using namespace std;
// ... the Node structure, insert_self_list(), display(), and the
// start of search_self_list() are elided here; see the sketch after
// the output below for the complete search ...
// if key found
if (current->value == key) {
// Driver Code
int main()
{
/* inserting five values */
insert_self_list(1);
insert_self_list(2);
insert_self_list(3);
insert_self_list(4);
insert_self_list(5);
return 0;
}
Output:
Searched: 4
List: 4 --> 1 --> 2 --> 3 --> 5
Searched: 2
List: 2 --> 4 --> 1 --> 3 --> 5
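The listing above survives only in fragments, so here is a minimal,
self-contained sketch of the move-to-front search under the same
assumptions (a singly linked list with a global head pointer); the
insertion and display steps are folded into main for brevity:
#include <iostream>
using namespace std;

struct Node {
    int value;
    Node* next;
};
Node* head = NULL;

// search for key; on a hit, move the found node to the front
bool search_self_list(int key)
{
    Node *prev = NULL, *current = head;
    while (current != NULL) {
        if (current->value == key) {
            if (prev != NULL) {              // not already at the front
                prev->next = current->next;  // unlink the node
                current->next = head;        // splice it in at the head
                head = current;
            }
            return true;
        }
        prev = current;
        current = current->next;
    }
    return false;
}

int main()
{
    for (int v = 5; v >= 1; v--)             // build 1 -> 2 -> ... -> 5
        head = new Node{v, head};
    search_self_list(4);
    search_self_list(2);
    for (Node* t = head; t != NULL; t = t->next)
        cout << t->value << " ";             // prints 2 4 1 3 5
    return 0;
}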
2. The Count Method
The count method is where a count is kept for the number of times a node is
searched for, which means it records the search frequency. Each node has
additional storage space associated with it, and this increments each time
the node is searched. The nodes are then arranged in an order whereby the
most searched goes to the front of the list.
Here are some examples:
Input : list : 1, 2, 3, 4, 5
searched : 4
Output : list : 4, 1, 2, 3, 5
Input : list : 4, 1, 2, 3, 5
searched : 5
searched : 5
searched : 2
Output :
list : 5, 2, 4, 1, 3
Let’s explain this:
5 is the most searched-for – twice – which puts it at the head of the
list.
2 is searched for once, as is 4
Because 2 is searched for more recently than 4, it is kept ahead of 4
Because the remainder are not searched for, they retain the order they
were inserted into the list.
// CPP Program to implement self-organizing list
// using the count method
#include <iostream>
using namespace std;
// ... the Node structure (with a count field), insertion, display,
// and the start of the search function are elided here ...
// if key is found
if (current->value == key) {
// if it is to be placed at beginning
if (temp == head)
head = current;
else
temp_prev->next = current;
}
}
return true;
}
prev = current;
current = current->next;
}
return false;
}
// Driver Code
int main()
{
/* inserting five values */
insert_self_list(1);
insert_self_list(2);
insert_self_list(3);
insert_self_list(4);
insert_self_list(5);
search_self_list(4);
search_self_list(2);
display();
search_self_list(4);
search_self_list(4);
search_self_list(5);
display();
search_self_list(5);
search_self_list(2);
search_self_list(2);
search_self_list(2);
display();
return 0;
}
Output:
List: 1(0) --> 2(0) --> 3(0) --> 4(0) --> 5(0)
List: 2(1) --> 4(1) --> 1(0) --> 3(0) --> 5(0)
List: 4(3) --> 5(1) --> 2(1) --> 1(0) --> 3(0)
List: 2(4) --> 4(3) --> 5(2) --> 1(0) --> 3(0)
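Since the count-method listing is also fragmentary, here is a minimal
sketch of its search function under the same assumptions (a global
head pointer and a count field per node); ties are broken in favor of
the more recently searched node, matching the output above:
struct Node {
    int value;
    int count;      // how many times this value has been searched
    Node* next;
};
Node* head = NULL;

// search for key; on a hit, bump its count and re-insert the node in
// front of the first node whose count is not greater than its own
bool search_self_list(int key)
{
    Node *prev = NULL, *current = head;
    while (current != NULL) {
        if (current->value != key) {
            prev = current;
            current = current->next;
            continue;
        }
        current->count++;
        if (prev != NULL)                // unlink the found node
            prev->next = current->next;
        else
            head = current->next;
        Node *p = NULL, *q = head;       // find the new position
        while (q != NULL && q->count > current->count) {
            p = q;
            q = q->next;
        }
        current->next = q;               // splice the node back in
        if (p != NULL)
            p->next = current;
        else
            head = current;
        return true;
    }
    return false;
}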
3. The Transpose Method
The transpose method swaps the accessed node with its predecessor. This
means when a node is accessed, it gets swapped with the one in front. The
only exception to this is when the accessed node is the head. Put simply, an
accessed node’s priority is slowly increased until it reaches the head
position. The difference between this and the other methods is that it
requires many accesses to get a node to the head.
Here are some examples:
Input :
list: 1, 2, 3, 4, 5, 6
searched: 4
Output :
list: 1, 2, 4, 3, 5, 6
Input :
list: 1, 2, 4, 3, 5, 6
searched: 5
Output :
list: 1, 2, 4, 5, 3, 6
Let’s explain this:
In the first case, node 4 is swapped with node 3, its predecessor
In the second case, node 5 is swapped with its predecessor, also 3
// CPP Program to implement self-organizing list
// using the transpose method
#include <iostream>
using namespace std;
// ... the Node structure, insertion, display, and the start of the
// search function are elided; see the sketch after the output ...
// if key found
if (current->value == key) {
// Driver Code
int main()
{
/* inserting five values */
insert_self_list(1);
insert_self_list(2);
insert_self_list(3);
insert_self_list(4);
insert_self_list(5);
insert_self_list(6);
return 0;
}
Output:
Searched: 4
List: 1 --> 2 --> 4 --> 3 --> 5 --> 6
Searched: 5
List: 1 --> 2 --> 4 --> 5 --> 3 --> 6
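As with the other two methods, the transpose search can be sketched
compactly. This minimal version assumes a global head pointer and
tracks two trailing pointers so the found node can be swapped with its
predecessor:
#include <iostream>
using namespace std;

struct Node {
    int value;
    Node* next;
};
Node* head = NULL;

// search for key; on a hit, swap the node with its predecessor
bool search_self_list(int key)
{
    Node *prev2 = NULL, *prev = NULL, *current = head;
    while (current != NULL) {
        if (current->value == key) {
            if (prev != NULL) {               // not already the head
                prev->next = current->next;   // swap with predecessor
                current->next = prev;
                if (prev2 != NULL)
                    prev2->next = current;
                else
                    head = current;
            }
            return true;
        }
        prev2 = prev;
        prev = current;
        current = current->next;
    }
    return false;
}

int main()
{
    for (int v = 6; v >= 1; v--)              // build 1 -> 2 -> ... -> 6
        head = new Node{v, head};
    search_self_list(4);
    search_self_list(5);
    for (Node* t = head; t != NULL; t = t->next)
        cout << t->value << " ";              // prints 1 2 4 5 3 6
    return 0;
}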
Unrolled Linked List
Unrolled linked list data structures are a linked list variant. Rather than a
single element being stored at a node, these will store entire arrays.
With this type of list, you get the benefits of arrays in terms of the minor
memory overhead, with the benefits of linked lists, in terms of fast deletion
and insertion. Those benefits combine to give much better performance.
The unrolled linked list spreads the overheads (pointers) by storing several
elements at each node. So, if an array containing 4 elements is stored at
each node, the overhead is spread across those elements.
Unrolled linked lists also perform much faster when you consider the
modern CPU's capacity for cache management. So, while their overhead is
quite high per node compared to a standard linked list, this is a minor
drawback compared to the benefits it offers in modern computing.
Properties
Unrolled link lists are basically linked lists where an array of values is
stored in a node rather than a single value. The array can have anything a
standard array can, such as primitive types or other data types. Each
node can only hold a certain number of values, and a typical
implementation keeps each node at an average capacity of about 3/4.
This is achieved by the node moving values from one array to another
when one gets full.
An unrolled link list has a slightly higher overhead per node because each
must also store the maximum number of values per array. However, on a
per value basis, it actually works out lower than standard linked lists. As
each array's maximum size increases, the average space required for each
value reduces and, when the value is a bit or another very small type, you
get even more space advantages.
In short, an unrolled linked list is a combination of a standard linked list and
an array. You get the ultra-fast indexing and storage locality advantages of
the array, both of which arise from the static array's underlying properties.
On top of that, you retain the node insertion and deletion advantages from
the linked list.
Insertion and Deletion
The algorithm used to insert or delete elements in unrolled linked lists will
depend on the implementation. The low-water mark is typically around
50%, which means when an insertion is carried out in a node that doesn't
have space for the element, a new node gets created, and 50% of the
elements from the original array are inserted. Conversely, if an element is
removed, resulting in the number of values reducing to under 50% in the
node, the elements from the array next door are moved in to push it back up
to 50%. Should that result in the neighboring array dropping below 50%,
both nodes are merged.
Generally, this means the average node utilization is 70-75%, and, with
reasonable node sizes, the overhead is small compared to a standard linked
list. In standard linked lists, 2 to 3 pieces of overhead are required
per node, but unrolled linked lists amortize this across the elements,
resulting in an overhead of 1/size per element, nearer to an array's
overhead.
The low-water mark can be altered to alter your list's performance. Increase
it, and each node's average utilization increases, but there is a cost to this –
splitting and merging would need to be done more frequently.
Algorithm Pseudocode
The insertion and deletion algorithms are as follows. The nodes all
contain an array called data, the number of elements in the array
(numElements), and next, a pointer to the next node.
Insert(newElement)
    Find node in linked list e
    If e.numElements < e.data.size
        e.data.push(newElement)
        e.numElements++
    else
        create new Node e1
        move the final half of e.data into e1.data
        e.numElements = e.numElements / 2
        e1.data.push(newElement)
        e1.numElements = e1.data.size / 2 + 1
Delete(element)
    Find element in node e
    e.data.remove(element)
    e.numElements--
    while e.numElements < e.data.size / 2
        put element from e.next.data in e.data
        e.next.numElements--
        e.numElements++
        if e.next.numElements < e.next.data.size / 2
            merge nodes e and e.next
            delete node e.next
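Here is a minimal C++ sketch of the Insert path from the pseudocode
above, under the assumption of a fixed per-node capacity; the names
UNode, CAPACITY, and insertInto are illustrative, not from the
original:
#include <iostream>
#include <vector>
using namespace std;

const int CAPACITY = 4;          // assumed fixed array size per node

struct UNode {
    vector<int> data;            // up to CAPACITY elements
    int numElements = 0;
    UNode* next = nullptr;
};

// insert into node e: append if there is room, otherwise split e
// and put the new element in the new node
void insertInto(UNode* e, int newElement)
{
    if (e->numElements < CAPACITY) {
        e->data.push_back(newElement);
        e->numElements++;
        return;
    }
    UNode* e1 = new UNode();
    int half = e->numElements / 2;
    // move the final half of e.data into e1.data
    e1->data.assign(e->data.begin() + half, e->data.end());
    e1->numElements = e->numElements - half;
    e->data.resize(half);
    e->numElements = half;
    e1->data.push_back(newElement);
    e1->numElements++;
    e1->next = e->next;
    e->next = e1;
}

int main()
{
    UNode* headNode = new UNode();
    UNode* tail = headNode;
    for (int v = 1; v <= 6; v++) {   // insert at the tail; one split
        insertInto(tail, v);
        if (tail->next != nullptr) tail = tail->next;
    }
    for (UNode* n = headNode; n != nullptr; n = n->next) {
        for (int v : n->data) cout << v << " ";
        cout << "| ";                // prints 1 2 | 3 4 5 6 |
    }
    return 0;
}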
A programmer can take quite a few liberties with these functions. For
example, in the Insert function, the programmer can decide which node they
want to insert into. It could be the first or last, or it could be determined by
a specific grouping or sorting. Typically, this will depend entirely on the
data the programmer is working with.
Time and Space Complexity
It isn't easy to analyze an unrolled linked list because there are many ways
to implement them and plenty of data-dependent variations. However, their
amortization across the array elements ensures their time and space analysis
is good.
Time
When you want to insert an element, the first step is to locate the
node the element should be inserted in, which is O(n). If the node
isn't full, that's all there is to it. However, if the node is full, a
new node needs to be created and half the values moved over. This
process doesn't depend on the linked list's total number of values, so
it runs in constant time.
Deleting an element works in the same way but in reverse. Constant time
operations can be taken advantage of because linked lists can quickly
update their pointers between each other, independent of the list size.
Another quality of unrolled linked lists considered important is indexing.
However, it depends on caching, which we'll cover in a moment.
// ... the tail of a recursive trie delete routine: remove the child
// entry and report whether the current node became empty ...
if (Current_Node_Delete) {
    current.getChildren().remove(A);
    return current.getChildren().isEmpty();
}
return false;
}
Applications of Trie
Tries have a few applications:
A Spell Checker
There are three steps to the spellchecking process:
Step one – look for the required word in the dictionary
Step two – generate a few suggestions
Step three – sort those suggestions, ensuring the word you want is at
the top
A trie stores the dictionary's words, so the spell checker can find a
word in the data structure efficiently. Not only is the word easy to
find in the dictionary, it is also much simpler to build algorithms
that collect relevant suggestions or related words.
Auto-Complete
This function tends to be used widely on mobile apps, the internet, and text
editors. It gives us an easy way to find alternative words to complete a word
for these reasons:
It filters entries in alphabetical order by the node's key
Pointers can be traced to get the node representing the user-entered
string
When you begin typing, auto-complete will attempt to complete what
you write.
Browser History
Trie is also used to complete URLs in browsers. Your browser generally
keeps a record of the websites you have already visited, allowing you to
find the one you want easily.
Advantages
Insertion is much easier, and string searches are faster than in
standard binary trees and hash tables.
Using the node's key, it can give you an entry filter in alphabetical
order.
Disadvantages
More memory is needed for string storage
It isn't as fast as a hash table
Fenwick Tree
Fenwick trees, otherwise known as binary indexed trees (BIT), allow us to
represent arrays of numbers in arrays and efficiently calculate prefix sums.
Take the array [2, 3, -1, 0, 6], for example. The prefix sum of length
3 is the sum of the first three elements [2, 3, -1], which is
2 + 3 + (-1) = 4. Efficient
prefix sum calculation is useful in several situations so let's begin with a
simple problem.
Let's say we have an array a[], and we want to perform two different types
of operation on it:
1. Point Update Operation – we want to modify a value that is stored
at index i
2. Range Sum Query – we want the sum of a prefix which is of length
k
Here you can see a simple implementation of this:
int a[] = {2, 1, 4, 6, -1, 5, -32, 0, 1};
void update(int i, int v) //assigns value v to a[i]
{
a[i] = v;
}
int prefixsum(int k) // calculate the sum of all a[i] such that 0 <= i < k
{
int sum = 0;
for(int i = 0; i < k; i++)
sum += a[i];
return sum;
}
This is a great solution, but it has a downfall – the time needed for
the prefix sum calculation is proportional to the array length, so
when we perform many intermingled operations on a large array, it
tends to time out. Perhaps the most efficient alternative is a segment
tree, as it can do both operations in O(logN) time.
We can also do both operations in O(logN) time with the Fenwick tree,
and there is a good reason to learn it even when a segment tree would
do: BITs take less space and are easier to program, given that the
core operations need no more than 10 lines of code each.
Before we dig deeper into the BIT tree, let's look at a quick bit manipulation
trick.
How to Isolate the Last Bit Set
In this example, number x = 1110 in binary:
Binary Digit 1 1 1 0
Index 3 2 1 0
The final 1 in the binary digit row is the last bit set, which we want to
isolate. But how do we do it?
x & (-x)
This gives us the last bit set in any number, but how?
Let's say that x = a1b in binary, where the 1 shown is the last set
bit we want to isolate. Here, a is any binary sequence of 1s and 0s,
and b is a sequence of 0s of length zero or more – for that 1 between
a and b to be the last set bit, everything to its right must be 0.
-x = 2's complement of x = (a1b)' + 1 = a'0b' + 1 = a'0(1…1) + 1
   = a'1(0…0) = a'1b
      x  =  a1b
   & -x  =  a'1b
   ----------------------
         =  (0…0) 1 (0…0) – the last set bit, isolated
Another example is x = 10 (in decimal) = 1010 (in binary).
We get the last set bit by
x & (-x) = (10)1(0) & (01)1(0) = 0010 = 2 (in decimal)
Is there a good reason why we need the last set bit isolated from a
number? Yes, and as we continue through this section, you will see why.
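A two-line check shows the trick in action:
#include <cstdio>

int main()
{
    int x = 10;                   // 1010 in binary
    printf("%d\n", x & (-x));     // prints 2 (0010), the last set bit
    return 0;
}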
Now let's look deeper into the BIT tree.
What Is the Idea of a Binary Indexed Tree?
We already know that the sum of powers of two can represent an integer. In
the same way, for an array of size N, an array BIT[] can be maintained so
that the sum of some of the numbers in the specified array may be stored at
any index. This is also known as a partial sum tree.
Here's an example to show you how partial sums are stored in BIT[].
// make sure our given array is 1-based indexed
int a[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
Courtesy of: https://www.hackerearth.com
In the image above, you can see the BIT tree, where the enclosed boxes
indicate the value BIT[index], and the partial sum of some numbers is
stored in each BIT[index].
Note that:
BIT[x] = a[x]                 if x is odd
BIT[x] = a[1] + … + a[x]      if x is a power of 2
Each index i of the BIT[] array stores the cumulative sum of indices
i down to i-(1<<r)+1 (both inclusive), where r is the position of the
last set bit in i.
sum of the first 12 numbers in this array a[] = BIT[12] + BIT[8]
= (a[12] + … + a[9]) + (a[8] + … + a[1])
In the same way,
sum of first 6 elements = BIT[6] + BIT[4] = (a[6] + a[5]) + (a[4] + … + a[1])
sum of first 8 elements = BIT[8] = a[8] + … + a[1]
Let's construct the tree and then query it for prefix sums. BIT[] is
an array of size = 1 + the size of the given array a[], which is where
the operations need to be performed. To start with, all the BIT[]
values are equal to 0. Then the update() operation is called for each
array element, which constructs the BIT tree. The update() operation
is discussed below.
void update(int x, int val) //add "val" at index "x"
{
for(; x <= n; x += x&-x)
BIT[x] += val;
}
Don't worry if you can't figure out this update function yet – walking
through an example update will help.
Let's say we want to call
update(13, 2)
The figure above shows that index 13 is covered by indices 13, 14, and
16, so 2 must be added at each of them.
x is 13, so BIT[13] is updated:
BIT[13] += 2;
Now the last set bit of x = 13 (1101) needs to be isolated and added
to x:
x += x & (-x)
The last set bit of 13 is 1, and adding it gives:
x = 13 + 1 = 14
And then BIT[14] is updated:
BIT[14] += 2;
x is now 14 (1110), so we isolate its last set bit, which is 2, and
add it to 14; x then becomes:
x = 14 + 2 = 16 (10000)
Then, BIT[16] is updated:
BIT[16] += 2;
When we perform an update operation on index x, all the BIT[] indices
covering index x are updated, and BIT[] is maintained.
In the update() operation, you can see a for loop. It runs at most
once per bit of index x, and x is bounded by N, the given array size,
so the operation takes O(logN) time at the most.
How do we query a structure like this for prefix sums?
Here's the query operation:
int query(int x) // returns the sum of the first x elements of the given array a[]
{
int sum = 0;
for(; x > 0; x -= x&-x)
sum += BIT[x];
return sum;
}
This query will return the sum of the first x number of elements in the array.
Here's how it works:
Let's say we call query(14), which is
sum = 0
x is 14 (1110), and we add BIT[14] to the sum variable, so
sum = BIT[14] = (a[14] + a[13])
Now, the last set bit is isolated from x = 14 (1110) and subtracted
from x. The last set bit in 14 (1110) is 2 (10), so
x = 14 - 2 = 12
BIT[12] is added to the sum variable, so
sum=BIT[14]+BIT[12]=(a[14]+a[13])+(a[12]+…+a[9])
Again, the last set bit is isolated from x = 12 (1100) and subtracted
from x. The last set bit in 12 (1100) is 4 (100), so
x = 12 - 4 = 8
BIT[8] is added to the sum variable, so
sum = BIT[14] + BIT[12] + BIT[8] = (a[14] + a[13]) + (a[12] + … + a[9])
+ (a[8] + … + a[1])
Lastly, the last set bit is isolated from x = 8 (1000) and subtracted
from x. The last set bit in 8 (1000) is 8 itself, so
x = 8 - 8 = 0
The for loop will break because x=0 and the prefix sum is returned.
In terms of time complexity, the loop iterates once per set bit of x,
and there are at most logN of them. So, it's safe to say that the
query operation will take O(logN) time.
Here’s the whole program:
int BIT[1000], a[1000], n;
void update(int x, int val)
{
for(; x <= n; x += x&-x)
BIT[x] += val;
}
int query(int x)
{
int sum = 0;
for(; x > 0; x -= x&-x)
sum += BIT[x];
return sum;
}
int main()
{
scanf("%d", &n);
int i;
for(i = 1; i <= n; i++)
{
scanf("%d", &a[i]);
update(i, a[i]);
}
printf("sum of first 10 elements is %d\n", query(10));
printf("sum of all elements in range [2, 7] is %d\n",
       query(7) - query(2 - 1));
return 0;
}
When BIT or Fenwick Trees Should be Used
Before you choose to use this tree for performing operations over range,
you should first discern whether your function or operation
1. Is Associative – i.e. f(f(a,b),c)=f(a,f(b,c)). This applies even for the
segment tree.
2. Has an Inverse – for example:
Addition has an inverse subtraction
Multiplication has an inverse division
gcd() doesn't have an inverse, so BIT cannot be used to
calculate range gcds
The sum of matrices has an inverse
The product of matrices has an inverse only if the matrices are
guaranteed to be non-degenerate, i.e., the matrix determinant does
not equal 0
3. Space Complexity – is O(N) to declare another size N array
4. Time Complexity – is O(logN) for every operation, including query
and update
BIT Applications
There are two primary applications for BIT:
1. They are used for implementing the arithmetic coding algorithm.
This use case primarily motivated the development of the binary
indexed tree.
2. They are used for counting inversions in arrays in O(NlogN) time
AVL Tree
The AVL tree was first invented in 1962 by GM Adelson-Velsky and EM
Landis and was named after the inventors.
The definition of an AVL tree is a height-balanced binary search tree, where
every node has a balance factor associated with it. This balance factor is
calculated by subtracting the right subtree height from the left subtree
height. Provided each node has a balance factor of -1 to +1, the tree is
balanced. If not, the tree is unbalanced and needs work to balance it.
Balance Factor (k) = height (left(k)) - height (right(k))
If any node has a balance factor of 1, the left subtree is higher than the right
tree by one level.
If any node has a balance factor of 0, the left and right subtrees are of equal
height.
If any node has a balance factor of -1, the left subtree is lower than the right
subtree by one level.
In an illustration of a balanced AVL tree, every node carries a
balance factor of -1, 0, or +1.
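In code, the balance factor falls straight out of this definition.
The sketch below assumes the AVLNode structure and the depth() helper
used in the implementation later in this section:
// Balance factor of node k = height(left) - height(right)
int balanceFactor(AVLNode* k)
{
    if (k == NULL)
        return 0;
    return depth(k->left) - depth(k->right);
}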
Complexity
Algorithm   Average Case   Worst Case
Space       O(N)           O(N)
Search      O(logN)        O(logN)
Insert      O(logN)        O(logN)
Delete      O(logN)        O(logN)
AVL Tree Operations
Because an AVL tree is a binary search tree, all the operations are
performed the same way they are on a standard binary search tree. When
you search or traverse an AVL tree, you are not violating the properties but
insert and delete operations can violate them.
Insertion – performed the same way as in any other binary search
tree. However, because it can violate the AVL tree property, the tree
may need to be rebalanced. This is done by applying rotations, which
we will discuss shortly.
Deletion – also done the same way as in any other binary search tree,
but, like insertion, it can disturb the tree's balance. Again,
rotation helps rebalance it.
Why Use an AVL Tree?
AVL trees control the binary search tree height by ensuring it doesn’t get
skewed or unbalanced. All operations done on a binary search tree of height
h take O(h) time. However, should the search tree become skewed, it can
take O(N) time.
With the height limited to Logn, the AVL tree places an upper bound on
every operation, ensuring it is no more than O(logN), where N indicates
how many nodes there are.
Rotations
Rotation is only performed on an AVL tree when the balance factor is not
-1, 0, or +1. There are four rotation types:
1. LL Rotation – the inserted node is in the left subtree of A’s left
subtree
2. RR Rotation – the inserted node is in the right subtree of A’s right
subtree
3. LR Rotation – the inserted node is in the right subtree of A’s left
subtree
4. RL Rotation – the inserted node is in the left subtree of A’s right
subtree.
In this case, A is the node with the balance factor that isn’t between -1 and
+1.
LL and RR rotations are single rotations, while LR and RL are double. A
tree must be a minimum of 2 in height to be considered unbalanced. Let’s
look at each rotation:
1. RR Rotation
A binary search tree becomes unbalanced when a node has been inserted
into the right subtree of A’s right subtree. In this case, we can do RR
rotation. This rotation is anticlockwise and is applied to the edge below the
node with a balance factor of -2:
In this example, node A has a -2 balance factor because Node C has been
inserted into the right subtree of A’s right subtree. The RR rotation is
performed to the edge beneath A.
2. LL Rotation
A binary tree may also become unbalanced when a node has been inserted
into the left subtree of C’s left subtree. In this case, we can do LL rotation.
This rotation is clockwise, and is applied to the edge underneath the node
with a balance factor of 2:
In this example, node C has a 2 balance factor because node A has
been inserted into the left subtree of C's left subtree. The LL
rotation is performed on the edge below C.
3. LR Rotation
Double rotations are not as easy as single rotations. An LR rotation
is a combination of an RR and an LL rotation: the RR rotation is done
first on the left subtree, and then the LL rotation on the full tree,
meaning the subtree rooted at the first node on the path up from the
inserted node whose balance factor is not between -1 and +1.
4. RL Rotation
The RL rotation is a combination of an LL and an RR rotation. The LL
rotation is done first on the right subtree, and then the RR rotation
on the full tree, i.e., the subtree rooted at the first node on the
path up from the inserted node whose balance factor is not between -1
and +1.
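Both single rotations can be written directly from these
descriptions. The following sketch assumes an AVLNode with left,
right, and depth fields and the depth() helper; it mirrors the
rotation fragments in the implementation below:
// Right rotation (LL case): y's left child x becomes the new root
AVLNode* rightRotate(AVLNode* y)
{
    AVLNode* x = y->left;
    AVLNode* T2 = x->right;
    x->right = y;                  // perform rotation
    y->left = T2;
    // update heights, children first
    y->depth = max(depth(y->left), depth(y->right)) + 1;
    x->depth = max(depth(x->left), depth(x->right)) + 1;
    return x;                      // new root
}
// Left rotation (RR case): x's right child y becomes the new root
AVLNode* leftRotate(AVLNode* x)
{
    AVLNode* y = x->right;
    AVLNode* T2 = y->left;
    y->left = x;                   // perform rotation
    x->right = T2;
    x->depth = max(depth(x->left), depth(x->right)) + 1;
    y->depth = max(depth(y->left), depth(y->right)) + 1;
    return y;
}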
AVL Tree Implementation
The following is a C++ program to show you all the operations on the AVL
tree:
#include<iostream>
using namespace std;
// ... the AVLNode structure and the depth() and newNode() helpers are
// elided here; in rightRotate(y), x = y->left and T2 = x->right ...
// Perform rotation
x->right = y;
y->left = T2;
// Update heights
y->depth = max(depth(y->left),
depth(y->right)) + 1;
x->depth = max(depth(x->left),
depth(x->right)) + 1;
// Return new root
return x;
}
// in leftRotate(x), y = x->right and T2 = y->left
// Perform rotation
y->left = x;
x->right = T2;
// Update heights
x->depth = max(depth(x->left),
depth(x->right)) + 1;
y->depth = max(depth(y->left),
depth(y->right)) + 1;
// ... the BST insert and depth update inside insert() are elided;
// the node's balance factor then decides whether to rotate ...
// rotate if unbalanced
// Left Left Case
if (balance > 1 && key < node->left->key)
return rightRotate(node);
return current;
}
// delete a node from AVL tree with the given key
AVLNode* deleteNode(AVLNode* root, int key)
{
if (root == NULL)
return root;
else
{
// node with only one child or no child
if( (root->left == NULL) ||
(root->right == NULL) )
{
AVLNode *temp = root->left ?
root->left :
root->right;
if (temp == NULL)
{
temp = root;
root = NULL;
}
else // One child case
*root = *temp;
delete temp; // nodes are allocated with new, so use delete
}
else
{
AVLNode* temp = minValueNode(root->right);
root->key = temp->key;
root->right = deleteNode(root->right, temp->key);
}
// update depth
root->depth = 1 + max(depth(root->left),
                      depth(root->right));
// ... the rebalancing rotations and the driver main() that produces
// the output below are elided ...
return root;
}
Output:
Inorder traversal for the AVL tree is:
4 5 8 11 12 17 18
Inorder traversal after deletion of node 5:
4 8 11 12 17 18
Red Black Tree
Red-black trees are also a form of a self-balancing binary tree, created by
Rudolf Bayer in 1972. In this type of binary tree, each node is given an
extra attribute – a color, red or black. The tree constrains the node
colors on any path from the root to a leaf, ensuring that no path is
more than twice as long as any other. That way, the tree stays
balanced.
Red-Black Tree Properties
Red-black trees must satisfy several properties:
The root must always be black
NILs are recognized as black – all non-NIL nodes will have two
children.
Red node children are always black
The black height rule dictates that, for each node v, there is an
integer bh(v) such that every path down from v to a NIL contains
exactly bh(v) black real nodes. The black height of a red-black tree
is defined as the root's black height.
import java.io.*;
// RedBlackTree class. This class contains the subclass for the node
// and the functionalities of RedBlackTree such as rotations,
// insertion, and in-order traversal
public class RedBlackTree
{
public Node root;//root node
public RedBlackTree()
{
super();
root = null;
}
// node creating subclass
class Node
{
int data;
Node left;
Node right;
char color;
Node parent;
Node(int data)
{
super();
this.data = data; // only including data. not key
this.left = null; // left subtree
this.right = null; // right subtree
this.color = 'R'; // color . either 'R' or 'B'
this.parent = null; // required at time of rechecking.
}
}
// this function performs left rotation
Node rotateLeft(Node node)
{
Node x = node.right;
Node y = x.left;
x.left = node;
node.right = y;
node.parent = x; // parent resetting is also important.
if(y!=null)
y.parent = node;
return(x);
}
//this function performs right rotation
Node rotateRight(Node node)
{
Node x = node.left;
Node y = x.right;
x.right = node;
node.left = y;
node.parent = x;
if(y!=null)
y.parent = node;
return(x);
}
// ... the start of the recursive insert/balance routine is elided;
// the flags ll, rr, lr, rl mark rotations performed while
// backtracking ...
this.rl = false;
}
else if(this.lr) // for left and then right.
{
root.left = rotateLeft(root.left);
root.left.parent = root;
root = rotateRight(root);
root.color = 'B';
root.right.color = 'R';
this.lr = false;
}
// when rotation and recoloring are finished, the flags are reset.
// Now take care of the RED RED conflict
if(f)
{
if(root.parent.right == root) // to check which child is the
current node of its parent
{
if(root.parent.left==null || root.parent.left.color=='B') //
case when parent's sibling is black
{// perform rotation and recoloring while backtracking, i.e.
setting up respective flags.
if(root.left!=null && root.left.color=='R')
this.rl = true;
else if(root.right!=null && root.right.color=='R')
this.ll = true;
}
else // case when parent's sibling is red
{
root.parent.left.color = 'B';
root.color = 'B';
if(root.parent!=this.root)
root.parent.color = 'R';
}
}
else
{
if(root.parent.right==null || root.parent.right.color=='B')
{
if(root.left!=null && root.left.color=='R')
this.rr = true;
else if(root.right!=null && root.right.color=='R')
this.lr = true;
}
else
{
root.parent.right.color = 'B';
root.color = 'B';
if(root.parent!=this.root)
root.parent.color = 'R';
}
}
f = false;
}
return(root);
}
Scapegoat Trees
Scapegoat trees are another variant of the self-balancing binary search tree.
However, unlike others, such as the AVL and red-black trees, scapegoats do
not need any extra space per node for storage. They are easy to implement
and have low overhead, making them one of the most attractive data
structures. They are generally used in applications where lookup and insert
operations are dominant, as this is where their efficiency lies.
The idea behind a scapegoat tree is based on something we have all
dealt with at one time or another – when something goes wrong, we look
for a scapegoat to blame it on. Once the scapegoat is identified, we
leave them to deal with what went wrong. After a node is inserted, the
tree looks for a node with an unbalanced subtree – that is your
scapegoat. The subtree is then rebalanced.
In terms of implementation, a scapegoat tree is flexible. You can optimize
them for insert operations at the expense of a lookup or delete operation, or
vice-versa. The programmer has carte blanche in tailoring the tree to the
application, making these data structures attractive to most programmers.
Finding a scapegoat for insertion is a simple process. Let's say we have a
balanced tree. A new node is inserted, subsequently unbalancing the tree.
Starting at the node we just inserted, we look at the node's subtree root – if
it is balanced, we move on to the parent node and look again. Somewhere
along the line, we will discover a scapegoat with an unbalanced subtree –
this is what must be fixed to rebalance the tree.
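A minimal sketch of that upward search, using the simplified balance
test from this section (left and right subtree sizes differing by more
than 1); the Node fields and size() helper are assumed names:
struct Node {
    int key;
    Node *left, *right, *parent;
};

// size() counts the nodes in a subtree
int size(Node* n)
{
    return n == NULL ? 0 : 1 + size(n->left) + size(n->right);
}

// Walk up from the newly inserted node; the first ancestor whose
// subtree fails the balance test is the scapegoat to rebuild.
Node* findScapegoat(Node* inserted)
{
    for (Node* p = inserted->parent; p != NULL; p = p->parent) {
        int left = size(p->left), right = size(p->right);
        if (left - right > 1 || right - left > 1) // simplified test
            return p;
    }
    return NULL; // the tree is still balanced
}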
Properties
Out of all the binary search trees, the scapegoat tree is the first
to achieve its O(log2(n)) complexity without needing additional
information stored at every node. This represents huge savings in
space, making it one of the best data structures to use when space is
short.
The scapegoat tree stores the following properties for the whole tree:
root – the root node of the tree
size – the current number of nodes in the tree
max_size – the maximum size the tree has reached since the last full
rebuild
Operations
The scapegoat tree handles two operations – insertion and deletion – in a
unique way, while traversal and lookup are done in the same way as any
balanced binary search tree.
Insertion
The insertion process begins in the same way as in a binary search
tree. A binary search is used to find the right place for the new
node, a process with a time complexity of O(log2(n)).
Next, the tree must determine whether rebalancing is required, so it
traverses up the new node's ancestry, looking for the first node with
an unbalanced subtree. A factor α, the tree's weighting property,
determines whether a subtree is or isn't balanced. For the sake of
simplicity, say the number of nodes in the left and right subtrees
cannot differ by more than 1. If the tree is unbalanced, one of the
new node's ancestors must be unbalanced, common sense considering
this is where the new node was added to a balanced tree. The time
complexity to traverse up the tree to find the scapegoat is
O(log2(n)), while the time to actually rebalance the subtree rooted
at the scapegoat is O(n).
However, this may be a pessimistic analysis because, although the
scapegoat might be the tree root, it could also be a node buried deep
in the tree. In that case, rebalancing takes much less time because
the majority of the tree is left untouched. As such, we can say that
the amortized time for rebalancing is O(log2(n)).
Deletion
Deleting nodes from a scapegoat tree is far easier than inserting them. The
tree uses the max_size property, indicating the maximum size the tree
achieved since it was last fully rebalanced. Whenever a full rebalance is
carried out, the scapegoat tree will set max_size to size.
When a node is deleted, the tree looks at the size property to see
whether it has dropped to half of max_size or less. If it has, the
tree rebalances itself from the root, taking O(n) time. However, the
amortized time is similar to insertion, O(log2(n)).
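A minimal sketch of that deletion bookkeeping, assuming a tree object
with root, size, and maxSize fields like the Python class below;
bstDelete() and rebuild() are assumed helpers (ordinary BST deletion
and the flatten-and-rebuild rebalance):
struct Scapegoat {
    Node* root;
    int size;    // current node count
    int maxSize; // max size since the last full rebuild
};

void deleteKey(Scapegoat* t, int key)
{
    t->root = bstDelete(t->root, key); // ordinary BST deletion
    t->size--;
    // rebuild from the root once the tree has shrunk to half of
    // the maximum size reached since the last full rebuild
    if (2 * t->size <= t->maxSize) {
        t->root = rebuild(t->root);    // O(n) full rebalance
        t->maxSize = t->size;
    }
}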
Complexity
Scapegoat tree complexity is similar to other binary search trees:
Operation   Average
Space       O(n) **
Search      O(log2(n))
Traversal   O(n) *
Insert      O(log2(n)) *
Delete      O(log2(n)) *
* amortized analysis
** better than other self-balancing binary search trees
Python Implementation
Below is an implementation of the scapegoat tree in Python, showing you
the insert operation. Note that every node points to its parent node,
which makes the code easier to read. However, you can omit parent
pointers by tracking each parent node on a stack as you traverse down
the tree during insertion.
The code includes two classes describing the Scapegoat tree and the Node,
with their operations:
import math
class Node:
def __init__(self, key):
self.key = key
self.left = None
self.right = None
self.parent = None
class Scapegoat:
def __init__(self):
self.root = None
self.size = 0
self.maxSize = 0
    # ... the insert and size helpers are elided in this excerpt;
    # rebalancing flattens the subtree into a sorted list of nodes and
    # rebuilds a balanced tree from it (method name assumed):
    def rebalance(self, root):
        nodes = []
        flatten(root, nodes)
        return buildTreeFromSortedList(nodes, 0, len(nodes) - 1)
// ... from the treap section: the tail of the recursive deleteNode()
// routine, which rotates at the root when the key is found there ...
// If the key is at the root and both the left and right are not NULL
else if (root->left->priority < root->right->priority)
{
root = leftRotate(root);
root->left = deleteNode(root->left, key);
}
else
{
root = rightRotate(root);
root->right = deleteNode(root->right, key);
}
return root;
}
Here is the complete program in C++ to show all the operations and the output:
// A Treap Node
struct TreapNode
{
int key, priority;
TreapNode *left, *right;
};
// ... the rotation helpers are elided down to their key lines ...
// Perform rotation
x->right = y;
y->left = T2;
// If the key is at the root and both the left and right are not NULL
else if (root->left->priority < root->right->priority)
{
root = leftRotate(root);
root->left = deleteNode(root->left, key);
}
else
{
root = rightRotate(root);
root->right = deleteNode(root->right, key);
}
return root;
}
Delete 20
Inorder traversal of the newly modified tree
key: 30 | priority: 48 | right child: 40
key: 40 | priority: 21
key: 50 | priority: 73 | left child: 30 | right child: 60
key: 60 | priority: 55 | right child: 70
key: 70 | priority: 50 | right child: 80
key: 80 | priority: 44
Delete 30
Inorder traversal of the newly modified tree
key: 40 | priority: 21
key: 50 | priority: 73 | left child: 40 | right child: 60
key: 60 | priority: 55 | right child: 70
key: 70 | priority: 50 | right child: 80
key: 80 | priority: 44
Delete 50
Inorder traversal of the newly modified tree
key: 40 | priority: 21
key: 60 | priority: 55 | left child: 40 | right child: 70
key: 70 | priority: 50 | right child: 80
key: 80 | priority: 44
50 Not Found
An Explanation of the Output:
All the nodes are written as key(priority), and the code above will give us this tree:
20(92)
\
50(73)
/ \
30(48) 60(55)
\ \
40(21) 70(50)
\
80(44)
After deleteNode(20)
50(73)
/ \
30(48) 60(55)
\ \
40(21) 70(50)
\
80(44)
After deleteNode(30)
50(73)
/ \
40(21) 60(55)
\
70(50)
\
80(44)
After deleteNode(50)
60(55)
/ \
40(21) 70(50)
\
80(44)
N-ary Tree
All images courtesy of https://studytonight.com
The N-ary tree is so named because a node can have n children. This
makes it more general than standard binary search trees, which can
only have up to two children per node. Here's what an N-ary tree
looks like:
Here, we see that the tree has 11 nodes; some of them have three
children, while others have only one. With binary trees, child nodes
are easy to store because each node can be assigned two pointers –
left and right – one for each child. With an N-ary tree, it isn't so
easy. To store a node's children, we need a different data structure –
in Java, the LinkedList, and in C++, the vector.
Implementation
With a non-linear data structure, we first need to create a structure (in Java,
this would be a constructor) for the data structure. As with a binary search
tree, we can use the TreeNode class and create the constructors inside it
with class-level variables.
Have a look at this example:
// requires java.util.List and java.util.LinkedList
public static class TreeNode{
    int val;
    List<TreeNode> children = new LinkedList<>();
    TreeNode(int data){
        val = data;
    }
}
Here, you can see that the nodes satisfy the full N-ary tree property
by having either 0 or exactly N (here 4) children.
2. Complete N-ary Tree
This type of N-ary tree is one where each level's nodes should have N
children exactly, thus being complete. The exception is the last level nodes
– if these are not complete, they should be "as left as possible."
Here's a representation:
return root
end function
One-pass Find algorithms were also developed that retain the
worst-case complexity while being inherently more efficient. These
are known as path halving and path splitting, and both update the
parent pointers of nodes found on the path from the query node to the
root. Path splitting replaces every parent pointer on that path with
a pointer to the node's grandparent:
function Find(x) is
while x.parent ≠ x do
(x, x.parent) := (x.parent, x.parent.parent)
end while
return x
end function
While path halving works in much the same way, it only replaces every
other parent pointer:
function Find(x) is
while x.parent ≠ x do
x.parent := x.parent.parent
x := x.parent
end while
return x
end function
Merging Two Sets
The Union(x, y) operation replaces the set containing x and the set
containing y with their union. It first uses Find to work out the
roots of the trees that contain x and y. If the roots are the same,
that's all there is to do. If they are not, the two trees must be
merged, which we do in one of two ways – setting the parent pointer
of x's root to y's, or vice versa.
However, your choice of which node will become the parent may have
consequences for future tree operation complexity. Chosen carelessly, a tree
can become too tall. For example, let's assume that the tree with x is always
made a subtree of the one containing y by the Union operation. We start
with a forest initialized with its elements and then execute the following:
Union(1, 2), Union(2, 3), …, Union(n -1, n)
The result is a forest with just one tree, and that tree has a root of n. The
path between 1 and n goes through all the nodes in the tree. The time taken
to run Find(1) would be O(n).
If your implementation is efficient, Union by Rank or Union by Size are
used to control the tree height. Both require that nodes store both a parent
pointer and additional information, and this information determines the root
that will become the parent. Both of these ensure that we do not end up with
an excessively deep tree.
With Union by Size, the node's size is stored in the node. This is nothing
more than a number representing how many descendants there are,
including the node. When the trees with the x and y roots are merged, the
parent node is the one with the most descendants. If both nodes have an
identical number of descendants, either can be the parent, but, in both cases,
the parent node is set to the number representing the total descendants:
function Union(x, y) is
    // Replace nodes by roots
    x := Find(x)
    y := Find(y)
    if x = y then
        return // x and y are already in the same set
    end if
    // Ensure x has at least as many descendants as y
    if x.size < y.size then
        (x, y) := (y, x)
    end if
    // Make x the new root and update the size
    y.parent := x
    x.size := x.size + y.size
end function
When you use Union by Rank alone, without updating parent pointers
during the Find operation, the running time is Θ(m log n) for any
sequence of m operations, up to n of which are MakeSet operations.
Combining it with path compression, splitting, or halving improves
this to Θ(mα(n)), giving each operation an amortized running time of
Θ(α(n)). This is asymptotically optimal: every disjoint-set structure
must use amortized time per operation of Ω(α(n)).
Here, α(n) is the inverse Ackermann function, an extraordinarily slow
grower: α(n) is 4 or lower for any n that could be written in the
physical universe, so in practice all disjoint-set operations take
amortized constant time.
Part 4: Advanced Heaps and Priority Queues
Binary Heap or Binary Search Tree for Priority Queues?
Binary heaps are always the preferred option for priority queues over binary
search trees. For a priority queue to be efficient, the following operations
are required:
1. Get the minimum or maximum to determine the top priority element
2. Insert an element
3. Eliminate the top priority element
4. Decrease the key
Binary heaps support these operations with the following respective time
complexities:
1. O(1)
2. O(Logn)
3. O(Logn)
4. O(Logn)
The red-black, AVL, and other self-balancing binary search trees also
support these operations, sharing the same time complexities.
Let's break down those operations:
1. O(1) is not naturally the time complexity for finding the minimum
and maximum. However, by retaining an additional pointer to the min
or max and updating it on insertion or deletion as needed, it can be
implemented within that time complexity. On deletion, the pointer is
updated by finding the inorder successor or predecessor.
2. O(Logn) is the natural time complexity for inserting elements.
3. O(Logn) is also natural for removing the minimum and maximum.
4. We can decrease the key in O(Logn) with a sequence of a deletion
and an insertion.
But how does this answer our question – why are binary heaps preferred for
the priority queues?
Because arrays are used to implement binary heaps, the locality of
reference is always better, and we get more cache-friendly operations.
Although the operations share the same time complexities, binary
search tree constants are always higher.
A binary heap can be built in O(n) time, while a self-balancing binary
search tree needs O(nLogn) time.
Binary heaps don't need additional space for the pointers like binary
search trees do.
Binary heaps are much easier to implement than binary search trees.
Binary heaps come in several variations, such as the Fibonacci Heap
that supports insert and decrease-key in θ(1) time.
But are binary heaps always better than a binary search tree?
Sure, binary heaps are better for priority queues, but binary search trees
have a much bigger list of advantages in other ways:
It takes O(Logn) to search for an element in a self-balancing binary
search tree, while the same in a binary heap takes O(n) time.
All the elements in a binary search tree can be printed in sorted
order in O(n) time, while the binary heap takes O(nLogn) time.
Adding an additional field to the binary search tree lets it find
the kth smallest or largest element in O(Logn) time.
Now we've had a quick look at why binary heaps are better for priority
queues than the binary search tree, it's time to dig into some of the different
types.
Binomial Heap
Binary heaps have one primary application – to implement the priority
queues. The binomial heap is an extension of the binary heap, providing
much faster merge or union operations in addition to all the other operations
the binary heap offers.
What Is a Binomial Heap?
Let's say you have a binomial tree with an order of 0 – it would have
a single node. Constructing a binomial tree with an order k requires
two binomial trees with an order k-1, setting one of them as the
leftmost child of the other. A tree with an order of k has these
properties:
It has exactly 2^k nodes
Its depth is k
At depth i, for i = 0, 1, …, k, it has exactly C(k, i) nodes
The root of the tree has a degree of k, and the root's children are
binomial trees with orders k-1, k-2, …, 0 from left to right:
k = 2 (4 nodes)
[Take two k = 1 order Binomial Trees, and
set one of them as a child of the other]
     o
    / \
   o   o
   |
   o
k = 3 (8 nodes)
[Take two k = 2 order Binomial Trees, and
set one of them as a child of the other]
        o
      / | \
     o  o  o
    /|  |
   o o  o
   |
   o
The Binomial Heap
Binomial heaps are nothing more than a series of binomial trees. Each tree
will follow the Min Heap property, and there can be at most one tree
of any particular degree. Here are some examples:
12------------10--------------------20
/ \ / |\
15 50 70 50 40
| /| |
30 80 85 65
|
100
This binomial heap contains 13 nodes and is a series of three trees with
orders from left to right of 0, 2, and 3.
10--------------------20
/ \ / |\
15 50 70 50 40
| /| |
30 80 85 65
|
100
This binomial heap contains 12 nodes and is a series of two trees with
orders from left to right of 2 and 3.
Representing Numbers and Binomial Heaps in Binary
Let's say you have a binomial heap with n nodes. It will contain binomial
trees that equal the number of bits in n's binary representation. For example,
we'll say that n is 13 and the binary representation of n, which is 00001101,
has three set bits. That means there are three binomial trees. The degrees of
these trees can also be related to the positions of the set bits, and, using that
relationship, we conclude that a binomial heap with n nodes has O(Logn)
binomial trees.
Binomial Heap Operations
The binomial heap's primary function is union(), and all other operations
also use this operation. We use the union() operation to combine two heaps
into a single one. We'll discuss union later; first, the other operations in the
binomial heap:
1. insert(H, k) – this operation inserts key 'k' into the binomial heap,
'H,' creating a binomial heap that has a single key. Next, union is
called on H and the newly created binomial heap.
2. getMin(H) – the easiest way of obtaining getMin() is by traversing
the root list of the trees and returning the smallest or minimum key.
Implementing this requires O(Logn) time, but we can optimize this to
O(1) by ensuring a pointer to the minimum key root is maintained.
3. extractMin(H) – extractMin() also uses the union() operation. First,
getMin() is called to obtain the minimum key tree. Then the node is
removed, and a new binomial heap is created by all the subtrees from
the removed node being connected. Lastly, union() is called on H and
the new heap. This entire operation needs O(Logn) time.
4. delete(H) – similar to the binary heap, the delete operation takes
two steps – the key is reduced to minus infinity, and then
extractMin() is called.
5. decreaseKey(H) – this is also like the binary heap. The decreased
key is compared with its parent's key. Should the parent key be
greater, the keys are swapped, and this recurs for the parent. The
operation stops when it reaches a node whose parent key is smaller,
or when it reaches the root node. decreaseKey() has a time complexity
of O(Logn).
Union Operation
Let's say we have a pair of binomial heaps, H1 and H2. The union()
operation will create one binomial heap using the following steps:
1. Step one is merging the two root lists in non-decreasing order of
tree degree.
2. Once this is done, it's time to ensure there is no more than one
binomial tree of any order. This requires binomial trees sharing the
same order to be combined. The list of the merged roots is traversed
while we track three pointers called prev-x, x, and next-x. There are
four cases as the list of roots is traversed:
i. Where the x and next-x orders are not the same, we do nothing
more than move on.
ii. Where the order of next-next-x is also the same, move on.
iii. If x's key is equal to or less than the next-x key, next-x is
made into x's child – the two are linked together to achieve this.
iv. If x's key is greater, x is made next-x's child.
Binomial Heap Implementation
The following implementation in C++ includes the insert(), getMin(),
and extractMin() operations:
// C++ program to implement some operations
// on Binomial Heap
#include<bits/stdc++.h>
using namespace std;
// ... the Node structure and most of mergeBinomialTrees(), which
// links two trees of equal degree, are elided ...
return b1;
}
// ... inside adjust(), which combines trees of equal degree so no
// two trees in the heap share one; it1, it2, it3 walk the tree list
if (_heap.size() == 2)
{
it2 = it1;
it2++;
it3 = _heap.end();
}
else
{
it2++;
it3=it2;
it3++;
}
while (it1 != _heap.end())
{
// if only a single element remains to be processed
if (it2 == _heap.end())
it1++;
// if the degrees of two Binomial Trees are the same in the heap
else if ((*it1)->degree == (*it2)->degree)
{
Node *temp;
*it1 = mergeBinomialTrees(*it1,*it2);
it2 = _heap.erase(it2);
if(it3 != _heap.end())
it3++;
}
}
return _heap;
}
return adjust(temp);
}
return 0;
}
Output:
The heap is:
50 10 30 40 20
After deleting 10, the heap is:
20 30 40 50
Fibonacci Heap
As you know, the primary use for a heap is to implement a priority queue,
and the Fibonacci heap beats the binomial and binary heaps when it comes
to time complexity. Below, you can see the Fibonacci heap's amortized time
complexities:
1. Find Min - Θ(1), which is the same as the binomial and binary heaps
2. Delete Min - O(Logn), which is Θ(Logn) in the binomial and binary
heaps
3. Insert - Θ(1), which is Θ(Logn) in the binary heap and Θ(1) in the
binomial heap
4. Decrease-Key - Θ(1), which is Θ(Logn) in the binomial and binary
heaps
5. Merge - Θ(1), which is Θ(m Logn) or Θ(m+n) in the binary heap and
Θ(Logn) in the binomial heap
The Fibonacci heap is like the binomial heap in that it is a collection of
trees that have the max-heap or min-heap property. In a Fibonacci heap,
the trees can be of any shape; they can even all be single nodes, as
opposed to the binomial heap, where every tree must be a binomial tree.
A pointer to the minimum value (tree root) is also maintained in the
Fibonacci heap, and a doubly-linked list is used to connect all the trees.
That way, a single pointer to min can be used to access all of them.
The primary idea is to use a "lazy" way of executing operations. For
example, the merge operation does nothing more than link two heaps, while
the insert operation is no more complicated than adding a new tree with just
a single node. The most complicated is the extract minimum operation, as
it is the one that consolidates the trees. That, in turn, makes the delete
operation somewhat complicated, as it must first decrease the key to minus
infinity before calling the extract minimum operation.
Interesting Points
1. The Decrease-Key operation has a reduced time complexity, which
matters for the Prim and Dijkstra algorithms. If you use a binary
heap, these algorithms have a time complexity of O(VLogV +
ELogV), but if you use the Fibonacci heap, that improves to
O(VLogV + E).
2. While the time complexity looks promising for the Fibonacci heap, it
has proved to be somewhat slow in real-world practice because of
high hidden constants.
3. Fibonacci heaps gain their name primarily from the fact that the
running time analysis uses Fibonacci numbers. Every node in a
Fibonacci heap has a degree of at most O(Logn), and the size of a
subtree rooted in a node of degree k is at least F(k+2), where F(k)
denotes the kth Fibonacci number.
Insertion and Union
Here, we are going to discuss two of the Fibonacci heap operations, starting
with insertion.
Insertion
We use the algorithm below to insert a new node into a Fibonacci heap:
1. A new node, x, is created
2. Heap H is checked to see if it is empty or not
3. If it is, x is set as the single node in the root list, and the H(min)
pointer is set to x
4. If the heap isn't empty, x is inserted into the root list, and H(min) is
updated if x's key is smaller than the current minimum.
Union
We can do a union of H1 and H2, both Fibonacci heaps, using the following
algorithm:
1. The root lists of H1 and H2 are joined into one Fibonacci heap H
2. If H1(min) < H2(min), then H(min) = H1(min)
3. If not, H(min) = H2(min)
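Both steps are constant-time pointer manipulations on the circular doubly
linked root list. Below is a minimal sketch of just this step, assuming the
same circular root-list representation as the program that follows;
unionHeaps() is an illustrative name, not part of that program:
#include <iostream>
using namespace std;
// each root sits in a circular doubly linked list
struct node {
    node *left, *right;
    int key;
};
// Splice the two circular root lists together in O(1)
// and keep the smaller of the two minima
node* unionHeaps(node* min1, node* min2) {
    if (min1 == NULL) return min2;
    if (min2 == NULL) return min1;
    node* t = min1->right;
    min1->right = min2->right;
    min2->right->left = min1;
    min2->right = t;
    t->left = min2;
    return (min2->key < min1->key) ? min2 : min1;
}
int main() {
    // two one-node heaps, each a circular list of length 1
    node a = { &a, &a, 3 };
    node b = { &b, &b, 1 };
    node* mini = unionHeaps(&a, &b);
    cout << "min after union: " << mini->key << endl; // prints 1
    return 0;
}
Nothing is consolidated at this point; that work is deferred to the next
extract minimum, which is the "lazy" behavior described earlier.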
The program below uses C++ to build a Fibonacci heap and insert into it:
// C++ program to demonstrate building
// a Fibonacci heap and inserting into it
#include <cstdlib>
#include <iostream>
#include <malloc.h>
using namespace std;
struct node {
node* parent;
node* child;
node* left;
node* right;
int key;
};
// the pointer to the minimum root and the node count
// (reconstructed: the excerpt omits these declarations, the body
// of insertion() and the start of display())
struct node* mini = NULL;
int no_of_nodes = 0;
// Function to insert a node into the circular root list
void insertion(int val)
{
    struct node* new_node = new node();
    new_node->key = val;
    new_node->parent = NULL;
    new_node->child = NULL;
    new_node->left = new_node;
    new_node->right = new_node;
    if (mini != NULL) {
        // splice the new node in next to the minimum
        (mini->left)->right = new_node;
        new_node->right = mini;
        new_node->left = mini->left;
        mini->left = new_node;
        if (new_node->key < mini->key)
            mini = new_node;
    }
    else
        mini = new_node;
}
// Function to display the root list of the heap
void display(struct node* mini)
{
    node* ptr = mini;
    if (ptr == NULL)
        cout << "The Heap is Empty" << endl;
    else {
cout << "The root nodes of Heap are: " << endl;
do {
cout << ptr->key;
ptr = ptr->right;
if (ptr != mini) {
cout << "-->";
}
} while (ptr != mini && ptr->right != NULL);
cout << endl
<< "The heap has " << no_of_nodes << " nodes" << endl;
}
}
// Function to find the min node in the heap
void find_min(struct node* mini)
{
cout << "min of heap is: " << mini->key << endl;
}
// Driver code
int main()
{
no_of_nodes = 7;
insertion(4);
insertion(3);
insertion(7);
insertion(5);
insertion(2);
insertion(1);
insertion(10);
display(mini);
find_min(mini);
return 0;
}
Output:
The root nodes of Heap are:
1-->2-->3-->4-->7-->5-->10
The heap has 7 nodes
Min of heap is: 1
Fibonacci Heap – Deletion, Extract min and Decrease key
We've just looked at insertion and union, both operations being critical to
understanding the next three operations. We’ll start with Extract_min().
Extract_min()
In this operation, a function is created to delete the minimum node and to
set the min pointer to the minimum value among the remaining nodes. The
algorithm below is used (a sketch of the degree-table idea in steps 3 to 5
follows the list):
1. The min node is deleted
2. The min pointer is moved to the next root, and all the subtrees of the
deleted node are added to the root list
3. An array of degree pointers is created, as large as the maximum
possible degree of the deleted node
4. The degree pointer is set to the current node
5. Then we move to the next node. If two roots have the same degree,
the union operation is used to join the trees; if the degrees are
different, the degree pointer is set to the next node.
6. Steps four and five are repeated until the heap is finished.
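Steps 3 to 5 amount to keeping a table indexed by degree and linking any
two trees that collide in it. The sketch below isolates that idea with a
simplified tree type; Tree, link(), and consolidate() are illustrative names,
and the real heap would splice the circular root list rather than use a
vector:
// A sketch of the degree-table idea behind extract minimum
#include <bits/stdc++.h>
using namespace std;
struct Tree {
    int key, degree;
    vector<Tree*> children;
};
// Make the larger root a child of the smaller one
Tree* link(Tree* a, Tree* b) {
    if (b->key < a->key) swap(a, b);
    a->children.push_back(b);
    a->degree++;
    return a;
}
// Combine roots until no two trees share a degree
vector<Tree*> consolidate(vector<Tree*> roots) {
    map<int, Tree*> byDegree;
    for (Tree* t : roots) {
        // keep linking while a tree of the same degree exists
        while (byDegree.count(t->degree)) {
            Tree* other = byDegree[t->degree];
            byDegree.erase(t->degree);
            t = link(t, other);
        }
        byDegree[t->degree] = t;
    }
    vector<Tree*> out;
    for (auto& p : byDegree) out.push_back(p.second);
    return out;
}
int main() {
    // four single-node trees collapse into one tree of degree 2
    vector<Tree*> roots;
    for (int k : {7, 3, 9, 5}) roots.push_back(new Tree{k, 0, {}});
    roots = consolidate(roots);
    cout << roots.size() << " tree(s); min root = " << roots[0]->key << endl;
    return 0;
}
This prints "1 tree(s); min root = 3": the four single-node trees are
linked into one tree of degree 2, which is exactly what the degree table
guarantees.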
Decrease_key()
If we want the value of any heap element decreased, the following
algorithm is followed:
1. Node x's value is decreased to the new specified value
2. Case 1 – should this result in the min-heap property not being
violated, the min pointer is updated if needed
3. Case 2 – should this result in the min-heap property being violated
and x's parent is not marked, we need to do three things:
a. The link between x and the parent is cut off
b. x's parent is marked
c. The tree rooted at x is added to the root list and, if needed, the
min pointer is updated
4. Case 3 – if there is a violation of the min-heap property and x's parent
is already marked:
a. The link is cut off between x and p[x] – the parent
b. x is added to the root list, and the min pointer is updated if
needed
c. The link is cut off between p[x] and p[p[x]]
d. p[x] is added to the root list, and the min pointer is updated if
needed
e. p[p[x]] is marked if it is unmarked
f. Otherwise, p[p[x]] is cut off, and steps 4b to 4e are repeated,
using p[p[x]] as x (a sketch of this cutting logic follows the
list).
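The cutting logic in cases 2 and 3 is the heart of Decrease_key(). Below is
a minimal sketch of it, assuming a simplified node type and a global root
list; FNode, rootList, and the function names are illustrative and not taken
from the program further down:
// A sketch of cut and cascading cut
#include <bits/stdc++.h>
using namespace std;
struct FNode {
    int key;
    bool mark = false;
    FNode* parent = nullptr;
    list<FNode*> children;
};
list<FNode*> rootList;
// Move x from its parent's child list into the root list (steps a-c)
void cut(FNode* x, FNode* p) {
    p->children.remove(x);
    x->parent = nullptr;
    x->mark = false; // a node moved to the root list is unmarked
    rootList.push_back(x);
}
// Climb while parents are marked, cutting each one (case 3)
void cascadingCut(FNode* p) {
    FNode* gp = p->parent;
    if (gp == nullptr) return;
    if (!p->mark)
        p->mark = true; // first lost child: just mark the parent
    else {
        cut(p, gp); // second lost child: cut the parent too
        cascadingCut(gp);
    }
}
void decreaseKey(FNode* x, int val) {
    x->key = val;
    FNode* p = x->parent;
    if (p != nullptr && x->key < p->key) { // min-heap property violated
        cut(x, p);
        cascadingCut(p);
    }
}
int main() {
    FNode a{5}, b{9};
    b.parent = &a;
    a.children.push_back(&b);
    rootList.push_back(&a);
    decreaseKey(&b, 2); // 2 < 5, so b is cut and becomes a root
    cout << rootList.size() << " roots; new key of b = " << b.key << endl;
    return 0;
}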
Deletion()
The algorithm below is followed to delete an element from the heap:
1. The value of the node (x) you want to delete is decreased to minus
infinity using the Decrease_key() function
2. The heap is then adjusted using the min-heap property, which puts x
in the root list as the new minimum
3. Extract_min() is applied to the heap
Below is a program in C++ to demonstrate these operations:
// C++ program to demonstrate the Extract_min(), Deletion()
// and Decrease_key() operations
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <malloc.h>
using namespace std;
// (Fragment of Extract_min(): temp is the min node being removed;
// if it has children, they are spliced into the root list)
if (temp->child != NULL) {
x = temp->child;
do {
pntr = x->right;
(mini->left)->right = x;
x->right = mini;
x->left = mini->left;
mini->left = x;
if (x->key < mini->key)
mini = x;
x->parent = NULL;
x = pntr;
} while (pntr != temp->child);
}
// unlink temp from the root list and reset the min pointer
(temp->left)->right = temp->right;
(temp->right)->left = temp->left;
mini = temp->right;
if (temp == temp->right && temp->child == NULL)
mini = NULL;
else {
mini = temp->right;
Consolidate();
}
no_of_nodes--;
}
}
// (Fragment of Cut(): the node found is detached from its parent
// temp and spliced back into the root list as an unmarked root)
temp->degree = temp->degree - 1;
found->right = found;
found->left = found;
(mini->left)->right = found;
found->right = mini;
found->left = mini->left;
mini->left = found;
found->parent = NULL;
found->mark = 'B'; // a node moved to the root list is unmarked
}
// (Fragment of Decrease_key(): val is the new, smaller key)
if (found == NULL) {
cout << "Node not found in the Heap" << endl;
return;
}
found->key = val;
// (Fragment of display(): print the circular root list)
node* ptr = mini;
if (ptr == NULL)
cout << "The Heap is Empty" << endl;
else {
cout << "The root nodes of Heap are: " << endl;
do {
cout << ptr->key;
ptr = ptr->right;
if (ptr != mini) {
cout << "-->";
}
} while (ptr != mini && ptr->right != NULL);
cout << endl
<< "The heap has " << no_of_nodes << " nodes" << endl
<< endl;
}
}
// Driver code
int main()
{
// A heap is created and 3 nodes inserted into it
cout << "Create the initial heap" << endl;
insertion(5);
insertion(2);
insertion(8);
display(mini);
// (reconstructed to match the output below: the minimum is
// extracted, then Find() locates the node with key 8 and
// decreases it to 7 via Decrease_key())
cout << "Extracting min" << endl;
Extract_min();
display(mini);
cout << "Decrease value of 8 to 7" << endl;
Find(mini, 8, 7);
display(mini);
return 0;
}
Output:
Create the initial heap
The root nodes of Heap are:
2-->5-->8
The heap has 3 nodes
Extracting min
The root nodes of Heap are:
5
The heap has 2 nodes
Decrease value of 8 to 7
The root nodes of Heap are:
5
The heap has 2 nodes
Leftist Heap
A leftist heap is a heap-ordered binary tree in which every node's left
subtree is at least as "deep" as its right subtree, measured by the null-
path length (the distance to the nearest missing child). All of the work is
done by the merge operation, which walks down the right spines of the two
heaps. Wherever the merge leaves the leftist heap property violated by the
subtree at a node, that subtree is swapped with the left child, and the
leftist heap property is retained.
Next, the result is converted into a leftist heap, and the process is repeated.
This algorithm's worst-case time complexity is O(Logn), where n indicates
how many nodes are in the leftist heap.
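Because the full listing below omits the bodies of the merge routines, here
is a minimal sketch of the merge rule just described, assuming an
illustrative LNode type that stores the null-path length in dist:
// A sketch of the leftist-heap merge rule
#include <bits/stdc++.h>
using namespace std;
struct LNode {
    int key, dist; // dist = null-path length
    LNode *left, *right;
    LNode(int k) : key(k), dist(0), left(NULL), right(NULL) {}
};
int npl(LNode* t) { return t ? t->dist : -1; }
// Merge along the right spines; swap children wherever the
// leftist property (npl(left) >= npl(right)) would be violated
LNode* merge(LNode* a, LNode* b) {
    if (a == NULL) return b;
    if (b == NULL) return a;
    if (b->key < a->key) swap(a, b); // keep the smaller root on top
    a->right = merge(a->right, b);
    if (npl(a->left) < npl(a->right)) // restore the leftist property
        swap(a->left, a->right);
    a->dist = npl(a->right) + 1;
    return a;
}
int main() {
    LNode* h = NULL;
    for (int k : {7, 3, 9, 1})
        h = merge(h, new LNode(k)); // insert = merge with a singleton
    cout << "min = " << h->key << endl; // prints 1
    return 0;
}
Insertion is then just a merge with a single-node heap, and deleteMin is a
merge of the root's two subtrees.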
Below is a C++ implementation of a leftist heap:
//C++ program for leftist heap / leftist tree
#include <bits/stdc++.h>
using namespace std;
//Node Declaration (reconstructed: the excerpt uses LeftistNode
//before defining it)
struct LeftistNode
{
    int element;
    LeftistNode *left;
    LeftistNode *right;
    int dist;
    LeftistNode(int & element, LeftistNode *lt = NULL,
                LeftistNode *rt = NULL, int np = 0)
        : element(element), left(lt), right(rt), dist(np) {}
};
//Class Declaration
class LeftistHeap
{
public:
LeftistHeap();
LeftistHeap(LeftistHeap &rhs);
~LeftistHeap();
bool isEmpty();
bool isFull();
int &findMin();
void Insert(int &x);
void deleteMin();
void deleteMin(int &minItem);
void makeEmpty();
void Merge(LeftistHeap &rhs);
LeftistHeap & operator =(LeftistHeap &rhs);
private:
LeftistNode *root;
LeftistNode *Merge(LeftistNode *h1,
LeftistNode *h2);
LeftistNode *Merge1(LeftistNode *h1,
LeftistNode *h2);
void swapChildren(LeftistNode * t);
void reclaimMemory(LeftistNode * t);
LeftistNode *clone(LeftistNode *t);
};
// Copy constructor.
LeftistHeap::LeftistHeap(LeftistHeap &rhs)
{
root = NULL;
*this = rhs;
}
// Deep copy
LeftistHeap &LeftistHeap::operator =(LeftistHeap & rhs)
{
if (this != &rhs)
{
makeEmpty();
root = clone(rhs.root);
}
return *this;
}
// Driver code (reconstructed preamble: the excerpt omits the method
// bodies and these declarations; arr and arr1 are the sample inputs
// implied by the output below)
int main()
{
LeftistHeap h;
LeftistHeap h1;
LeftistHeap h2;
int x;
int arr[] = { 1, 5, 7, 10, 15 };
int arr1[] = { 22, 75 };
h.Insert(arr[0]);
h.Insert(arr[1]);
h.Insert(arr[2]);
h.Insert(arr[3]);
h.Insert(arr[4]);
h1.Insert(arr1[0]);
h1.Insert(arr1[1]);
h.deleteMin(x);
cout<< x <<endl;
h1.deleteMin(x);
cout<< x <<endl;
h.Merge(h1);
h2 = h;
h2.deleteMin(x);
cout<< x << endl;
return 0;
}
Output:
1
22
5
K-ary Heap
The k-ary heap is a generalization of the binary heap (which is the case
k=2) where every node has k children rather than just 2. As with the
binary heap, the k-ary heap has two properties:
1. A k-ary heap is an almost complete tree: all the levels have the
maximum number of nodes except possibly the last, which is filled
from left to right.
2. Like the binary heap, we can divide k-ary heaps into two categories:
a. Max k-ary heap – the key at the root is greater than all its
descendants, and the same holds recursively for every node
b. Min k-ary heap – the key at the root is less than all its
descendants, and the same holds recursively for every node
Examples:
In a 3-ary max-heap, the maximum of all the nodes is the root node:
          10
        /  |  \
       7   9   8
     / | \  |
    4  6  5 7
In a 3-ary min-heap, the minimum of all the nodes is the root node:
          10
        /  |  \
      12   11   13
     / | \
   14  15  18
A complete k-ary tree containing n nodes has a height of logk(n), the
logarithm of n to base k.
K-Ary Heap Applications
1. When used in a priority queue implementation, the k-ary heap has a
faster decrease_key() operation than the binary heap: the binary
heap's time complexity is O(Log2n), where the k-ary heap's is
O(Logkn). However, it does make the extractMin() operation more
expensive: the binary heap does it in O(Log2n) time, while the k-ary
heap's complexity is O(k Logkn). This means the k-ary heap is
efficient where the decrease priority operations in an algorithm are
more common than the extractMin() operation, an example being the
Prim and Dijkstra algorithms.
2. The memory cache behavior of a k-ary heap is much better than that
of a binary heap, so, in practice, the k-ary heap runs faster. However,
its worst-case running times for the delete() and extractMin()
operations are larger, with both of them being O(k Logkn).
Implementation
Assuming we have an array with 0-based indexing, the k-ary heap is
represented by the array in such a way that, for any node, the following
hold (a small sketch of this index arithmetic follows the list):
For the node at index i (unless it is a root node), the parent is at index
(i-1)/k
For the node at index i, the children are located at the (k*i)+1,
(k*i)+2, …, (k*i)+k indices
In a heap of size n, the last non-leaf node is at (n-2)/k
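Here is a small, runnable check of this index arithmetic; the parent() and
child() helpers are hypothetical names used only for illustration, not part
of the implementation further down:
// Index arithmetic for a 0-based k-ary heap
#include <cstdio>
int parent(int i, int k) { return (i - 1) / k; }
int child(int i, int k, int j) { return k * i + j; } // j = 1..k
int main() {
    int k = 3;
    // in a 3-ary heap, node 4's parent is (4-1)/3 = 1,
    // and node 1's children are at indices 4, 5 and 6
    printf("parent of node 4: %d\n", parent(4, k));
    for (int j = 1; j <= k; j++)
        printf("child %d of node 1: %d\n", j, child(1, k, j));
    return 0;
}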
buildHeap() :
This function takes an input array and builds a heap from it. Starting from
the last non-leaf node, it runs a loop up to the root node. For every index,
it calls the restoreDown function (also called maxHeapify), which moves
the node down the heap until it sits in its correct place, so the heap is
built from the bottom up.
So, why does the loop need to start from the last non-leaf node?
The simple answer is every node that comes after that is a leaf node.
Because they have no children, they satisfy the heap property trivially and,
as such, are already max-heap roots.
restoreDown() (or maxHeapify) :
This function maintains the heap property. It runs a loop through the
node's children to find the maximum, then compares that maximum with
the node's own value, swapping the two where max(value of all the
children) > (the node's value). This is repeated until the node settles at its
correct position within the k-ary heap.
extractMax() :
This function extracts the root node: in a k-ary max heap, the largest
element is stored at the root. The root's value is saved, the last node is
copied to the first position, and restoreDown is called on it, ensuring the
heap property is maintained; the saved maximum is then returned.
insert() :
This function is used to insert a node into the k-ary heap. The node is
inserted at the last position, and restoreUp() is called on the given index,
putting the node back to its rightful position in the k-ary heap. The
restoreUp() function compares the specified node iteratively with its
parent, because the parent in a max heap is always equal to or greater
than its children. The node and parent are swapped only while the node's
key is greater than the parent's.
In the following C++ implementation, we put all of this together:
// C++ program to demonstrate all the operations of a k-ary heap,
// reconstructed as a compact, runnable sketch
#include <bits/stdc++.h>
using namespace std;
// Move the element at index down until the max-heap property holds
void restoreDown(int arr[], int len, int index, int k) {
    while (1) {
        // find the largest of the (at most k) children;
        // max_child_index stays -1 if the node is a leaf
        int max_child_index = -1, max_child = INT_MIN;
        for (int i = 1; i <= k; i++) {
            int child = k * index + i;
            if (child < len && arr[child] > max_child) {
                max_child = arr[child];
                max_child_index = child;
            }
        }
        // leaf node, or no child is larger: the heap property holds
        if (max_child_index == -1 || arr[index] >= max_child)
            break;
        swap(arr[index], arr[max_child_index]);
        index = max_child_index;
    }
}
// Move the element at index up while it is larger than its parent;
// the loop runs up to the root in case the new element is the maximum
void restoreUp(int arr[], int index, int k) {
    int parent = (index - 1) / k;
    while (index > 0 && arr[index] > arr[parent]) {
        swap(arr[index], arr[parent]);
        index = parent;
        parent = (index - 1) / k;
    }
}
// Heapify from the last non-leaf node, (n-2)/k, up to the root
void buildHeap(int arr[], int n, int k) {
    for (int i = (n - 2) / k; i >= 0; i--)
        restoreDown(arr, n, i, k);
}
// Append the new element, then restore the heap property upwards
void insert(int arr[], int* n, int k, int elem) {
    arr[*n] = elem;
    (*n)++;
    restoreUp(arr, *n - 1, k);
}
// Save the root, move the last element to the root and sift down
int extractMax(int arr[], int* n, int k) {
    int max = arr[0];
    arr[0] = arr[*n - 1];
    (*n)--;
    restoreDown(arr, *n, 0, k);
    return max;
}
// Driver program
int main() {
    const int capacity = 100;
    int arr[capacity] = {4, 5, 6, 7, 8, 9, 10};
    int n = 7;
    int k = 3;
    buildHeap(arr, n, k);
    printf("Built Heap :\n");
    for (int i = 0; i < n; i++) printf("%d ", arr[i]);
    int element = 3;
    insert(arr, &n, k, element);
    printf("\n\nHeap after insertion of %d:\n", element);
    for (int i = 0; i < n; i++) printf("%d ", arr[i]);
    printf("\n\nExtracted max is %d", extractMax(arr, &n, k));
    printf("\n\nHeap after extract max:\n");
    for (int i = 0; i < n; i++) printf("%d ", arr[i]);
    return 0;
}
Output
Built Heap :
10 9 6 7 8 4 5
Heap after insertion of 3:
10 9 6 7 8 4 5 3
Extracted max is 10
Heap after extract max:
9 8 6 7 3 4 5
Heap Sort
Heap sort uses the heap to sort an array in place: a max heap is built over
the array (buildMaxHeap), and then the root, which holds the largest
remaining element, is repeatedly swapped with the last element of the
shrinking heap before the heap property is restored. Below is a C++
implementation (reconstructed: the original excerpt mixed fragments of
two programs, so the driver uses the array matching the output shown):
// C++ program to perform heap sort
#include <bits/stdc++.h>
using namespace std;
// Sift the element at index i down through a heap of size n
void heapify(int arr[], int n, int i) {
    int largest = i, l = 2 * i + 1, r = 2 * i + 2;
    if (l < n && arr[l] > arr[largest]) largest = l;
    if (r < n && arr[r] > arr[largest]) largest = r;
    if (largest != i) {
        swap(arr[i], arr[largest]);
        heapify(arr, n, largest);
    }
}
// Build a max heap over the whole array
void buildMaxHeap(int arr[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--) heapify(arr, n, i);
}
void heapSort(int arr[], int n) {
    buildMaxHeap(arr, n);
    // repeatedly move the current maximum to the end of the array
    for (int i = n - 1; i > 0; i--) {
        swap(arr[0], arr[i]);
        heapify(arr, i, 0);
    }
}
// Driver code
int main() {
    int arr[] = { 10, 20, 15, 17, 9, 21 };
    int n = sizeof(arr) / sizeof(arr[0]);
    printf("Given array: ");
    for (int i = 0; i < n; i++) printf("%d ", arr[i]);
    heapSort(arr, n);
    printf("\n\nSorted array: ");
    for (int i = 0; i < n; i++) printf("%d ", arr[i]);
    return 0;
}
Output:
Given array: 10 20 15 17 9 21
Sorted array: 9 10 15 17 20 21
Time Complexity
buildMaxHeap runs in O(n) time, and the n extraction steps in heapSort
take O(Logn) each, so O(nLogn) is the overall time complexity.
Conclusion
Are you wondering why you should bother to study advanced data
structures? I mean, what use do they really have in the real world? When
you go for a data science or data engineering job interview, why do the
interviewers insist on asking questions about them?
Many programmers, beginners and experienced ones alike, choose to avoid
learning about algorithms and data structures, partly because they are
quite complicated and partly because they don't see the need for them in
real life.
However, learning data structures and algorithms allows you to see real-
world problems and their solutions in terms of code. They teach you how to
take a problem and break it down into several smaller pieces. You think
about how each piece fits together to provide the final solution and can
easily see what you need to add along the way to make it more efficient.
This guide has taken you through some of the more advanced data
structures used in algorithms and given you some idea of how they work
and where they will work the best. It is an advanced guide, and you are not
expected to understand everything right away. However, you should use this
guide as your go-to when you want to double-check that a data structure
will fit your situation or when you want to see if there is a better way of
doing something.
Thank you for taking the time to read it. I hope you found it helpful and
now feel you can move on in your data journey.