Introduction to Data Structures Guide
Introduction
• A data structure usually refers to a way of organizing, managing, and storing data in main
memory that enables efficient access and modification.
• If data is arranged systematically then it gets the structure and becomes meaningful. This
meaningful and processed data is the information.
• The cost of a solution is the amount of resources that the solution needs.
• A data structure requires:
◦ Space for each data item it stores
◦ Time to perform each basic operation
◦ Programming effort
• How to select a data structure?
◦ Identify the problem
◦ Analyze the problem
◦ Quantify the resources
◦ Select the data structure
• Two important things about data types:
◦ A data type defines a certain domain of values
◦ A data type defines the operations allowed on those values
◦ Example: int
▪ Takes only integer values
▪ Operations: addition, subtraction, multiplication, division, bitwise operations.
• ADT describes a set of objects sharing the same properties and behaviors.
◦ The properties of an ADT are its data.
◦ The behaviors of an ADT are its operations or functions.
• ADT example: stack (can be implemented with array or linked list)
• Abstraction is the method of hiding unnecessary details and exposing only what the user
needs.
• Encapsulation is a method of bundling data into a single entity or unit together with the
means to protect it from outside access. Encapsulation can be implemented using access
modifiers, i.e. private, protected, and public.
What is a data structure
• A data structure is the organization of data in a way that lets it be used efficiently.
• It is used to implement an ADT.
• An ADT tells us what is to be done, and a data structure tells us how to do it.
• Types:
◦ linear (stack, array, linked list)
◦ non-linear (tree, graph)
◦ static (compile time memory allocation), array
▪ Advantage: fast access
▪ Disadvantage: slow insertion and deletion
◦ dynamic (run-time memory allocation), linked list
▪ Advantage: faster insertion and deletion
▪ Disadvantage: slow access
Asymptotic notations
• O (Big-O) notation (upper bound, commonly used for worst-case time): f(n) = O(g(n)) if
there are constants c > 0 and n0 such that 0 <= f(n) <= c*g(n) for all n >= n0
• Example: f(n) = 3n + 2, g(n) = n, so f(n) = O(g(n))
◦ 3n + 2 <= c*n
◦ 3n + 2 <= 4n (choose c = 4)
◦ n >= 2
◦ So c = 4 and n0 = 2
• More examples:
◦ n^3 = O(n^2) False
◦ n^2 = O(n^3) True
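• Ω (Omega) notation (lower bound, commonly used for best-case time): f(n) = Ω(g(n)) if
there are constants c > 0 and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0
• Example: f(n) = 3n + 2, g(n) = n, so f(n) = Ω(g(n))
◦ 3n + 2 >= c*n
◦ 3n + 2 >= n (choose c = 1)
◦ 2n >= -2
◦ n >= -1
◦ So c = 1 and, because n0 must be positive, n0 = 1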
• Θ (Big-theta) notation (tight bound, a lower and upper "sandwich"; often loosely
associated with the average case): 0 <= c1*g(n) <= f(n) <= c2*g(n) for all n >= n0
• Example: f(n) = 3n + 2, g(n) = n, so f(n) = Θ(g(n))
◦ c1*n <= 3n + 2 <= c2*n
◦ Upper bound: 3n + 2 <= c2*n; with c2 = 4, 3n + 2 <= 4n holds for n >= 2
◦ Lower bound: c1*n <= 3n + 2; with c1 = 1, n <= 3n + 2 holds for n >= -1, so for n >= 1
◦ We must take the greater threshold, which is true for both bounds: n0 = 2
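The constants chosen above can be spot-checked numerically. The sketch below is only a finite check over a range of n, not a proof; the helper names and the `limit` parameter are made up for illustration.

```python
def f(n):
    return 3 * n + 2


def g(n):
    return n


def upper_bound_holds(c, n0, limit=1000):
    """Check 0 <= f(n) <= c*g(n) for n0 <= n < limit (finite spot check)."""
    return all(0 <= f(n) <= c * g(n) for n in range(n0, limit))


def lower_bound_holds(c, n0, limit=1000):
    """Check 0 <= c*g(n) <= f(n) for n0 <= n < limit (finite spot check)."""
    return all(0 <= c * g(n) <= f(n) for n in range(n0, limit))


big_o_ok = upper_bound_holds(c=4, n0=2)       # Big-O witness: c = 4, n0 = 2
omega_ok = lower_bound_holds(c=1, n0=1)       # Omega witness: c = 1, n0 = 1
theta_ok = big_o_ok and lower_bound_holds(c=1, n0=2)  # both bounds from n0 = 2
```

Starting the upper-bound check at n = 1 fails (f(1) = 5 > 4), which is why n0 = 2 is needed.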
Searching Techniques
• Analysis:
◦ Best-case O(1)
◦ Average O(log n)
◦ Worst-case O(log n)
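The bounds listed above match binary search on a sorted array, so the sketch below assumes that is the technique being analysed. The function name and the return value of -1 for a missing item are illustrative choices.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid          # best case: found on the first probe, O(1)
        if items[mid] < target:
            lo = mid + 1        # discard the left half
        else:
            hi = mid - 1        # discard the right half
    return -1                   # the range halves each step, so O(log n) probes
```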
Sorting techniques
• Merge sort (divide and conquer): the problem is divided into two sub-problems. Each
sub-problem is solved individually. Finally, the sub-solutions are combined into the final
solution.
• Divide: we split A[p..r] into two arrays A[p..q] and A[q+1..r]
• Conquer: we sort both sub-arrays A[p..q] and A[q+1..r], so this part is recursive. We
use merge sort to sort both sub-arrays.
• Combine: we combine the results by creating a sorted array A[p..r] from two sorted
sub-arrays A[p..q] and A[q+1..r]
• How do we merge (combine)? We need two pointers i, j to track the current position
in each sub-array. At every step we place the smaller of the two current values into the
final array.
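The divide, conquer, and combine steps above can be sketched as follows. This version returns new lists instead of sorting A[p..r] in place, which is a simplification chosen for readability.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list using pointers i and j."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # take the smaller current value
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # one of these two slices is empty
    merged.extend(right[j:])
    return merged


def merge_sort(a):
    """Sort a list by splitting it in half, sorting each half, and merging."""
    if len(a) <= 1:
        return list(a)
    q = len(a) // 2                    # divide
    return merge(merge_sort(a[:q]), merge_sort(a[q:]))  # conquer and combine
```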
Quick sort
• Choose a pivot and rearrange the array: elements smaller than the pivot go to its left
side and larger elements go to its right side.
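A minimal quick sort sketch follows. The notes do not say how the pivot is chosen, so taking the last element as the pivot (the Lomuto partition scheme) is an assumption made here.

```python
def partition(a, lo, hi):
    """Place the pivot a[hi] in its final position and return that index."""
    pivot = a[hi]                       # assumption: last element is the pivot
    i = lo                              # next slot for a smaller element
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]           # smaller items left, larger items right
    return i


def quick_sort(a, lo=0, hi=None):
    """Sort the list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p - 1)
        quick_sort(a, p + 1, hi)
    return a
```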
Heap sort
• In the array representation (indexing starts from 0), the left child of element i is at
2i + 1 and the right child is at 2i + 2.
• The parent of element i can be found with floor((i-1) / 2)
• Heap data structure:
◦ It is a complete binary tree (levels are filled from left to right)
◦ Every node is greater than or equal to its children (max-heap)
• To create a Max-Heap from a complete binary tree, we must use a heapify function.
◦ n/2 - 1 (integer division) is the index of the last non-leaf node. Building the heap
starts there and works back to index 0.
◦ The heapify function brings the larger element to the top of one sub-tree, then repeats
recursively on the child it swapped with.
void heapify(int arr[], int n, int i) {
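The heapify body is cut off in these notes, so the following is a sketch in Python that keeps the same three parameters (array, heap size n, root index i). `build_max_heap` and `heap_sort` are illustrative helper names, not taken from the notes.

```python
def heapify(arr, n, i):
    """Sift arr[i] down so the sub-tree rooted at i is a max-heap of size n."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)        # continue in the affected sub-tree


def build_max_heap(arr):
    """Heapify every non-leaf node, from the last one back to the root."""
    n = len(arr)
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)


def heap_sort(arr):
    """Sort arr in place: move the max to the end, then shrink the heap."""
    build_max_heap(arr)
    for end in range(len(arr) - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        heapify(arr, end, 0)
    return arr
```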
Linked List
• Array limitations:
◦ Fixed-size
◦ Physically stored in consecutive memory locations
◦ To insert or delete items, may need to shift data
• Variations of linked list: linear linked list, circular linked list, doubly linked list
• head pointer "defines" the linked list (it is not a node)
• Operations on Stack:
◦ push(i) to insert the element i on the top of the stack.
◦ pop() to remove the top element of the stack and to return the removed
element as a function value.
◦ top() to return the top element of stack(s)
◦ empty() to check whether the stack is empty or not. It returns true if stack is
empty and returns false otherwise.
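The four operations above can be sketched with a Python list as the backing array. Raising `IndexError` on an empty stack is a choice made for this example; the notes do not specify the error behaviour.

```python
class Stack:
    """Array-backed stack; the end of the list is the top of the stack."""

    def __init__(self):
        self._items = []

    def push(self, i):
        self._items.append(i)

    def pop(self):
        if self.empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def top(self):
        if self.empty():
            raise IndexError("top of empty stack")
        return self._items[-1]

    def empty(self):
        return not self._items
```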
Array Representation of Stacks
• Evaluating a postfix expression: [AB*CD/+DE+-] ==> 2 3 * 2 4 / + 4 3 + -

  Char   Stack      Operation
  2      2
  3      2, 3
  *      6          2*3
  2      6, 2
  4      6, 2, 4
  /      6, 0       2/4 (integer division)
  +      6          6+0
  4      6, 4
  3      6, 4, 3
  +      6, 7       4+3
  -      -1         6-7
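The trace above can be reproduced with a short stack-based evaluator. It assumes space-separated tokens and integer division that truncates toward zero (so 2/4 = 0, as in the table); both are assumptions of this sketch.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: int(a / b),   # integer division, truncating toward zero
}


def evaluate_postfix(tokens):
    """Evaluate a list of postfix tokens and return the integer result."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()     # the top of the stack is the right operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()


result = evaluate_postfix("2 3 * 2 4 / + 4 3 + -".split())
```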
Infix to Prefix
First method
Second method
◦ Step 1: Reverse the infix string. Note that while reversing the string you must
interchange left and right parentheses. E.g. (3+2) becomes (2+3), not )2+3(
◦ Step 2: Obtain the postfix expression of the infix expression obtained in Step
1.
◦ Step 3: Reverse the postfix expression to get the prefix expression
• Example: 14 / 7 * 3 - 4 + 9 / 2
• Reversed: 2 / 9 + 4 - 3 * 7 / 14

  Char   Stack      Expression
         (                                      Push "(" at the beginning
  2      (          2
  /      (/         2
  9      (/         2 9
  +      (+         2 9 /
  4      (+         2 9 / 4
  -      (+-        2 9 / 4
  3      (+-        2 9 / 4 3
  *      (+-*       2 9 / 4 3
  7      (+-*       2 9 / 4 3 7
  /      (+-*/      2 9 / 4 3 7
  14     (+-*/      2 9 / 4 3 7 14
  )                 2 9 / 4 3 7 14 / * - +

• DON'T FORGET TO REVERSE: + - * / 14 7 3 4 / 9 2
• NOTE: In this method an operator with the same precedence as the incoming operator must
not be popped from the stack; only operators of strictly higher precedence are popped.
• Evaluating the prefix expression (scan from right to left):
14 / 7 * 3 - 4 + 9 / 2 ==> + - * / 14 7 3 4 / 9 2

  Char   Stack             Operation
  2      2
  9      2, 9
  /      4                 9/2 [but in postfix we did 2/9]
  4      4, 4
  3      4, 4, 3
  7      4, 4, 3, 7
  14     4, 4, 3, 7, 14
  /      4, 4, 3, 2        14/7
  *      4, 4, 6           2*3
  -      4, 2              6-4
  +      6                 2+4
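Both the three-step conversion (the "second method") and the right-to-left evaluation can be sketched in a few lines. The code assumes space-separated tokens, the four operators shown, and truncating integer division; the function names are illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: int(a / b),   # integer division, as in the trace
}


def infix_to_prefix(tokens):
    """Reverse, convert to postfix, reverse again."""
    swap = {"(": ")", ")": "("}
    reversed_tokens = [swap.get(t, t) for t in reversed(tokens)]  # step 1
    output, stack = [], []
    for tok in reversed_tokens:                                   # step 2
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        elif tok in PRECEDENCE:
            # Pop only strictly higher precedence; equal precedence stays.
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] > PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output[::-1]                                           # step 3


def evaluate_prefix(tokens):
    """Scan right to left; the first value popped is the left operand."""
    stack = []
    for tok in reversed(tokens):
        if tok in OPS:
            left = stack.pop()
            right = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()


prefix = infix_to_prefix("14 / 7 * 3 - 4 + 9 / 2".split())
value = evaluate_prefix(prefix)
```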
Queue
• First in, first out (FIFO)
• The queue has a front and a rear
◦ Items can be removed only at the front
◦ Items can be added only at the other end, the rear
• Types of queues:
◦ Linear queue
◦ Circular queue
◦ Double-ended queue (Deque)
◦ Priority queue
Linear Queue
• Enqueue (add an element to the back): when an item is inserted into the queue, it always
goes at the end (rear).
• Dequeue (remove an element from the front): when an item is taken from the queue, it
always comes from the front.
• Linked-list implementation: ENQUEUE is the same as adding a node at the end of the list.
◦ Step 1: Allocate memory for the new node and name it PTR
◦ Step 2: SET PTR -> DATA = VAL
◦ Step 3:
◦ IF FRONT = NULL
◦ SET FRONT = REAR = PTR
◦ SET FRONT -> NEXT = REAR -> NEXT = NULL
◦ ELSE
◦ SET REAR -> NEXT = PTR
◦ SET REAR = PTR
◦ SET REAR -> NEXT = NULL
◦ [END OF IF]
◦ Step 4: END
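The enqueue steps above translate almost line for line into code. The `dequeue` method is not spelled out in the notes and is added here only so the sketch can be exercised; the class names are illustrative.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None            # a new node always has NEXT = NULL


class LinkedQueue:
    def __init__(self):
        self.front = None
        self.rear = None

    def enqueue(self, val):
        ptr = Node(val)             # steps 1 and 2
        if self.front is None:      # step 3: empty queue
            self.front = self.rear = ptr
        else:
            self.rear.next = ptr
            self.rear = ptr

    def dequeue(self):
        """Assumed counterpart: remove and return the item at the front."""
        if self.front is None:
            raise IndexError("dequeue from empty queue")
        val = self.front.data
        self.front = self.front.next
        if self.front is None:      # the queue became empty
            self.rear = None
        return val
```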
• Drawback of the linear queue: once the queue is full, even if a few elements are deleted
from the front and some space is freed, it is not possible to add any new elements,
because the rear has already reached the queue's last position.
• In the circular queue, once the rear reaches the last index it wraps around to the first
index of the queue, if and only if the front has moved forward and freed that position.
Otherwise, it is a "Queue overflow" state.
• ENQUEUE algorithm:
Insert-Circular-Q(CQueue, Rear, Front, N, Item)
  1. If (Front = 0 and Rear = N-1) or Front = Rear + 1:
     then Print: "Circular Queue Overflow" and Return
  2. If Front = -1 and Rear = -1:
     then Set Front := 0 and Rear := 0 and go to step 5
  3. If Rear = N-1:
     then Set Rear := 0 and go to step 5
  4. Set Rear := Rear + 1
  5. Set CQueue[Rear] := Item
  6. Return
• DEQUEUE algorithm:
  1. If Front = -1:
     then Print: "Circular Queue Underflow" and Return
  2. Set Item := CQueue[Front]
  3. If Front = Rear:
     then Set Front := Rear := -1 and Return
  4. If Front = N-1:
     then Set Front := 0 and Return
  5. Set Front := Front + 1
  6. Return
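A minimal array-based circular queue sketch follows. It checks for overflow first, advances Rear before storing the item, and on dequeue tests Front = Rear before the wrap-around; the class name and the exception types are illustrative choices.

```python
class CircularQueue:
    def __init__(self, n):
        self.items = [None] * n
        self.n = n
        self.front = -1
        self.rear = -1

    def enqueue(self, item):
        if (self.front == 0 and self.rear == self.n - 1) \
                or self.front == self.rear + 1:
            raise OverflowError("Circular Queue Overflow")
        if self.front == -1:            # first element
            self.front = self.rear = 0
        elif self.rear == self.n - 1:   # wrap around to index 0
            self.rear = 0
        else:
            self.rear += 1
        self.items[self.rear] = item

    def dequeue(self):
        if self.front == -1:
            raise IndexError("Circular Queue Underflow")
        item = self.items[self.front]
        if self.front == self.rear:     # removed the last remaining element
            self.front = self.rear = -1
        elif self.front == self.n - 1:  # wrap around to index 0
            self.front = 0
        else:
            self.front += 1
        return item
```

With N = 3, enqueueing 1, 2, 3 fills the queue; after one dequeue the next enqueue reuses index 0.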
• A deque (double-ended queue) is exactly like a queue except that elements can be added
to or removed from the head or the tail.
• No element can be added or deleted in the middle.
• It is implemented using either a circular array or a circular doubly linked list.
• In a deque, two pointers are maintained, LEFT and RIGHT, which point to either end
of the deque.
• The elements in a deque extend from the LEFT end to the RIGHT end and, since it is
circular, Deque[N–1] is followed by Deque[0].
• Two types:
◦ Input restricted deque: insertions can be done only at one of the ends,
while deletions can be done from both ends.
◦ Output restricted deque: deletions can be done only at one of the ends,
while insertions can be done on both ends.
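For experimenting with deque behaviour, Python's standard library already provides `collections.deque`. The short example below uses it in place of a hand-written circular array, which is a substitution made for brevity.

```python
from collections import deque

d = deque()
d.append(2)               # insert at the RIGHT end
d.appendleft(1)           # insert at the LEFT end
d.append(3)               # deque is now 1, 2, 3

left_item = d.popleft()   # delete from the LEFT end
right_item = d.pop()      # delete from the RIGHT end
remaining = list(d)
```

An input restricted deque would simply never call one of the two insert methods; an output restricted deque would never call one of the two delete methods.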
Priority Queue
•
Binary Tree
• In a binary tree,
◦ Every node will have three parts: the data element, a pointer to the left
node, and a pointer to the right node.
class Node {
public:
    Node *left;
    int data;
    Node *right;
};
◦ Every binary tree has a pointer ROOT, which points to the root element
(topmost element) of the tree. If ROOT = NULL, then the tree is empty.
Traversing a Binary Tree
• PREORDER (NLR), POSTORDER (LRN) & INORDER TRAVERSAL (LNR)
• Preorder traversal can be used to extract a prefix notation
•
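• PREORDER TRAVERSAL (NLR): visit the root node, then traverse the left sub-tree, then
the right sub-tree.
◦ (a) A, B, D, G, H, L, E, C, F, I, J, K
◦ (b) A, B, D, C, E, F, G, H, I
• POSTORDER TRAVERSAL (LRN): traverse the left sub-tree, then the right sub-tree, then
visit the root node.
◦ (a) G, L, H, D, E, B, I, K, J, F, C, A
◦ (b) D, B, H, I, G, F, E, C, A
• INORDER TRAVERSAL (LNR): traverse the left sub-tree, then visit the root node, then
traverse the right sub-tree.
◦ (a) G, D, H, L, B, E, A, C, I, F, K, J
◦ (b) B, D, A, E, H, G, I, F, C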
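The three traversals differ only in where the root is visited. In the sketch below, the shape of tree (a) is reconstructed from the traversal sequences listed above (the figure itself is not available), so treat `tree_a` as an inferred example.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def preorder(node):                 # N, L, R
    if node is None:
        return []
    return [node.data] + preorder(node.left) + preorder(node.right)


def inorder(node):                  # L, N, R
    if node is None:
        return []
    return inorder(node.left) + [node.data] + inorder(node.right)


def postorder(node):                # L, R, N
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.data]


# Tree (a), inferred from its preorder and inorder sequences.
tree_a = Node(
    "A",
    Node("B", Node("D", Node("G"), Node("H", None, Node("L"))), Node("E")),
    Node("C", None, Node("F", Node("I"), Node("J", Node("K")))),
)
```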
• A binary search tree (BST), also known as an ordered binary tree, is a variant of the
binary tree in which the nodes are arranged in order.
• All nodes in the left sub-tree must have a value less than that of the root node.
• All nodes in the right sub-tree must have a value equal to or greater than that of the
root node.
• Searching in a BST is O(n) in the worst case, which happens when the tree degenerates
into a chain (for example, when keys are inserted in sorted order).
Search & Insert Operation in Binary Search Tree
• Insert 39,27,45,18,29,40,9,21,10,19,54,59,65,60 in binary search tree
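The insert exercise above can be tried out with the sketch below. It follows the rule stated earlier that smaller keys go left and equal or greater keys go right; the class and function names are illustrative.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of the sub-tree."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:                               # equal or greater keys go right
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return the node holding key, or None if it is not in the tree."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


keys = [39, 27, 45, 18, 29, 40, 9, 21, 10, 19, 54, 59, 65, 60]
root = None
for k in keys:
    root = insert(root, k)
```

An inorder traversal of the result yields the keys in ascending order, which is a quick way to check the tree.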
Graphs
• Vertices (nodes), edges (lines between vertices), undirected graph, directed graph
• Degree of a node - Total number of edges containing the node. If deg(u)=0 then
isolated node.
• Size of a graph - The size of a graph is the total number of edges in it.
• Regular graph - It is a graph where each vertex has the same number of neighbors.
That is, every node has the same degree.
• Connected graph - A graph is said to be connected if for any two vertices (u, v) in V
there is a path from u to v. In particular, a connected graph with more than one vertex
has no isolated nodes.
• Complete graph - Fully connected. That is, there is an edge between every pair of
nodes in the graph. A complete graph has n(n–1)/2 edges, where n is the number
of nodes in G.
• Weighted graph - In a weighted graph, the edges of the graph are assigned some
weight or length.
• Directed Graphs - digraph, a graph in which every edge has a direction assigned to
it.
Breadth First Search Traversal
•
Depth First Search
• Complexity = O(vertices + edges)
• Make sure you don't re-visit visited nodes. When a node has no unvisited neighbors,
continue from the previous node.
• Backtrack when a dead end is reached, that is, when the current node has no neighbors
left to visit.
• Choose any arbitrary node and PUSH it onto the stack (STATUS 2). Only then start to
POP: each time you POP a node (STATUS 3), PUSH its neighbors that have not been pushed
yet.
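The push and pop procedure above can be sketched with an explicit stack. The graph is assumed to be an adjacency-list dictionary, and the example graph below is hypothetical, made up only to show the visiting order.

```python
def dfs(graph, start):
    """Return the nodes reachable from start in the order they are popped."""
    visited = []            # nodes that have been popped and processed
    seen = {start}          # nodes that have been pushed at some point
    stack = [start]
    while stack:
        node = stack.pop()
        visited.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:    # never push a node twice
                seen.add(neighbor)
                stack.append(neighbor)
    return visited


# Hypothetical undirected graph: a square A-B-D-C-A.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
order = dfs(graph, "A")
```

Each vertex is pushed once and each adjacency list is scanned once, which is where O(vertices + edges) comes from.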
Threaded Binary Tree
• According to this idea we are going to replace all the null pointers by the appropriate
pointer values called threads.
• The maximum number of nodes in a binary tree of height h is 2^(h+1) - 1
• If n0 is the number of leaf nodes and n2 the number of nodes of degree 2, then
n0 = n2 + 1
Inorder Traversal in TBT
• A/B*C*D+E
• n: number of nodes
• number of non-null links: n-1
• total links: 2n
• null links: 2n-(n-1)=n+1
• Replace these null pointers with some useful “threads”.
• A one-way threading and a two-way threading exist.
Threaded Binary Tree One-Way
• In the one-way threading of T, a thread will appear in the right field of a node and
will point to the next node in the in-order traversal of T.
•
Threaded Binary Tree Two-Way
• If ptr->left_child is null, replace it with a pointer to the node that would be visited
before ptr in an inorder traversal (inorder predecessor)
• If ptr->right_child is null, replace it with a pointer to the node that would be visited
after ptr in an inorder traversal (inorder successor)
•
class Node {
    int data;
    Node *left_child, *right_child;
    bool leftThread, rightThread;  // true when the pointer is a thread, not a child
};
•
Inserting Node in TBT
AVL Trees
• Adelson-Velsky-Landis (AVL) tree - one of many types of balanced binary search tree.
Search, insertion, and deletion take O(log(n)).
• Balance Factor (BF): BF(node) = HEIGHT([Link]) - HEIGHT([Link]), the difference
between the heights of the node's two sub-trees.
• HEIGHT(x) is the height of node x, which is the number of edges between x
and the furthest leaf.
• The only allowed balance factor values are -1, 0, and +1.
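A small sketch of computing heights and balance factors follows. The order of subtraction is not recoverable from the notes, so this code assumes left height minus right height; with the opposite convention every sign flips, but the -1, 0, +1 rule is unchanged.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Number of edges from node to its furthest leaf; -1 for an empty tree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    # Assumption: left height minus right height.
    return height(node.left) - height(node.right)


def is_avl_balanced(node):
    """True if every node has a balance factor of -1, 0, or +1."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and is_avl_balanced(node.left)
            and is_avl_balanced(node.right))


balanced = Node(2, Node(1), Node(3))
right_chain = Node(1, None, Node(2, None, Node(3)))
```

Recomputing heights recursively is fine for illustration; a practical AVL tree would store the height in each node.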
Insertion in AVL Tree
Deletion in AVL Tree
• Example R1:
• Example R-1:
Huffman Encoding
• Fixed-Length encoding
• Variable-Length encoding
• Prefix rule - used to prevent ambiguities during decoding; it states that no binary
code should be a prefix of another code.

  Symbol   Bad    Good
  a        0      0
  b        011    11
  c        111    101
  d        11     100

◦ The "Bad" codes break the rule: 0 (a) is a prefix of 011 (b), and 11 (d) is a prefix
of 111 (c).
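The prefix rule is easy to check mechanically. The sketch below compares every pair of code words from the table above; the function name is illustrative.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True


bad = {"a": "0", "b": "011", "c": "111", "d": "11"}
good = {"a": "0", "b": "11", "c": "101", "d": "100"}
bad_ok = is_prefix_free(bad)
good_ok = is_prefix_free(good)
```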
M-way trees
• A binary search tree is a binary tree: each node holds one key and has at most two
children.
• In an M-way search tree, each node has at most m children and m-1 key fields. The keys
in each node are in ascending order.
• A binary search tree has one value in each node and two subtrees. This notion easily
generalizes to an M-way search tree, which has (M-1) values per node and M
subtrees.
• M is called the degree of the tree. A binary search tree, therefore, has degree 2.
• M is thus a fixed upper limit on how much data can be stored in a node.
B-Trees
• Every node in a B-Tree contains at most m children. (Nodes other than the root and the
leaves must have at least ceil(m/2) children.)
• All leaf nodes must be at the same level.
• Inserting
◦ Find the appropriate leaf node
◦ If the leaf node contains fewer than m-1 keys, then insert the element in
increasing order.
◦ Else if the leaf contains m-1:
▪ Insert the new element in the increasing order of elements.
▪ Split the node into the two nodes at the median.
▪ Push the median element up to its parent node.
▪ If the parent node also contains an m-1 number of keys, then split it too
by following the same steps.
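The leaf-level part of the insertion steps can be sketched on a plain sorted list of keys. This is not a full B-Tree: it only shows the "insert, then split at the median" step for one node. The function name, the return shape, and the choice of the upper median when the key count is even are assumptions of this sketch.

```python
import bisect


def insert_into_leaf(keys, key, m):
    """Insert key into a sorted leaf of a B-Tree of order m.

    Returns (keys, None, None) if the leaf still fits, otherwise
    (left_keys, median, right_keys), where median moves up to the parent.
    """
    new_keys = list(keys)
    bisect.insort(new_keys, key)        # keep the keys in increasing order
    if len(new_keys) <= m - 1:          # a node holds at most m-1 keys
        return new_keys, None, None
    mid = len(new_keys) // 2            # assumption: upper median if even
    return new_keys[:mid], new_keys[mid], new_keys[mid + 1:]


no_split = insert_into_leaf([10, 30], 20, m=4)
split = insert_into_leaf([10, 20], 30, m=3)
```

The caller would then insert the returned median into the parent node with the same routine, splitting again if the parent overflows.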