
Facultatea de Cibernetică, Statistică și Informatică Economică

Parallel computing. Performance of sort algorithms in OpenMP

Georgiana NEAȚĂ, Ionuț NICOLAE, Horațiu ȚIBREA, Raul VIDIS, Georgiana UDREA

Bucharest University of Economic Studies

[email protected], [email protected], [email protected],
[email protected], [email protected]

In this paper we parallelize the Quicksort, Merge Sort and Bubble Sort algorithms using the OpenMP multithreading platform. Merge sort, one of the representative sorting algorithms, is widely used in database systems that require sorting, because it is stable. The proposed method is examined on two standard datasets with different numbers of threads. The elements of the input datasets are distributed into temporary sub-arrays depending on the number of characters in each word. The experimental results of this study reveal that the parallel Quicksort, Merge Sort and Bubble Sort algorithms improve on their sequential counterparts by delivering better Execution Time, Speedup and Efficiency. We ran the OpenMP implementation on an Intel Core i5-4210U, 1.7 GHz, with 8.00 GB RAM and 4 CPUs. Finally, we observed that the data structure affects the performance of the algorithm, which is why we chose the second approach.

Keywords: Bubble Sort, Merge Sort, OpenMP, sorting algorithms, parallel computing

1 Introduction

Sorting is one of the most common operations performed with a computer. Basically, it is a permutation function which operates on elements. In computer science, a sorting algorithm is an algorithm that arranges the elements of a list in a certain order. Sorting algorithms are taught in fields such as Computer Science and Mathematics. There are many sorting algorithms used in computer science, such as Bubble, Insertion, Selection and Quick sort. They differ in their functionality, performance, applications, and resource usage. In this paper we focus on the bubble sort, merge sort and quick sort algorithms.
Bubble sort is the oldest, the simplest and the slowest sorting algorithm in use, with a complexity of O(n²). Bubble sort works by comparing each item in the list with the item next to it and swapping them if required. The algorithm repeats this process until it makes a pass all the way through the list without swapping any items; such a pass means that all the items are in the correct order. In this way the larger values move towards the end of the list while the smaller values remain towards the beginning. The algorithm's name comes from a natural water phenomenon: the larger items sink to the bottom of the list whereas the smaller values "bubble" up to the top of the data set. Bubble sort is simple to program, but it is worse than selection sort on a jumbled array: it requires many more element exchanges and is only competitive on a nearly sorted array. More importantly, bubble sort is usually the easiest sorting algorithm to write correctly.
For a database system that processes large amounts of information, there are many search algorithms that are fast and accurate. What all these algorithms have in common is that the data must be assembled in sorted order to meet the search requirements: fast and accurate search presupposes fast and accurate sorting. When a huge amount of information is updated every day, the data must be re-sorted frequently, and the amount of data sorted per operation keeps growing. Therefore, in database systems, the time and effort spent on sorting keep increasing, and a method is needed to reduce the sorting time. A database system that processes huge amounts of information needs to become more effective, but since the information must be provided continuously, it requires a method that increases effectiveness without altering the system greatly. A regular comparison-based algorithm, however, cannot deliver such a change in effectiveness on its own. There are parallel methods that process data simultaneously in the hope of raising effectiveness, but those methods rely on load balancing or on faster inter-processor communication, which makes the construction of a parallel method difficult and changes the system greatly in the process. To address these weaknesses, this paper chose the merge sort algorithm, whose stability is the reason it is widely used in database systems, and used OpenMP as the parallelization method. OpenMP does not alter the system greatly, because it only inserts directives for parallel code into the existing code, and it is supported by most compilers, so it can be used on most systems. Therefore, parallel merge sort using OpenMP can solve the problems discussed earlier. Compared to other methods it can be implemented easily, so we can expect higher effectiveness without much effort.
Hoare's Quicksort algorithm is one of the most intensively studied problems in computer science. It utilizes the "divide and conquer" strategy, reducing the sorting problem into several easier sorting problems and solving each of them. Due to its good performance in practice, Quicksort is considered one of the most popular sorting algorithms. One value, normally called the pivot, is selected from the input data; this pivot is used to partition the input dataset into two subsets, one containing the input data smaller than the pivot value and the other containing the input data larger than the pivot value. In every step these divided datasets are sub-divided further by selecting a pivot from each set. The recursion stops when no further subdivision is possible.

Fig. 1. The mechanism of the Quick Sort, Merge Sort and Bubble Sort algorithms

OpenMP is a widely adopted shared memory parallel programming interface providing high-level programming constructs that enable the user to easily expose an application's task and loop level parallelism in an incremental fashion. The range of OpenMP applicability was significantly extended recently by the addition of explicit tasking features. The idea behind OpenMP is that the user specifies the parallelization strategy for a program at a high level by annotating the program code. OpenMP is an implementation of multithreading, a parallel execution scheme in which a master thread forks a specified number of slave threads and a task is divided between them. OpenMP is basically designed for shared memory multiprocessors, using the SPMD model (Single Program, Multiple Data). All the processors are able to directly access all the memory in the machine through a logical direct connection. Programs are executed on one or more processors that share some or all of the available memory. Using OpenMP, the programmer can write code that uses all cores of a multicore computer and runs faster as the number of cores increases. In this section we implement our sequential bubble sort as a parallel program using an OpenMP model in Visual C++. At an early stage of writing the program we must adapt our environment so that it understands OpenMP statements: it is not enough to write the #include <omp.h> header in the program; some configuration of the C++ environment is also required. After creating the project, we configure the environment to recognize OpenMP as follows: we open the project properties, choose the configuration properties for C/C++, and change the "OpenMP Support" field to Yes.

Fig. 2. Program flow in an OpenMP execution model

OpenMP is an application programming interface for shared memory parallel programming systems. It offers high-level programming constructs in the form of a set of directives written within the code, in one of the high-level programming languages such as C/C++ or FORTRAN. In the OpenMP model, execution usually begins as a single master thread. Whenever the code enters a parallel region, the master thread creates several slave threads that work together to carry out the calculations. Each slave thread shares information with the other threads while additionally owning private memory for auxiliary and temporary variables. Figure 2 shows the typical execution model of an OpenMP program, and Figure 3 shows a simple example.
#pragma omp parallel for shared(n, a, b, c) private(i)
for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];

Fig. 3. Example of the use of OpenMP

2 Methods

We sort the datasets using the bubble sort algorithm in three phases. In the first phase, we remove/ignore the special characters from the text file. In the second phase, we convert the text file into an array of lists (vectors of strings) based on word length, so that all shorter words come before longer words. In the third phase, we sort each vector of strings into alphabetic order using the bubble sort algorithm. Table 1 shows the time of the pre-processing phase.

Bubble Sort in sequential


The first approach, implemented on vectors of strings, proceeds as follows:
1. Load the data from the text file and store it in a vector.
2. Remove/ignore the special characters (e.g. ", . ? ! etc.) from the text; this is the pre-processing stage mentioned above.
3. Create an array of vectors sized according to the longest word in the text file.
4. Apply bubble sort to each vector; the comparisons are based on the ASCII code of each letter.

Function                                            Dataset 1 (190 KB)   Dataset 2 (1.38 MB)
Pre-processing to remove the special                0.265 s              1.154 s
characters from the text file
Pre-processing of the file by bubble sort           274.528 s            274.528 s
based on alphabetic order
Pre-processing of the file by bubble sort           230.215 s            230.215 s
based on the length of the word
Bubble sort on the array of vectors, each           42.589 s             1620.44 s
vector holding words of the same length

Table 1. Execution time of the pre-processing phase
Bubble Sort in parallel with OpenMP directives
One of the fundamental problems of computer science is ordering a list of items. There are many solutions for this problem, known as sorting algorithms. Some sorting algorithms are simple and intuitive, such as bubble sort, while others, like quick sort, are considerably more complicated but produce lightning-fast results. The sequential version of the bubble sort algorithm is considered the most inefficient sorting method in common usage. In this paper we want to show how the parallel bubble sort algorithm sorts the text file in parallel, and whether or not it performs better than the sequential sorting algorithm. Bubble sort performs n(n-1)/2 comparisons in the worst case, which is still O(n²). As usual in parallelism, we can decompose the data, the functions, or sometimes both; after studying bubble sort we concluded that it is better suited to data decomposition. We therefore suggest, in the OpenMP implementation, assigning each vector to an individual thread.
Returning to our sequential program, the first question is whether the program admits parallelism at all. After evaluating the program we identified that it can be run in parallel. The second question is which parts of it can run in parallel. This depends on understanding the problem: we found that the hot spots occur in the loop statements, because we treat our problem as a 3D matrix, so we assign each vector to be processed on one processor; as a result, the vectors of words of length 1, 2, 3, ... are sorted separately at the same time.

Quick Sort in sequential


The proposed Quick Sort algorithm starts with the pre-processing of the input dataset, removing the special characters from the input text file. The fundamental idea is then to create many additional temporary sub-arrays according to the number of characters in each word; the size of each of these sub-arrays equals the number of elements with exactly that number of characters in the input array. Finally, the elements of the input dataset are distributed into these temporary sub-arrays depending on the number of characters in each word. The datasets chosen are the same as the Bubble Sort data files (Set1.txt and Set2.txt); they differ in size and length. The size of the first dataset is 190 KB and the second is 1.38 MB.

Quick sort in parallel using OpenMP directives


Based on the problem presented in the previous section, the algorithm creates many additional temporary sub-arrays according to the number of characters in each word; based on word length, the input array is distributed into sub-arrays, and each one is sorted using the Quicksort algorithm. The Quicksort recursive function is the hotspot of the program; for each sub-array we assign a set portion of data to each thread. Since the Quicksort algorithm makes two recursive calls, one per partition of the dataset at each level, we assign each call, and thus each partition, to a thread so that the partitions are sorted independently, as shown in the figure below.

Fig. 1. Quicksort Algorithm using OpenMP for each array


Parallel sections: executing each section concurrently produces the same result as executing each section sequentially. In the proposed parallel recursive function, the algorithm calls itself twice, and the OpenMP sections directive is used to parallelize the recursion (see the figure below).

Fig. 2. Proposed Quick Sort recursive function


Merge Sort in sequential
Merge sort is one of the premier sorting algorithms: before sorting, the data is separated (divide), each part is sorted (conquer), and then all sorted parts are merged together step by step until the entire data is sorted; it is thus a "divide and conquer" method. During a recursive call it goes through three steps: divide, conquer, and merge. Divide is the step where the given data is split into many small pieces. Conquer is the step where each divided piece is sorted. The last step is merge, where the pieces are combined into completely sorted data. Depending on how many units the data is divided into as it goes through these three steps, the method is called k-way merge sort. As in Fig 1, if the data is divided into two units, it is called 2-way; as in Fig 2, if it is divided into four units, it is called 4-way.

Fig. 1. 2-way divided

Fig. 2. 4-way divided

Merge Sort in parallel


In this paper, the merge sort described above is parallelized using OpenMP. To find out how the number of ways affects the speed, the merge sort is implemented as 2-way, 4-way, and 8-way. Also, to distinguish the effect of the parallel region from the effect of the number of cores, the way, the method of dividing, and the number of cores were varied together.
In the case of Fig 3, the total data is divided into four parts (1). Each divided part goes through merge sort in (2). In (3) and (4), the parts go through merge sort again and are then merged into sorted data. When four cores are used, each part in (2) goes through merge sort on its own core, so (2) uses four cores, while (3) and (4) use two and one respectively. When two cores are used for parallel processing, in (2) the first two parts are sorted, then the remaining two, before moving on to (3); so (2) uses two cores, twice. (3) and (4) use two cores and one core respectively to reduce the time.

Fig. 3. 2-way merge sort divided by 4 data


In the case of Fig 4, the overall process is the same as in Fig 3, but the data is divided into 8 units. If four cores are used, in (2) the first four units go through merge sort, then the next four. In (3), the eight parts are merged simultaneously using four cores. Where two cores are used, in (2) two units go through merge sort four times; in (3) and (4), two cores are used twice and once respectively; and in the last step (5), only one core is used. This setup compares the number of divisions against the sorting speed when the total amount of data for merge sort is equal.

Fig. 4. 2-way merge sort divided by 8 data


4-way merge sort, unlike 2-way, which compares two parts at a time, can compare four parts at once when merging. Therefore, the data is divided into multiples of four (four and sixteen) so that four parts can be compared at once. Fig 5 shows the entire data divided into four: parts one through four go through merge sort on separate cores, followed by a single 4-way merge in the last step. If all four cores are used, the last step can be reached at once. If two cores are used, parallel processing runs twice in the second step, and one core is used for the last step. In the two-core case, the dark boxes in Fig 5 show the pairing: parts 1 and 2 are processed, then 3 and 4, and when 1 through 4 are done, the final 4-way merge is performed using only one core.

Fig 6 shows 4-way merge sort when the data is divided into sixteen. Merge sort proceeds in numerical order, and the dark boxes show the pairing when four cores are used. The four cores each process units 1~4, 5~8, 9~12, and 13~16, then process the intermediate results 17~20; after that, the final result is obtained in the last step using one core. If only two cores are used, the units are paired numerically in twos and processed accordingly.

Fig. 5. 4-way merge sort divided by 4 data


Fig. 6. 4-way merge sort divided by 16 data

3 Results

Bubble Sort in sequential


The performance of the sequential Bubble Sort code on the two datasets, each tested 5 times to obtain an average, is shown in Table 1 below.

Test / sec   Dataset 1 (190 KB)   Dataset 2 (1.38 MB)
T1           44.239               1694.51
T2           44.013               1697.98
T3           44.239               1702.21
T4           44.503               1696.88
T5           44.23                1698.3
Ave          44.245               1697.976

Table 1. Sequential Bubble Sort execution times, averaged over 5 runs (first approach)

Bubble Sort in parallel


The performance of the parallel Bubble Sort code on the two datasets, for different numbers of threads, is shown in Table 2.

Thread number   Dataset 1 (time / sec)   Dataset 2 (time / sec)
1               6.695                    188.18
2               5.103                    132.66
4               4.572                    84.271
6               3.751                    65.846
8               3.167                    51.046
10              3.826                    51.046
16              4.858                    52.991

Table 2. Bubble sort running in parallel using OpenMP

Fig. 3. Performance of OpenMP with Dataset 1



Fig. 4. Performance of OpenMP with Dataset 2

Quick Sort in sequential

Test / sec   File 1 (190 KB)   File 2 (1.38 MB)
T1           0.040012          2.401233
T2           0.041692          2.396601
T3           0.045342          2.400768
T4           0.045127          2.399871
T5           0.044149          2.411457
Average      0.0432644         2.401986

Table 3. Quick Sort - sequential time for different data sizes


Quick Sort in parallel

Thread number   Dataset 1 (time / sec)   Dataset 2 (time / sec)
1               0.041054                 2.401157
2               0.041054                 2.401157
4               0.015838                 1.40058

Table 4. Quick Sort - OpenMP parallel time for different data sizes and different numbers of threads

Fig. 5. Sequential and parallel execution time of the Quicksort algorithm

Merge Sort – Parallel and Sequential

Condition   Num of cores   Dataset 1   Dataset 2
2-way       1              0.3362      0.3376
            2              0.3405      0.2076
            4              0.3345      0.1749
            8              0.3396      0.1753
4-way       1              0.3857      0.3849
            4              0.3788      0.2042
            16             0.3614      0.1971
8-way       1              0.3936      0.3915
            8              0.3669      0.2035

Table 5. Execution time of merge sort (seconds)

4 Analysis

Bubble Sort
According to the previous results, OpenMP shows the best speedup when 8 threads are used, i.e. when the number of threads equals the actual number of hardware threads (see Table 2). In other words, increasing the number of threads beyond the actual core count brings no advantage; on the contrary, it hurts the speedup, because more threads mean more work for dividing the tasks, creating threads, destroying them, and so on. This conclusion is confirmed by the steady decline in efficiency, which results in large amounts of idle time.

Quick Sort
The results for Dataset 1 show higher speedup when running the parallel algorithm compared to the sequential algorithm, for different numbers of threads. The efficiency achieved with 4 threads on 4 cores in the parallel method is close to the optimal result for that number of cores. It can be seen that the efficiency ratio increases as the number of cores increases. Moreover, Dataset 2 also shows higher speedup when running the parallel program compared to the sequential program with different numbers of threads.

Merge Sort
The clearest result appears when the number of cores is increased so that more data is handled simultaneously. With two cores, the 2-way case with the data divided into four showed above a 1.9x improvement; with four cores it showed a 2.8x improvement. Using twice as many cores thus yielded only a 1.5x improvement over two cores, because the parallel region is limited and all four cores cannot participate in the processing at once. In the 8-way case, going from single to dual core showed a 1.8x improvement and single to quad a 2.9x improvement, similar to 2-way. In the 4-way case, with the data split into four and sixteen, single to quad showed 2.8x and 2.67x respectively. This is because, as the data per single core grows, some of it was sorted early by the operating system, so the area available to OpenMP was smaller. The specific numbers differ, but all cases show a similar amount of improvement in performance.

5 Conclusion

In this paper we implemented the bubble sort, quick sort and merge sort algorithms using multithreading (OpenMP). The proposed work was tested on two standard datasets (text files) of different sizes taken from http://www.booksshouldbefree.com/. We ran the OpenMP implementation on an Intel Core i5-4210U, 1.7 GHz, with 8.00 GB RAM and 4 CPUs.
Finally, we observed that the data structure affects the performance, as is clear in sequential codes 1 and 2. In OpenMP, increasing the number of threads beyond the actual core count only hurts the speedup. For future work we will implement bubble sort using the Message Passing Interface (MPI) and compare the results with the OpenMP approach.
The proposed parallelization of the quick sort algorithm was examined on two standard datasets with different numbers of threads. The experimental results of this study, explained carefully in the previous sections, reveal that the proposed parallel Quicksort algorithm improves on the sequential Quicksort algorithm by delivering better Execution Time, Speedup and Efficiency.
We found that both File1 and File2 show higher speedup when running the parallel algorithm compared to the sequential algorithm, for different numbers of threads. The efficiency achieved using 4 threads on 4 cores in the parallel method is close to the optimal result for that number of cores. This leads to the conclusion that when the number of cores increases, the efficiency ratio increases too. We plan in the future to implement the Quicksort algorithm using the Message Passing Interface (MPI) and compare its results with the OpenMP method.
To improve the performance of a database system, the performance of its sorting algorithms must increase. Among the techniques that improve performance is parallel processing. However, it is difficult to implement, unexpected errors can occur, and it can alter the system greatly. Therefore, being effective and easy to use, we parallelized the merge sort algorithm using OpenMP. Looking at the results for the same number of cores, there is no clear relationship between k-way and merge sort performance. Changing the k-way can reduce the number of merge steps, but due to the way computers work, a k-way merge cannot be handled all at once: internally a binary comparator is used, so the total number of comparisons is the same. The work should be distributed across the parallel merge sort algorithm using OpenMP, and the number of parallel regions should be at least the number of cores, so that the time during which a core sits unused is reduced. If the number of parallel regions is less than the number of cores, the performance is hardly improved, because the resources cannot be exploited; this can be seen by comparing a single core running k-way merge sort with a dual or quad core running 2-way merge sort. For greater effectiveness, the number of regions should be a multiple of the number of cores, so that as earlier steps finish, the number of cores left waiting is reduced, resulting in greater usage of all cores and better performance in parallel processing. Running the merge sort algorithm led us to the following conclusions: in the dual-core case it showed a 1.8x improvement, and in the quad-core case a 2.8x improvement. Finally, without changing the system greatly, the data should be divided so that the area where OpenMP is applied can be set and parallelized; in this way a great improvement was obtained.

