MPI – Distributed Computing made easy
Last Updated: 19 Sep, 2023
The Underlying Problem
To make things easier, let’s directly jump to some statistics:
- Facebook currently has 1.5 billion monthly active users.
- Google performs at least 1 trillion searches per year.
- About 48 hours of video are uploaded to YouTube every minute.
With demand on this scale, a single machine cannot handle the processing. This is where distributed systems come in.
What is Distributed Computing?
A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system so that users perceive the system as a single, integrated computing facility.
Take Google's web servers as an example. From the user's perspective, a search query is submitted to what looks like a single system. Behind the scenes, however, Google runs a large number of servers, distributed both geographically and computationally, that together return the result within a few seconds.
Advantages of Distributed Computing
- Highly efficient
- Scalability
- More tolerant of failures
- High Availability
Let us look at an example where we save computational time by using distributed computing.
For example, suppose we have an array a with n elements, such as a = [1, 2, 3, 4, 5, 6].
We want to sum all the elements of the array and output the result. Now, let us assume that there are 1020 elements in the array and that computing the sum in a single process takes time x.
We now divide the array into 3 parts, a1, a2 and a3, according to the remainder of each element when divided by 3:
- a1 = { elements of a where element mod 3 == 0 }
- a2 = { elements of a where element mod 3 == 1 }
- a3 = { elements of a where element mod 3 == 2 }
We send these 3 arrays to 3 different processes, each of which computes the sum of its own array. On average, let's assume that each array has n/3 elements, so the time taken by each process drops to about x/3. Since the processes run in parallel, the three "x/3" computations happen simultaneously, and the sum of each array is returned to the main process. In the end, we obtain the sum of a by adding up the individual sums of a1, a2, and a3.
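The decomposition described above can be sketched in plain Python. This is only an illustration of the splitting and combining steps: the function names `split_by_remainder` and `sum_in_parts` are hypothetical, and the partial sums are computed one after another here, whereas in a real distributed run each would be computed by a separate process at the same time.

```python
def split_by_remainder(values, parts=3):
    """Group values by (value mod parts); bucket r holds the values with remainder r."""
    buckets = [[] for _ in range(parts)]
    for v in values:
        buckets[v % parts].append(v)
    return buckets


def sum_in_parts(values, parts=3):
    """Sum each bucket separately, then combine the partial sums."""
    buckets = split_by_remainder(values, parts)
    # Each of these partial sums is independent of the others, which is
    # what makes it possible to hand them to different processes.
    partial_sums = [sum(bucket) for bucket in buckets]
    return sum(partial_sums)


a = [1, 2, 3, 4, 5, 6]
a1, a2, a3 = split_by_remainder(a)
print("a1=%s a2=%s a3=%s" % (a1, a2, a3))  # a1=[3, 6] a2=[1, 4] a3=[2, 5]
print("total=%d" % sum_in_parts(a))        # total=21
```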
Thus, if the three processes really do run simultaneously, we reduce the time from x to roughly x/3, ignoring the cost of sending the arrays out and collecting the partial sums.
What is MPI?
Message Passing Interface (MPI) is a standardized and portable message-passing system developed for distributed and parallel computing. MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard low-level routines to create higher-level routines for the distributed-memory communication environment supplied with their parallel machines.
MPI routines can be called from C, C++, and Fortran, and, through language bindings, from C#, Java, or Python. The advantages of MPI over older message-passing libraries are portability (because MPI has been implemented for almost every distributed-memory architecture) and speed (because each implementation is, in principle, optimized for the hardware on which it runs).
Even though bindings are available for several languages, Python is a popular choice because of its simplicity and the ease of writing code in it. We will now look at how to install MPI on Ubuntu 14.10.
Install MPI on Ubuntu
1) Run the following command in your terminal to install NumPy, a package for scientific computing in Python.
sudo apt-get install python-numpy
2) After the above step completes successfully, run the following commands to update the package lists and install pip.
sudo apt-get update
sudo apt-get -y install python-pip
3) Now install MPICH, an implementation of MPI, together with its documentation.
sudo apt-get install libcr-dev mpich mpich-doc
4) Finally, use pip to install mpi4py, the Python bindings for MPI.
sudo pip install mpi4py
MPI is now installed. Sometimes a problem appears at the end of the installation because the Python development tools are missing. You can install them using the following command:
sudo apt-get install python-dev
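Once the steps above have finished, a quick way to check the installation is a short script in which every process reports its rank. The sketch below uses `MPI.COMM_WORLD`, `Get_rank()` and `Get_size()` from mpi4py. The file name `hello_mpi.py` and the helper `describe_process` are hypothetical names chosen for this illustration.

```python
# hello_mpi.py (hypothetical file name)

def describe_process():
    """Return (rank, size) for this process, or None if mpi4py is missing."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD  # communicator containing every started process
    return comm.Get_rank(), comm.Get_size()


info = describe_process()
if info is None:
    print("mpi4py could not be imported; the installation looks incomplete")
else:
    rank, size = info
    print("Hello from process %d of %d" % (rank, size))
```

Launching it with something like `mpiexec -n 4 python hello_mpi.py` should print one line per process. Running it with plain `python hello_mpi.py` typically reports a single process.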
MPI on Windows/macOS
Windows and macOS users can visit the following link, download the .zip file, unzip it, and run it:
Tutorials
After installation, you can refer to the following documentation on using MPI with Python.
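As a starting point, the array-sum example from earlier in this article could be written with mpi4py roughly as follows. This is a minimal sketch under some simplifying assumptions: every process builds the same small array itself instead of receiving its part over the network, the helper names `chunk_for` and `parallel_sum` are hypothetical, and the script falls back to a single process if mpi4py is not available. The only MPI call used to combine the results is `comm.reduce` with `MPI.SUM`.

```python
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:
    # Assumption for this sketch: without mpi4py, behave like one process.
    comm, rank, size = None, 0, 1


def chunk_for(values, rank, size):
    """Elements this rank is responsible for: those with value mod size == rank."""
    return [v for v in values if v % size == rank]


def parallel_sum(values):
    """Sum the local chunk, then combine the partial sums on process 0."""
    local_sum = sum(chunk_for(values, rank, size))
    if comm is None:
        return local_sum
    # reduce() delivers the combined value to the root process only;
    # every other process receives None.
    return comm.reduce(local_sum, op=MPI.SUM, root=0)


a = [1, 2, 3, 4, 5, 6]
total = parallel_sum(a)
if rank == 0:
    print("sum = %d" % total)
```

Started with three processes (for example `mpiexec -n 3 python` followed by the script name), each process sums one of a1, a2 and a3, and process 0 prints the combined result.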
References
https://en.wikipedia.org/wiki/Message_Passing_Interface
About the Author: Anurag Mishra is currently a 3rd-year B.Tech student, an avid software follower, and a full-stack web developer. His keen interest lies in web development, NLP, and networking.