MPI – Distributed Computing made easy
Last Updated: 19 Sep, 2023
The Underlying Problem
To make things easier, let’s directly jump to some statistics:
- Facebook, currently, has 1.5 billion active monthly users.
- Google performs at least 1 trillion searches per year.
- About 48 hours of video are uploaded to YouTube every minute.
With demand on this scale, a single machine cannot handle all the processing. Hence the need for distributed systems.
What is Distributed Computing?
A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system so that users perceive the system as a single, integrated computing facility.
Consider Google's web servers. From the user's perspective, a submitted search query is handled by a single system. Behind the scenes, however, Google runs a large number of servers, distributed both geographically and computationally, to return results within a few seconds.
Advantages of Distributed Computing?
- Highly efficient
- Scalability
- More tolerant of failures
- High Availability
Let us look at an example where we save computational time by using distributed computing.
For example, suppose we have an array a with n elements, such as a = [1, 2, 3, 4, 5, 6].
We want to sum all the elements of the array and output the result. Now, let us assume that there are 1020 elements in the array and that the time to compute the sum is x.
Suppose we now divide the array into 3 parts, a1, a2 and a3, where:
a1 = { elements of a where (element modulo 3) == 0 }
a2 = { elements of a where (element modulo 3) == 1 }
a3 = { elements of a where (element modulo 3) == 2 }
We send these 3 arrays to 3 different processes, each of which computes the sum of its own array. Let's assume that, on average, each array has n/3 elements, so the time taken by each process drops to x/3. Because the processes run in parallel, the three partial sums are computed simultaneously and each one is returned to the main process. Finally, we obtain the sum of a by adding up the partial sums of a1, a2, and a3.
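The partition-and-sum scheme described above can be sketched in plain Python. This is only an illustration: the helper names split_by_remainder and partial_sums are hypothetical, the modulus is assumed to be 3 (one remainder class per part), and the partial sums are computed one after another here, whereas in a distributed run each would be computed by a separate process.

```python
def split_by_remainder(values, parts=3):
    """Group values by their remainder modulo `parts` (assumed to be 3)."""
    groups = [[] for _ in range(parts)]
    for v in values:
        groups[v % parts].append(v)
    return groups


def partial_sums(groups):
    """Sum each group independently.

    In a distributed setting each of these sums would be computed
    by a different process, in parallel; here they run sequentially.
    """
    return [sum(g) for g in groups]


a = [1, 2, 3, 4, 5, 6]
a1, a2, a3 = split_by_remainder(a)   # a1 == [3, 6], a2 == [1, 4], a3 == [2, 5]
sums = partial_sums([a1, a2, a3])    # sums == [9, 5, 7]
total = sum(sums)                    # the main process combines the partial sums
```

The final total equals sum(a), so splitting the work does not change the result; only the time spent per process changes.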
Thus, by running the three processes simultaneously, we reduce the computation time from x to roughly x/3 (ignoring the cost of distributing the data and collecting the results).
What is MPI?
Message Passing Interface (MPI) is a standardized and portable message-passing system developed for distributed and parallel computing. MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard low-level routines to create higher-level routines for the distributed-memory communication environment supplied with their parallel machines.
MPI gives users the flexibility of calling a set of routines from C, C++, Fortran, C#, Java, or Python. The advantages of MPI over older message-passing libraries are portability (because MPI has been implemented for almost every distributed-memory architecture) and speed (because each implementation is, in principle, optimized for the hardware on which it runs).
Although bindings are available for several languages, Python is a popular choice because of its simplicity and the ease of writing code in it. We will now look at how to install MPI on Ubuntu 14.10.
Install MPI on Ubuntu
1) Run the following command in your terminal to install NumPy, a package for scientific computing in Python.
sudo apt-get install python-numpy
2) After the above step completes successfully, run the following commands to update the package lists and install pip.
sudo apt-get update
sudo apt-get -y install python-pip
3) Now install MPICH, an implementation of MPI, together with its documentation.
sudo apt-get install libcr-dev mpich mpich-doc
4) Install mpi4py, the MPI package for Python, using pip.
sudo pip install mpi4py
MPI is now installed. Sometimes a problem occurs while the packages are being set up after MPI has been installed, because the Python development tools are missing. You can install them with the following command:
sudo apt-get install python-dev
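Once the steps above have completed, a short script can help confirm that mpi4py works. The sketch below is an illustration and not part of the original instructions: the file name sum_check.py and the helper local_chunk are hypothetical, and the data is split round-robin by index, which is a simplification of the modulo-based split in the earlier array example. It uses the mpi4py calls MPI.COMM_WORLD, Get_rank, Get_size, and reduce.

```python
def local_chunk(data, rank, size):
    """Return the elements that process `rank` (out of `size`) should sum.

    Elements are handed out round-robin by index (a simplifying assumption).
    """
    return data[rank::size]


def main():
    # Imported here so that local_chunk can be used even without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD      # communicator containing all started processes
    rank = comm.Get_rank()     # id of this process: 0, 1, ..., size - 1
    size = comm.Get_size()     # total number of processes

    data = [1, 2, 3, 4, 5, 6]
    partial = sum(local_chunk(data, rank, size))

    # Combine the partial sums on process 0; other processes receive None.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("total = %d" % total)


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("mpi4py could not be imported; check the installation steps")
```

Assuming the script is saved as sum_check.py, it could be started on three processes with a command such as mpiexec -n 3 python sum_check.py, in which case process 0 should report the sum of the six elements.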
MPI on Windows/Mac
Windows and Mac users can visit the following link, download the .zip file, unzip it, and run it:
Tutorials
After installation, you can refer to the following documentation for using MPI with Python.
References
https://en.wikipedia.org/wiki/Message_Passing_Interface
About the Author: Anurag Mishra is currently a 3rd-year B.Tech student, an avid software follower, and a full stack web developer. His keen interest lies in web development, NLP, and networking.