BDA05 DistributedComputing

This document discusses distributed computing and MapReduce, specifically: 1. The MapReduce model assigns data splits to mappers which apply map functions to produce intermediate key-value pairs, then reducers combine values by key and apply reduce functions to produce outputs. 2. MapReduce is better than Extract-Transform-Load (ETL) as it moves computation rather than data, improves throughput by distributing tasks across nodes, and maintains data locality. 3. MapReduce can scale to large clusters by partitioning data across machines and moving computation rather than transferring all data over networks.


DISTRIBUTED COMPUTING
Shankar Venkatagiri
MODEL

➤ Responsibility: the user writes a map & reduce function pair

and specifies to MapReduce the input/output locations
1. The MapReduce Master assigns data (“splits”) to mappers
2. Each map function takes an input pair and produces a
set of intermediate key-value pairs
3. The MapReduce Master groups intermediate values by key
and assigns them to reducers
4. Each reduce function takes a key + the values for that key.
Next, it acts on those values and produces zero or one output
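The four steps above can be sketched in miniature. This is an illustrative word-count job, not the actual MapReduce API: the Master's scheduling and the shuffle are collapsed into plain loops, and each "split" is simply a list of text lines.

```python
from collections import defaultdict

# Step 2: the user-supplied map function emits intermediate key-value pairs.
def map_fn(_key, line):
    for word in line.split():
        yield (word, 1)

# Step 4: the user-supplied reduce function collapses all values for a key.
def reduce_fn(key, values):
    return (key, sum(values))

def run_mapreduce(splits):
    intermediate = defaultdict(list)
    # Step 1: the Master assigns each split to a mapper (here, a loop).
    for split in splits:
        for line in split:
            # Step 3: group intermediate values by key (the "shuffle").
            for k, v in map_fn(None, line):
                intermediate[k].append(v)
    # Step 4: each key + its values goes to a reducer.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

counts = run_mapreduce([["the quick fox"], ["the lazy dog"]])
# counts → {"the": 2, "quick": 1, "fox": 1, "lazy": 1, "dog": 1}
```

In a real deployment the mappers and reducers run on different machines; the user still writes only the two functions.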

DIFFERENCE

➤ Contrast with the ETL model: Extract-Transform-Load

➤ Schema-on-write versus schema-on-read (e.g. tweets)
➤ The program must take care of parsing the data
➤ Moving Computation is Cheaper than Moving Data
➤ The Master orchestrates multiple map & reduce tasks to
chomp through data splits, which improves overall throughput
➤ Locality: the Master maintains all locational information.
It assigns splits to idle map tasks by proximity to the data
➤ Redundancy: should any nodes/tasks slow down, backup
tasks are launched
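Schema-on-read means the map function itself parses raw records at processing time, rather than a schema being enforced at load time. A minimal sketch, assuming tweet-like JSON records; the field names are illustrative, not the actual Twitter schema:

```python
import json

# Schema-on-read: raw records carry no enforced schema, so the map
# function parses each one and decides what to extract.
def map_fn(_key, raw_record):
    try:
        tweet = json.loads(raw_record)
    except json.JSONDecodeError:
        return  # malformed records are skipped, not rejected at load time
    if "lang" in tweet:
        yield (tweet["lang"], 1)  # e.g. count tweets per language

pairs = list(map_fn(None, '{"lang": "en", "text": "hello"}'))
# pairs → [("en", 1)]
```

Under schema-on-write (the ETL model), the malformed record would have been rejected or cleaned before it ever reached storage.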

SCALING UP

➤ Video: Inside a Google Data Centre (1:34 onwards)


➤ R and Python support analytical processing on a single machine
➤ Q: Why not extend this framework to a cluster?
➤ Data is partitioned across multiple machines in a network
➤ Network transfer times ≫ memory access times
➤ Probability of failure increases with size and expanse
➤ Ryza et al. “These facts require a programming paradigm that is
sensitive to the characteristics of the underlying system: one that
discourages poor choices and makes it easy to write code that will
execute in a highly parallel manner.”

SPOTLIGHT
➤ Sanjay Ghemawat, Google Fellow
➤ Ph.D. (MIT), B.S. (Cornell)
➤ Winner, ACM Prize in Computing
➤ Projects worked on
➤ MapReduce
➤ Google File System aka GFS
➤ BigTable
➤ TensorFlow
➤ …

MAGICIANS
➤ Dean & Ghemawat (2008): “Users specify the computation in
terms of a map and a reduce function, and the underlying runtime
system automatically parallelizes the computation across large-scale
clusters of machines, handles machine failures, and schedules inter-
machine communication to make efficient use of the network & disks.”
➤ Video: Google Round Table (up to 1:12, 11:10 - 13:35)
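The parallelization the quote describes can be imitated on one machine: a sketch using Python's multiprocessing pool to stand in for a cluster of mappers, with the word-count functions as illustrative stand-ins for user code (failure handling and network scheduling are omitted).

```python
from collections import defaultdict
from multiprocessing import Pool

def map_fn(line):
    # User-supplied map: emit (word, 1) for each word in the line.
    return [(w, 1) for w in line.split()]

def run_parallel(splits, workers=2):
    # The runtime farms map tasks out to worker processes in parallel...
    with Pool(workers) as pool:
        mapped = pool.map(map_fn, splits)
    # ...then shuffles intermediate pairs by key and reduces.
    grouped = defaultdict(list)
    for pairs in mapped:
        for k, v in pairs:
            grouped[k].append(v)
    return {k: sum(vs) for k, vs in grouped.items()}

if __name__ == "__main__":
    print(run_parallel(["to be or", "not to be"]))
```

The user writes only `map_fn` (and the reduction); the pool, grouping, and scheduling play the role of the "underlying runtime system".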
