Practical Apache Spark in GraphX

In the previous post, we explained the basics of streaming
with Spark. Now we want to talk about graphs and explore
GraphX, the Apache Spark library for graph computation and
analysis. Note that GraphX is used from Scala; it does not
provide a Python API.

A graph is a structure that consists of vertices and the
edges between them. Graph theory finds applications in
various fields such as computer science, linguistics, physics,
chemistry, social sciences, biology, and mathematics.
Problems in graph analysis can be rather complicated, but
there are many convenient modern tools and libraries for
solving them.

In this post, we will consider the following example graph:
the cities are the vertices, and the connections between
them are the edges, with the distance as the edge property.
You can see the Google Maps illustration of this structure in
the figure below.

To start working with this graph, let’s launch the Spark
shell. To do this, go to the Spark home directory and type in
the console:

Bash:
./bin/spark-shell

Now, we have to make some imports:

import org.apache.spark.graphx.Edge
import org.apache.spark.graphx.Graph
import org.apache.spark.graphx.lib._

Creating the property graph

To create a property graph, we first create an array of
vertices and an array of edges. For the vertices array, type
in your Spark shell:

val verArray = Array(
  (1L, ("Philadelphia", 1580863)),
  (2L, ("Baltimore", 620961)),
  (3L, ("Harrisburg", 49528)),
  (4L, ("Wilmington", 70851)),
  (5L, ("New York", 8175133)),
  (6L, ("Scranton", 76089)))

Each vertex attribute is a pair holding the city name and its
population, respectively.

As the output you will see the following:

verArray: Array[(Long, (String, Int))] = Array((1,
(Philadelphia,1580863)), (2,(Baltimore,620961)), (3,
(Harrisburg,49528)), (4,(Wilmington,70851)), (5,(New
York,8175133)), (6,(Scranton,76089)))

To create the edges array, type in the Spark shell:

val edgeArray = Array(
  Edge(2L, 3L, 113), Edge(2L, 4L, 106), Edge(3L, 4L, 128),
  Edge(3L, 5L, 248), Edge(3L, 6L, 162), Edge(4L, 1L, 39),
  Edge(1L, 6L, 168), Edge(1L, 5L, 130), Edge(5L, 6L, 159))

The first and the second arguments are the identifiers of the
source and destination vertices, and the third argument is
the edge property, which in our case is the distance between
the corresponding cities in kilometers.

This input gives the following output:

edgeArray: Array[org.apache.spark.graphx.Edge[Int]] =
Array(Edge(2,3,113), Edge(2,4,106), Edge(3,4,128), Edge(3,5,248),
Edge(3,6,162), Edge(4,1,39), Edge(1,6,168), Edge(1,5,130),
Edge(5,6,159))

Next, we will create RDDs from the vertices and edges arrays
by using the sc.parallelize() method:

val verRDD = sc.parallelize(verArray)
val edgeRDD = sc.parallelize(edgeArray)

We are ready to build a property graph. The basic property
graph constructor takes an RDD of vertices and an RDD of
edges and builds a graph.

val graph = Graph(verRDD, edgeRDD)
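
Before moving on, it can help to confirm that a graph was built as expected. The following is only a minimal sketch and not part of the original walkthrough: it rebuilds a three-city subset of the data under hypothetical names (localSc, cities, roads, small) so that it does not clash with the values defined above, and it uses SparkContext.getOrCreate, which should reuse the shell's context when one is already running.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the running context (e.g. the shell's sc) or start a local one.
val localSc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))

// Hypothetical three-city subset of the data used in this post.
val cities = localSc.parallelize(Seq(
  (1L, ("Philadelphia", 1580863)),
  (4L, ("Wilmington", 70851)),
  (5L, ("New York", 8175133))))
val roads = localSc.parallelize(Seq(Edge(4L, 1L, 39), Edge(1L, 5L, 130)))
val small = Graph(cities, roads)

val vertexCount = small.numVertices
val edgeCount = small.numEdges
// outDegrees lists only vertices that have at least one outgoing edge.
val outDeg = small.outDegrees.collect.toMap

println(s"$vertexCount vertices, $edgeCount edges, out-degrees: $outDeg")
```

New York has no outgoing edge in this subset, so it does not appear in the out-degree map at all.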

Now that we have our property graph, it is time to consider
the basic operations that can be performed on graphs, such
as filtration by vertices, filtration by edges, operations with
triplets, and aggregation.

Filtration by vertices
To illustrate filtration by vertices, let’s find the cities with
a population of more than 50,000. To do this, we will use
the filter operator:

graph.vertices.filter { case (id, (city, population)) =>
  population > 50000
}.collect.foreach { case (id, (city, population)) =>
  println(s"The population of $city is $population")
}

And this is the result we get:

The population of Scranton is 76089
The population of Wilmington is 70851
The population of Philadelphia is 1580863
The population of New York is 8175133
The population of Baltimore is 620961
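
Filtering graph.vertices returns only a list of vertices. If the goal is a smaller graph rather than a list, GraphX also offers the subgraph operator. The sketch below is an illustration under assumptions rather than part of the original example: it uses a three-city subset and hypothetical names (localSc, cities, roads, small, large).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the running context (e.g. the shell's sc) or start a local one.
val localSc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))

// Hypothetical three-city subset of the data used in this post.
val cities = localSc.parallelize(Seq(
  (2L, ("Baltimore", 620961)),
  (3L, ("Harrisburg", 49528)),
  (4L, ("Wilmington", 70851))))
val roads = localSc.parallelize(Seq(
  Edge(2L, 3L, 113), Edge(2L, 4L, 106), Edge(3L, 4L, 128)))
val small = Graph(cities, roads)

// Keep only vertices whose population exceeds 50000; edges that touch a
// removed vertex are dropped as well.
val large = small.subgraph(vpred = (id, attr) => attr._2 > 50000)

val keptCities = large.vertices.map { case (_, (city, _)) => city }.collect.toSet
val keptEdges = large.edges.map(e => (e.srcId, e.dstId, e.attr)).collect.toSet
println(s"cities: $keptCities, edges: $keptEdges")
```

Harrisburg falls below the threshold, so both edges that touch it disappear and only the Baltimore to Wilmington edge remains.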

Triplets
One of the core features of GraphX is exposed through the
triplets RDD. There is one triplet for each edge, and it
contains the attributes of both vertices together with the
edge attribute. You can take a look at the triplets with
graph.triplets.collect.

As an example of working with triplets, we will print the
distances between the connected cities:

for (triplet <- graph.triplets.collect) {
  println(s"The distance between ${triplet.srcAttr._1} and " +
    s"${triplet.dstAttr._1} is ${triplet.attr} kilometers")
}

As a result, you should see:

The distance between Baltimore and Harrisburg is 113 kilometers
The distance between Baltimore and Wilmington is 106 kilometers
The distance between Harrisburg and Wilmington is 128 kilometers
The distance between Harrisburg and New York is 248 kilometers
The distance between Harrisburg and Scranton is 162 kilometers
The distance between Wilmington and Philadelphia is 39 kilometers
The distance between Philadelphia and New York is 130 kilometers
The distance between Philadelphia and Scranton is 168 kilometers
The distance between New York and Scranton is 159 kilometers
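
Because a triplet carries both vertex attributes and the edge attribute, it can also feed an ordinary RDD computation. As a small sketch (again on a hypothetical three-city subset, with the illustrative names localSc, cities, roads, small, named and longest), the longest connection could be found like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the running context (e.g. the shell's sc) or start a local one.
val localSc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))

// Hypothetical three-city subset of the data used in this post.
val cities = localSc.parallelize(Seq(
  (3L, ("Harrisburg", 49528)),
  (5L, ("New York", 8175133)),
  (6L, ("Scranton", 76089))))
val roads = localSc.parallelize(Seq(
  Edge(3L, 5L, 248), Edge(3L, 6L, 162), Edge(5L, 6L, 159)))
val small = Graph(cities, roads)

// Copy the fields out of each triplet into a plain tuple, then keep the
// tuple with the larger distance at each reduction step.
val named = small.triplets.map(t => (t.srcAttr._1, t.dstAttr._1, t.attr))
val longest = named.reduce((a, b) => if (a._3 >= b._3) a else b)

println(s"Longest connection: ${longest._1} to ${longest._2}, ${longest._3} km")
```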

Filtration by edges
Now, let’s consider another type of filtration, namely
filtration by edges. Here we want to find the pairs of cities
that are less than 150 kilometers apart. If we type in the
Spark shell

graph.edges.filter { case Edge(city1, city2, distance) =>
  distance < 150
}.collect.foreach { case Edge(city1, city2, distance) =>
  println(s"The distance between $city1 and $city2 is $distance")
}

we will see the following result (the edges RDD stores only
vertex identifiers, so the cities appear as ids):

The distance between 2 and 3 is 113
The distance between 2 and 4 is 106
The distance between 3 and 4 is 128
The distance between 4 and 1 is 39
The distance between 1 and 5 is 130
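
If city names are preferred over vertex ids, one option is to apply the same distance filter to the triplets instead of the edges. This is a sketch under assumptions, using a hypothetical three-city subset and the illustrative names localSc, cities, roads, small, close and closePairs:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the running context (e.g. the shell's sc) or start a local one.
val localSc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))

// Hypothetical three-city subset of the data used in this post.
val cities = localSc.parallelize(Seq(
  (1L, ("Philadelphia", 1580863)),
  (4L, ("Wilmington", 70851)),
  (6L, ("Scranton", 76089))))
val roads = localSc.parallelize(Seq(Edge(4L, 1L, 39), Edge(1L, 6L, 168)))
val small = Graph(cities, roads)

// Filter on the edge attribute, then read the city names from the
// source and destination vertex attributes of each triplet.
val close = small.triplets.filter(_.attr < 150)
val closePairs = close.map(t => (t.srcAttr._1, t.dstAttr._1, t.attr)).collect

closePairs.foreach { case (from, to, km) =>
  println(s"The distance between $from and $to is $km")
}
```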

Aggregation
Another interesting task to consider here is aggregation.
For each city, we will find the total population of its
neighboring cities. But before we start, we should change
our graph a little. The reason is that GraphX deals only with
directed graphs, so each of our edges currently points in one
direction only. To take both directions into account, we add
the reversed edges to the graph by taking the union of the
reversed edges and the original ones:

val undirectedEdgeRDD = graph.reverse.edges.union(graph.edges)
val graph = Graph(verRDD, undirectedEdgeRDD)

The second line redefines graph, which the Spark shell allows
when the lines are entered one at a time.

Now every connection is present in both directions, so the
graph behaves like an undirected one, and we can perform
the aggregation using the aggregateMessages operator. Each
edge sends the population of its destination vertex to its
source vertex, and the messages arriving at a vertex are
summed:

val neighbors = graph.aggregateMessages[Int](
  ectx => ectx.sendToSrc(ectx.dstAttr._2),
  _ + _)

To see the result, type:

neighbors.foreach(println(_))

You should get the following output,

(4,2251352)
(1,8322073)
(5,1706480)
(2,120379)
(6,9805524)
(3,8943034)

where the first element is the vertex id and the second
element is the total population of the neighboring cities.
The order of the lines may vary between runs.
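
As an alternative to adding reversed edges, each edge can send a message to both of its endpoints, and the totals can then be joined back to the city names. The following is a sketch under assumptions, not the method used above: it works on a hypothetical three-city subset, and the names localSc, cities, roads, small, totals, byCity and result are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the running context (e.g. the shell's sc) or start a local one.
val localSc = SparkContext.getOrCreate(
  new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))

// Hypothetical three-city subset of the data used in this post.
val cities = localSc.parallelize(Seq(
  (1L, ("Philadelphia", 1580863)),
  (4L, ("Wilmington", 70851)),
  (5L, ("New York", 8175133))))
val roads = localSc.parallelize(Seq(Edge(4L, 1L, 39), Edge(1L, 5L, 130)))
val small = Graph(cities, roads)

// Send each endpoint's population to the other endpoint, so every edge
// counts in both directions without adding reversed copies of the edges.
val totals = small.aggregateMessages[Int](
  e => { e.sendToSrc(e.dstAttr._2); e.sendToDst(e.srcAttr._2) },
  _ + _)

// Attach the city name to each total.
val byCity = small.vertices.innerJoin(totals) { (id, attr, total) =>
  (attr._1, total)
}
val result = byCity.map(_._2).collect.toMap

result.foreach { case (city, total) => println(s"$city: $total") }
```

In this subset Philadelphia neighbors both Wilmington and New York, so its total is 70851 + 8175133 = 8245984, while the other two cities each see only Philadelphia's population.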

Conclusions
GraphX is a very useful Spark component that has many
applications in different fields, from computer science to
biology and social sciences. In this post, we have considered
a simple graph example where the vertices are the cities
and the edges carry the distances between them. Some basic
operations such as filtration by vertices, filtration by edges,
operations with triplets, and aggregation have been applied
to this graph. All in all, we showed that the Apache Spark
GraphX component is convenient and well suited to graph
computations.
