Statistical Computing With R: Masters in Data Science 503 (S15) Third Batch, SMS, TU, 2024

The document provides an overview of Social Network Analysis (SNA) and its applications in understanding social structures through networks and graph theory. It discusses key concepts such as nodes, edges, and centrality measures, and illustrates how SNA can be used to analyze relationships, including criminal networks. Additionally, it includes practical examples and R code snippets for performing SNA using the igraph package.
Statistical Computing with R:
Masters in Data Science 503 (S15)
Third Batch, SMS, TU, 2024
Shital Bhandary
Associate Professor
Statistics/Bio-statistics, Demography and Public Health Informatics
Patan Academy of Health Sciences, Lalitpur, Nepal
Faculty, Data Analysis and Decision Modeling, MBA, Pokhara University, Nepal
Faculty, FAIMER Fellowship in Health Professions Education, India/USA.
Review and Preview
Review:
• Social Networks
• Nodes/Vertices
• Edges/Connections
• Degree
• Edge density
• Closeness (centrality)
• Betweenness (centrality)
• Edge betweenness, etc.
Preview:
• Social Network Analysis
• Hubs
• Authorities
• Community detection
Social Networks:
https://study.com/academy/lesson/what-are-social-networks-types-examples-quiz.html

• Social networks are simply networks of social interactions and personal relationships. Think about your group of friends and how you got to know them.
• Maybe you met them in elementary school, or maybe you met them through a hobby or through your community.
• Either way, you were exposed to social networks: meeting other individuals in a social situation while developing strong personal bonds over time.
• If you're on Facebook, keep in mind that so are 1.15 billion other people throughout the world.
• In fact, 72% of all Internet users are active on social media today, indulging in social interactions and developing personal relationships.
• But you don't always have to go online to be exposed to social networks, as they come in a multitude of formats.
Why Should I Care About Social Network Analysis?
https://towardsdatascience.com/how-to-get-started-with-social-network-analysis-6d527685d374

• Social network analysis (SNA), also known as network science, is a field of data analytics that uses networks and graph theory to understand social structures.
• SNA techniques can also be applied to networks outside of the societal realm.
• Networks are all around us, such as road networks, internet networks, and online social networks like Facebook and Twitter.
• Learning SNA and its techniques will give you valuable tools for gaining insight into a variety of data sources.
• To build SNA graphs, two key components are required: actors and relationships.
SNA graph:
• A social network graph contains points and the lines connecting them, similar to a connect-the-dots puzzle.
• The points represent the actors and the lines represent the relationships.
• The shaded area is a "community".
SNA: Networks and Graph theory
https://en.wikipedia.org/wiki/Social_network_analysis
• Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory.
• It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them.
• The advantages of SNA are twofold. First, it can process a large amount of relational data and describe the overall structure of the relational network.
• Second, it can use measures such as in-degree and out-degree centrality to identify the influential nodes in the network.
• By analyzing nodes, clusters and relations, the communication structure and the position of individuals can be clearly described.
Discussion on “How to do SNA Guide?”
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/491572/socnet_howto.pdf

• The aim of social network analysis is to understand a community by mapping the relationships that connect its members as a network, and then trying to draw out key individuals, groups within the network ('components'), and/or associations between the individuals.
• A network is simply a number of points (or 'nodes') that are connected by links.
• Generally in social network analysis, the nodes are people and the links are any social connection between them, for example friendship, marital/family ties, or financial ties.
• SNA can also be used to detect networks of gangs (of criminals).
How SNA is used to analyze “gang” network?
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/491572/socnet_howto.pdf

• Social network analysis can provide information about the reach of gangs, the impact of gangs, and gang activity.
• The approach may also allow you to identify those who may be at risk of gang association and/or of being exploited by gangs.
• The technique generates diagrams that show the relationships between the individuals contained in your data; these could include criminal links, social links, potential feuds, etc.
• SNA diagrams can include names, pictures and further details of individuals as required.
Key network statistics 1:
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/491572/socnet_howto.pdf
Key network statistics 2:
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/491572/socnet_howto.pdf
SNA Basics:
https://www.youtube.com/watch?v=0xsM0MbRPGE
library(igraph)
g <- graph(c(1,2))   # one directed edge, from vertex 1 to vertex 2
plot(g)

• The first node is labelled 1.
• The second node is labelled 2.
• The arrow (edge) goes from 1 to 2 because that is how we defined it in g!
SNA Basics: Changing size and color of node (vertex) and edge
plot(g,
     vertex.color = "green",
     vertex.size = 40,
     edge.color = "red",
     edge.width = 2)   # edge thickness; edge.size is not an igraph plot parameter

Note: Here information (email, Twitter following, gang following) is flowing from 1 to 2!
SNA Basics: Adding more data points
g <- graph(c(1,2, 2,3, 3,4, 4,1))   # edges 1->2, 2->3, 3->4, 4->1
plot(g,
     vertex.color = "green",
     vertex.size = 40,
     edge.color = "red",
     edge.width = 2)
Example: email, Twitter or gang following.

Note: This is a directed graph, as we can see the arrows.
SNA Basics: Undirected data points
g <- graph(c(1,2, 2,3, 3,4, 4,1),
           directed = F)
plot(g,
     vertex.color = "green",
     vertex.size = 40,
     edge.color = "red",
     edge.width = 2)
Example: Facebook friends or gang members!

Note: This is an undirected graph, as there are no arrows.
SNA Basics: Adding related & unrelated nodes
g <- graph(c(1,2, 2,3, 3,4, 4,1),
           directed = F, n = 7)   # n = 7: total number of vertices
plot(g,
     vertex.color = "green",
     vertex.size = 40,
     edge.color = "red",
     edge.width = 2)
Note: The three unrelated nodes (5, 6 and 7) are shown without links.
SNA Basics: Adding related & unrelated nodes
g[]
7 x 7 sparse Matrix of class "dgCMatrix"
[1,] . 1 . 1 . . .
[2,] 1 . 1 . . . .
[3,] . 1 . 1 . . .
[4,] 1 . 1 . . . .
[5,] . . . . . . .
[6,] . . . . . . .
[7,] . . . . . . .
• This gives the adjacency matrix used to produce the earlier graph.
• The dimension of this matrix is 7x7.
• A dot (.) means no relation (connection) and 1 means a connection between the nodes, e.g. node 1 is connected to nodes 2 and 4.
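To see where this matrix comes from, here is a minimal base-R sketch (no igraph needed) that rebuilds it as an ordinary dense matrix from the same edge vector: consecutive pairs become edges, and because the graph is undirected each edge is entered in both directions. The variable names (edges, el, adj) are illustrative only.

```r
# Edge vector from the slide: consecutive pairs are edges 1-2, 2-3, 3-4, 4-1
edges <- c(1, 2, 2, 3, 3, 4, 4, 1)
n <- 7   # n = 7 adds vertices 5, 6 and 7 with no edges

# Reshape the flat vector into a two-column edge list (one row per edge)
el <- matrix(edges, ncol = 2, byrow = TRUE)

# Dense n x n adjacency matrix; undirected, so each edge is set in both directions
adj <- matrix(0L, nrow = n, ncol = n)
adj[el] <- 1L
adj[el[, 2:1]] <- 1L

print(adj)   # 0 plays the role of the "." in igraph's sparse printout
```

Row 1 has 1s in columns 2 and 4, matching the statement that node 1 is connected to nodes 2 and 4.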
SNA Basics: Defining nodes with text data
g1 <- graph(c("Sita","Ram", "Ram","Rita", "Rita","Sita",
              "Sita","Rita", "Anju","Ram"))
plot(g1,
     vertex.color = "green",
     vertex.size = 40,
     edge.color = "red",
     edge.width = 2)
SNA Basics: Getting info of “g1”
g1
Output in R:
IGRAPH 0adac86 DN-- 4 5 --
+ attr: name (v/c)
+ edges from 0adac86 (vertex names):
[1] Sita->Ram  Ram ->Rita Rita->Sita Sita->Rita Anju->Ram
• D = Directed, N = Named vertices
• 4 = four vertices (nodes)
• 5 = five edges (lines)
• Pairs: Sita->Ram, Ram->Rita, Rita->Sita, Sita->Rita, Anju->Ram
SNA Basics: Getting degrees of “g1”
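degree(g1)   # same as degree(g1, mode = "all")
Sita  Ram Rita Anju
   3    3    3    1
degree(g1, mode = "in")
Sita  Ram Rita Anju
   1    2    2    0
degree(g1, mode = "out")
Sita  Ram Rita Anju
   2    1    1    1
"degree" = the number of connections (edges) of each node; mode = "in" counts only incoming edges and mode = "out" only outgoing edges.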
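The same three degree counts can be reproduced in base R. A minimal sketch, assuming g1 is stored as two vectors of edge endpoints (from and to are illustrative names): in-degree counts how often a node appears as a target, out-degree how often it appears as a source, and total degree is their sum.

```r
# g1 as a plain edge list: edge i goes from[i] -> to[i]
from  <- c("Sita", "Ram",  "Rita", "Sita", "Anju")
to    <- c("Ram",  "Rita", "Sita", "Rita", "Ram")
nodes <- c("Sita", "Ram", "Rita", "Anju")

# Fixed factor levels keep zero counts (Anju has no incoming edge)
in_deg  <- table(factor(to,   levels = nodes))   # like mode = "in"
out_deg <- table(factor(from, levels = nodes))   # like mode = "out"
all_deg <- in_deg + out_deg                      # like mode = "all"

print(all_deg)
print(in_deg)
print(out_deg)
```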
SNA Basics: Getting diameter of “g1”
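#Diameter
diameter(g1, directed = F, weights = NA)
[1] 2
"diameter" = the length (number of edges) of the longest shortest path between any two nodes, here ignoring edge direction (directed = F).
For example, Anju -> Ram -> Rita or Anju -> Ram -> Sita: Anju needs two steps to reach Rita or Sita, and no pair of nodes is farther apart.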
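One way to check this value by hand is to compute all shortest-path lengths and take the largest finite one. The sketch below does that in base R with the Floyd-Warshall algorithm on the undirected version of g1, treating every edge as length 1 (matching weights = NA). It is an illustration of the definition, not how igraph computes the diameter internally.

```r
nodes <- c("Sita", "Ram", "Rita", "Anju")
from  <- c("Sita", "Ram",  "Rita", "Sita", "Anju")
to    <- c("Ram",  "Rita", "Sita", "Rita", "Ram")

# directed = F: treat each edge as a two-way link (symmetric adjacency)
adj <- matrix(FALSE, 4, 4, dimnames = list(nodes, nodes))
adj[cbind(from, to)] <- TRUE
adj <- adj | t(adj)

# Floyd-Warshall all-pairs shortest paths, every edge counts as 1 step
D <- ifelse(adj, 1, Inf)
diag(D) <- 0
for (k in seq_along(nodes)) D <- pmin(D, outer(D[, k], D[k, ], "+"))

# Diameter = the longest of the finite shortest-path lengths
diam <- max(D[is.finite(D)])
print(D)
print(diam)
```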
SNA Basics: Getting edge density of “g1”
#Edge density
edge_density(g1, loops = F)
[1] 0.4166667

#Edge density by hand: edges / possible directed edges, n*(n-1)
ecount(g1)/(vcount(g1)*(vcount(g1)-1))
5/(4*(4-1))
[1] 0.4166667
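The same arithmetic in plain R, with the vertex and edge counts of g1 typed in. The last lines show why the inner parentheses matter: R evaluates m / n * (n - 1) left to right, which is not the density.

```r
n <- 4   # vertices in g1: Sita, Ram, Rita, Anju
m <- 5   # directed edges in g1

# A directed graph without self-loops can have at most n * (n - 1) edges
# (for an undirected graph the maximum would be n * (n - 1) / 2)
dens <- m / (n * (n - 1))
print(dens)    # 0.4166667 (= 5/12)

# Without the inner parentheses R computes (m / n) * (n - 1) instead
wrong <- m / n * (n - 1)
print(wrong)   # 3.75, which is not a density
```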
SNA Basics: Getting reciprocity of “g1”
#Reciprocity of directed graph
#Proportion of reciprocated ties
reciprocity(g1)
[1] 0.4

Total edges = 5
Reciprocated edges = 2 (Sita->Rita and Rita->Sita)
Reciprocity = 2/5 = 0.4
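A minimal base-R sketch of the same ratio, assuming (as holds for g1) that the graph has no self-loops or duplicate edges: an edge counts as reciprocated when the edge in the opposite direction also exists.

```r
# g1's edges as "from->to" strings
from <- c("Sita", "Ram",  "Rita", "Sita", "Anju")
to   <- c("Ram",  "Rita", "Sita", "Rita", "Ram")
edge_key    <- paste(from, to, sep = "->")
reverse_key <- paste(to, from, sep = "->")

# An edge is reciprocated if the edge in the opposite direction also exists
reciprocated <- reverse_key %in% edge_key
print(edge_key[reciprocated])   # "Rita->Sita" "Sita->Rita"

# Share of edges that are reciprocated
recip <- sum(reciprocated) / length(edge_key)
print(recip)   # 0.4
```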
SNA Basics: Getting closeness of “g1”
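#Closeness
closeness(g1, mode = "all", weights = NA)
     Sita       Ram      Rita      Anju
0.2500000 0.3333333 0.2500000 0.2000000

Closeness = 1 / (sum of the shortest-path distances from a node to all other nodes); for Ram this is 1/(1+1+1) = 1/3.
Ram is closest to the other three persons.
Anju is farthest from the other three persons: 1/(1+2+2) = 0.2.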
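A base-R sketch of these values, assuming the formula 1 / (sum of distances) and the undirected distances implied by mode = "all". The all-pairs distances are computed with Floyd-Warshall purely for illustration.

```r
nodes <- c("Sita", "Ram", "Rita", "Anju")
from  <- c("Sita", "Ram",  "Rita", "Sita", "Anju")
to    <- c("Ram",  "Rita", "Sita", "Rita", "Ram")

# mode = "all": ignore direction, so build a symmetric adjacency matrix
adj <- matrix(FALSE, 4, 4, dimnames = list(nodes, nodes))
adj[cbind(from, to)] <- TRUE
adj <- adj | t(adj)

# All-pairs shortest paths (Floyd-Warshall), each edge counts as 1 step
D <- ifelse(adj, 1, Inf)
diag(D) <- 0
for (k in seq_along(nodes)) D <- pmin(D, outer(D[, k], D[k, ], "+"))

# Closeness = 1 / (sum of distances to all other nodes)
clo <- 1 / rowSums(D)
print(round(clo, 7))
```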
SNA Basics: Getting betweenness of “g1”
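#Betweenness
betweenness(g1, directed = T, weights = NA)
Sita  Ram Rita Anju
   1    2    2    0
Betweenness counts how many shortest paths between other pairs of nodes pass through a node. Ram lies on the shortest paths Anju -> Rita and Anju -> Sita; Rita lies on Ram -> Sita and Anju -> Sita; Sita lies only on Rita -> Ram; Anju lies on none.

edge_betweenness(g1, directed = T, weights = NA)
[1] 2 4 4 1 3
#Learn on your own! (one value per edge, in the order Sita->Ram, Ram->Rita, Rita->Sita, Sita->Rita, Anju->Ram)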
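Both results can be checked with a brute-force base-R sketch (an illustration for small graphs, not igraph's algorithm). It assumes unit edge lengths (weights = NA) and uses the fact that in an unweighted graph every walk from s to d whose length equals their distance is a shortest path, so the number of shortest paths is an entry of a power of the adjacency matrix. A vertex v (or edge u -> w) gets credit for the share of s -> d shortest paths that pass through it.

```r
nodes <- c("Sita", "Ram", "Rita", "Anju")
from  <- c("Sita", "Ram",  "Rita", "Sita", "Anju")
to    <- c("Ram",  "Rita", "Sita", "Rita", "Ram")
n <- length(nodes)

# Directed adjacency matrix (directed = T): 1 = edge from row to column
A <- matrix(0, n, n, dimnames = list(nodes, nodes))
A[cbind(from, to)] <- 1

# Shortest-path lengths via Floyd-Warshall (unit weights)
D <- ifelse(A == 1, 1, Inf)
diag(D) <- 0
for (k in seq_len(n)) D <- pmin(D, outer(D[, k], D[k, ], "+"))

# sigma[s, d] = number of shortest s -> d paths = (A^D[s, d])[s, d]
A_pow <- list(diag(n))                                # A_pow[[k + 1]] = A^k
for (k in seq_len(n - 1)) A_pow[[k + 1]] <- A_pow[[k]] %*% A
sigma <- matrix(0, n, n)
for (s in seq_len(n)) for (d in seq_len(n)) {
  if (is.finite(D[s, d])) sigma[s, d] <- A_pow[[D[s, d] + 1]][s, d]
}

# Vertex betweenness: shortest s -> d paths (s, d != v) passing through v
vb <- setNames(numeric(n), nodes)
for (v in seq_len(n)) for (s in seq_len(n)) for (d in seq_len(n)) {
  if (s == d || s == v || d == v || !is.finite(D[s, d])) next
  if (D[s, v] + D[v, d] == D[s, d]) {
    vb[v] <- vb[v] + sigma[s, v] * sigma[v, d] / sigma[s, d]
  }
}

# Edge betweenness: shortest s -> d paths that use the edge u -> w
eb <- numeric(length(from))
for (e in seq_along(from)) {
  u <- match(from[e], nodes)
  w <- match(to[e], nodes)
  for (s in seq_len(n)) for (d in seq_len(n)) {
    if (s == d || !is.finite(D[s, d])) next
    if (D[s, u] + 1 + D[w, d] == D[s, d]) {
      eb[e] <- eb[e] + sigma[s, u] * sigma[w, d] / sigma[s, d]
    }
  }
}

print(vb)   # Sita 1, Ram 2, Rita 2, Anju 0
print(eb)   # 2 4 4 1 3, in the edge order of g1
```

The triple loop grows with the cube of the number of nodes (and more), so this is only meant for checking tiny examples like g1.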
Question/queries so far?
• More functions are documented here: https://igraph.org/r/html/latest/
Self-Practice: SNA with “SNA_School.csv” data file
Follow this video: https://www.youtube.com/watch?v=0xsM0MbRPGE
#Read the data in R (file.choose() opens a dialog to pick the CSV file)
data <- read.csv(file.choose(), header = TRUE)

#Save the first two columns (the edge list) as y
y <- data.frame(data$first, data$second)

#Save it as network graph data
#(graph_from_data_frame() is the current name of graph.data.frame())
net <- graph_from_data_frame(y, directed = TRUE)

First rows of the data file:
first  second  grade  spec
AA     DD      6      Y
AB     DD      6      R
AF     BA      6      Q
DD     DA      6      Q
CD     EC      6      X
DD     CE      6      Y
CD     FA      6      X
CD     CC      6      W
BA     AF      6      R
CB     CA      6      T
CC     CA      6      U
CD     CA      6      Q
BC     CA      6      U
DD     DA      6      Y
ED     AD      6      R
AE     AC      6      Z
AB     BA      6      Y
CD     EC      6      X
CA     CC      6      U
SNA with a data file: networkdata.csv
#Vertices: 52 unique vertices
V(net)
#Edges: 290 edges
E(net)
#Names as labels
V(net)$label   #Result = NULL
#Define the labels
V(net)$label <- V(net)$name
V(net)$label   # 52 vertices as labels

Output of V(net):
+ 52/52 vertices, named, from 58abab2:
 [1] AA AB AF DD CD BA CB CC BC ED AE CA EB BF BB AC DC BD DB CF DF BE EA CE EE EF
[27] FF FD GB GC GD AD KA KF LC DA EC FA FB DE FC FE GA GE KB KC KD KE LB LA LD LE

Output of V(net)$label:
 [1] "AA" "AB" "AF" "DD" "CD" "BA" "CB" "CC" "BC" "ED" "AE" "CA" "EB" "BF" "BB" "AC" "DC" "BD" "DB" "CF" "DF" "BE" "EA" "CE" "EE" "EF"
[27] "FF" "FD" "GB" "GC" "GD" "AD" "KA" "KF" "LC" "DA" "EC" "FA" "FB" "DE" "FC" "FE" "GA" "GE" "KB" "KC" "KD" "KE" "LB" "LA" "LD" "LE"

SNA with a data file: networkdata.csv
#Define degree
V(net)$degree   #Result = NULL
V(net)$degree <- degree(net)
V(net)$degree
 [1] 18  9 23 36 40 26 24 50 21 27 15 62  7 12 23 27  2  4  8 12 23 20  8 10  6  8
[27]  1  8  1  1  1  9  3  3  1  7  3  1  1  2  1  2  5  1  1  1  1  1  1  1  1  1

What does it mean here? The number of connections of each node (vertex).

table(degree(net))   # ??? Try it: how many vertices have each degree?
Histogram of node degree i.e. connections
#Histogram of node degree
hist(V(net)$degree,
     col = "green",
     main = "Histogram of node degree",
     ylab = "Frequency",
     xlab = "Degree of Vertices")
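The degree values and the vertex order shown for net can be reproduced by hand. A minimal base-R sketch, using as a hypothetical mini edge list only the first five rows of the data preview (not the full file): the vertex order is the unique names of the first column followed by any new names from the second column, which appears to match the V(net) output, and the total degree of a vertex is how often its name appears in either column. The last line is the base-R counterpart of table(degree(net)).

```r
# Hypothetical mini edge list: the first five rows of the data preview
y <- data.frame(first  = c("AA", "AB", "AF", "DD", "CD"),
                second = c("DD", "DD", "BA", "DA", "EC"),
                stringsAsFactors = FALSE)

# Vertex order: unique names from column 1, then new names from column 2
v <- unique(c(y$first, y$second))

# Total degree (mode = "all"): how often a name appears in either column
deg <- table(factor(c(y$first, y$second), levels = v))
print(deg)

# Frequency of each degree value (what the histogram summarises)
freq <- table(as.vector(deg))
print(freq)   # seven vertices of degree 1, one (DD) of degree 3
```

Every edge adds one to the degree of each of its two endpoints, so the degrees always sum to twice the number of edges (2 x 290 = 580 for the full network).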
Network diagram:
set.seed(222)   # the default layout starts from random positions; this makes the plot reproducible
plot(net)
Network diagram: A bit of tweaking!
plot(net,
     vertex.color = "green",
     vertex.size = 2,
     vertex.label.dist = 1.5,
     edge.arrow.size = 0.1,
     vertex.label.cex = 0.8)
Network diagram: A little bit of tweaking!
plot(net,
     vertex.color = "green",
     vertex.size = 2,
     edge.arrow.size = 0.1,
     vertex.label.cex = 0.8)
Network diagram: layout 1!
plot(net,
     vertex.color = rainbow(52),
     vertex.size = V(net)$degree*0.4,
     edge.arrow.size = 0.1,
     layout = layout_with_fr)   # Fruchterman-Reingold (older name: layout.fruchterman.reingold)
#Next layout i.e. layout 2
plot(net,
     vertex.color = rainbow(52),
     vertex.size = V(net)$degree*0.4,
     edge.arrow.size = 0.1,
     layout = layout_with_kk)   # Kamada-Kawai (older name: layout.kamada.kawai)

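Which nodes are "hubs"?
• Nodes with many outgoing edges, especially edges pointing to strong authorities
• We need the "hub score"

Which nodes are "authorities"?
• Nodes with many incoming edges, especially edges coming from strong hubs
• We need the "authority score"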
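A minimal sketch of the idea behind these two scores (Kleinberg's HITS algorithm, on which igraph's hub_score() and authority_score() are based), written as a base-R power iteration on the small g1 graph rather than on the network data. The rescaling so the largest score is 1 and the 100 iterations are illustrative choices, not the package's settings.

```r
nodes <- c("Sita", "Ram", "Rita", "Anju")
from  <- c("Sita", "Ram",  "Rita", "Sita", "Anju")
to    <- c("Ram",  "Rita", "Sita", "Rita", "Ram")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A[cbind(from, to)] <- 1   # row -> column

# Power iteration:
#   authority(v) = sum of hub scores of the nodes pointing TO v
#   hub(v)       = sum of authority scores of the nodes v points TO
hub <- rep(1, 4)
for (i in 1:100) {
  auth <- as.vector(t(A) %*% hub)
  auth <- auth / max(auth)   # rescale so the top authority scores 1
  hub  <- as.vector(A %*% auth)
  hub  <- hub / max(hub)     # rescale so the top hub scores 1
}
names(hub) <- names(auth) <- nodes
print(round(hub, 3))
print(round(auth, 3))
```

Here Sita comes out as the strongest hub (she points to both Ram and Rita), while Ram and Rita are the authorities; Rita's hub score shrinks towards 0 because she only points to Sita, who is not an authority.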
Hubs and authorities: With hub score & authority scores of the network data

You need to watch the YouTube video and replicate this graph using the code provided there!

https://www.youtube.com/watch?v=0xsM0MbRPGE
Community (cluster) detection:
#Community detection
net <- graph_from_data_frame(y, directed = F)
cnet <- cluster_edge_betweenness(net)
plot(cnet,
     net,
     vertex.size = 10,
     vertex.label.cex = 0.8)

cluster_edge_betweenness() is the function in the igraph package that fits the edge-betweenness (Girvan-Newman) clustering algorithm: it repeatedly removes the edge with the highest edge betweenness and keeps the split into communities with the highest modularity.

More are available here:
https://igraph.org/r/html/latest/
Resources:
• Read and learn about the "sna" package on your own! Also check the reading on Facebook social network analysis and try to do the same with your own Facebook data!

• SNA statistics: https://www.latentview.com/blog/a-guide-to-social-network-analysis-and-its-use-cases/

• Application of SNA:
• https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-019-1599-6
• https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0264-6
Question/Queries?
Next class:
• Grammar of graphics

• The ggplot2 package and its use in R

• Read Chapter 3: Data Visualization with ggplot2 of your course textbook carefully before coming to the next class

• Link: https://r4ds.had.co.nz/data-visualisation.html
Thank you!
@shitalbhandary
