Social Network Analysis Exam Guide
Explain the topics " Introduction to Social Web, Nodes, Edges and Network Measures, Describing
Nodes and Edges,
Describing Networks, Layouts, Visualizing network features" with respect to Social Network
Analysis for semester college exams. Explain broadly and theoretically covering each topic and
avoid any kind of filler content
Below is a broad, complete, theory-rich explanation of all the topics for your Social Network Analysis
(SNA) exam. Each topic is written in a clean, academic, semester-exam style, without filler content, and
can be directly used for 8–10 mark long answers.
Information diffusion
Influence and opinion formation
Community structure
Recommendation systems
Behavioral patterns
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/46
Thus, SNA helps to mathematically model and analyze these interactions using graph theory and
statistical methods.
Individuals on Facebook
Accounts on Twitter
Each node may have attributes like name, age, location, follower count, etc.
Network Measures
1. Degree-based measures
PageRank: Variant of eigenvector centrality used by Google.
3. Network-level measures
4. Clustering coefficient
Network measures help in identifying important nodes, community structures, efficiency of information
spread, and the robustness of networks.
Structural properties
Degree
Centrality scores
Connectivity
Attribute properties
Describing nodes is essential to identify key actors, leadership roles, vulnerable nodes, and
communities.
Describing Edges
1. Direction
Directed (follows, subscribes, cites)
Undirected (friendship, collaboration)
2. Weight
4. Temporal nature
Static (fixed relationship)
Dynamic (time-evolving interactions)
5. Type of relationship
Social relationship (friendship)
Communication (email)
Collaboration (co-authorship)
Interaction (likes, comments)
Describing edges helps understand how strong or weak ties shape network behavior and diffusion
processes.
4. Describing Networks
A network can be described using its topology, size, connections, and structural properties.
1. Size
Number of nodes (N)
Number of edges (E)
2. Density
Measure of how many edges exist relative to maximum possible edges.
3. Network Type
Directed / Undirected
Weighted / Unweighted
Communities indicate shared interests, groups, or functional units.
6. Diameter and Path Length
Small-world effect in social networks indicates short path lengths between any nodes.
7. Components
Connected components or isolated sub-networks.
Through these properties, networks can be compared, classified, and analyzed for patterns.
5. Layouts
Layouts refer to the visual arrangement of nodes and edges when representing a network graphically.
Layouts help in visually identifying:
Clusters
Types of Layouts
1. Force-directed layouts
Nodes repel each other, edges act like springs.
4. Random layout
Nodes placed randomly, mainly for debugging or comparison.
5. Grid layout
Network visualization is the process of representing nodes, edges, and their attributes through
graphical interfaces to interpret structure and patterns.
Visualization helps identify:
Central nodes
Community clusters
Edge density
Interaction patterns
Structural holes
Strong vs. weak ties
Visualization Techniques
1. Node Color
Represents attributes like community, category, or influence.
2. Node Size
4. Edge Color
6. Temporal visualization
Shows network evolution over time using animations or time-sliced graphs.
Gephi
Cytoscape
NetworkX (Python)
Pajek
NodeXL
Effective visualization makes large, complex networks easier to interpret for decision-making and
research.
If you want, I can also prepare:
✅ 10-mark long answers for each topic
✅ 5-mark short answers
✅ Condensed 1-page exam notes
✅ Diagrams or labeled network examples
Just tell me what format you need.
Below is a broad, theory-focused, exam-ready explanation of all topics from Social Network Analysis
(SNA). Everything is written in a clean academic style for 8–10 mark long answers with no filler content.
1. Information Flow:
Strong ties (close friends, family) enable high-trust, rich information exchange.
Weak ties (acquaintances) enable access to new, diverse information from different social circles.
2. Network Cohesion:
Strong ties create tightly knit clusters, while weak ties bridge different clusters and reduce
fragmentation.
4. Community Formation:
Strong ties form dense social groups; weak ties connect these groups to broader networks.
5. Structural Holes and Social Capital:
Individuals connected by weak ties can bridge structural holes, gaining strategic advantages such as
access to new opportunities.
Thus, tie strength shapes network topology, behavior, communication patterns, and diffusion processes.
1. Frequency of Interaction:
Number of messages, calls, comments, or shared activities.
2. Duration of Relationship:
Length of time individuals have known each other.
3. Reciprocity:
Balanced communication (mutual commenting, liking, replying) indicates stronger ties.
4. Emotional Intensity:
Often inferred from content sentiment, message length, or degree of personal communication.
5. Intimacy / Trust:
Closeness or personal nature of communication.
6. Structural Features:
Number of mutual friends
Overlap of neighborhoods
Co-participation in events
Similarity of interests
Tie strength can be quantified as a weighted sum of multiple features or through supervised learning
models.
1. Strong-Tie Subnetworks:
High clustering coefficient
2. Weak-Tie Bridges:
Connect otherwise disconnected clusters
Multiple types of relationships between the same individuals (e.g., colleagues + friends)
These structures help identify influencers, community boundaries, and communication bottlenecks.
3. Network Propagation
Network propagation refers to the process through which information, behaviors, opinions, or
innovations spread across a network.
Modes of Propagation
1. Simple Contagion:
Information spreads through single contact events (e.g., viral news).
2. Complex Contagion:
Adoption requires reinforcement from multiple neighbors (e.g., joining a movement).
3. Threshold Models:
A node adopts a behavior only if the proportion of active neighbors exceeds a threshold.
4. Epidemic Models:
Based on disease spread models like SIR and SIS.
Tie strength: strong ties increase reinforcement; weak ties increase reach.
Propagation analysis is used in social media marketing, misinformation tracking, epidemiology, and viral
content prediction.
4. Link Prediction
Link prediction aims to predict missing, future, or potential edges in a network based on existing
structure and patterns.
Applications
Fraud detection
Knowledge graph completion
A. Similarity-Based Methods
1. Common Neighbors:
More mutual friends → higher likelihood of connection.
2. Jaccard Coefficient:
Ratio of shared neighbors to total neighbors.
3. Adamic–Adar Index:
Weighs shared neighbors based on rarity.
4. Preferential Attachment:
Nodes with high degree tend to acquire more links.
B. Path-Based Methods
Katz index, Shortest path scores, and Random walk algorithms measure closeness via paths.
Logistic regression
Graph embedding models
Link prediction helps identify evolving relationships and potential structural changes in future networks.
5. Entity Resolution
Entity Resolution (ER) refers to the process of identifying when multiple references, records, or
profiles refer to the same real-world entity, especially across noisy or incomplete social data.
Importance in SNA
1. Data Matching:
Compare attributes like name, username, email, phone number.
2. Similarity Computation:
Text similarity, structural similarity, profile overlaps.
3. Contextual Matching:
Common friends, interaction patterns, geolocation.
4. Classification or Clustering:
Machine learning models classify whether two records are the same.
5. Merging:
Resolve duplicates and create a clean unified node.
Challenges
Privacy restrictions
To analyze how a news topic spreads in a social media network using SNA techniques.
Dataset
Methodology
1. Network Construction:
Nodes represent users; edges represent follower/followee relations.
Degree centrality to identify key influencers
Clustering coefficient to detect communities
Weak ties identified as bridges between clusters
4. Network Propagation Study:
Retweet cascades analyzed using temporal data.
Propagation paths mapped to find influential spreaders.
5. Link Prediction:
Future follower relations predicted using common neighbors and preferential attachment.
6. Entity Resolution:
Duplicate user profiles or bots identified and removed using similarity-based matching.
Findings
Conclusion
SNA helps understand how information moves, who controls influence, and how network structure
shapes communication dynamics.
Explain the topics " Introduction to community discovery, communities in context, quality
functions, The
Kernighan-Lin algorithm, Agglomerative algorithms, spectral algorithms, multi-level graph
partitioning, Markov clustering, Other approaches" with respect to Social Network Analysis for
semester college exams. Explain broadly and theoretically covering each topic and avoid any kind
of filler content
Below is a broad, theory-rich, exam-focused explanation of all topics from Community Discovery in
Social Network Analysis. The content is written in a clean academic style suitable for 8–10 mark long
answers, with no filler content.
SOCIAL NETWORK ANALYSIS — COMMUNITY DISCOVERY
Essential for biological networks (protein complexes), web graphs, and social media analytics.
Community discovery is central to SNA because most real-world social networks exhibit clustered,
modular, and hierarchical structures rather than random patterns.
2. Communities in Context
Communities are interpreted based on their network context, i.e., the nature of the graph and the type
of relationships it represents.
Types of Communities
1. Disjoint Communities:
Each node belongs to only one community. Common in traditional clustering.
2. Overlapping Communities:
Nodes may belong to multiple communities (e.g., a person in different social circles).
3. Hierarchical Communities:
Communities contain sub-communities at multiple levels (tree-like structure).
4. Dynamic Communities:
Communities evolve over time in temporal networks.
Understanding context ensures that detected communities align with meaningful real-world structures.
3. Quality Functions
Quality functions are mathematical metrics that evaluate the “goodness” of the communities discovered.
They help compare and validate different partitions of a network.
A. Modularity (Q)
B. Conductance
Measures the ratio of edges leaving a community to total edges within it.
Lower conductance → better-defined community.
C. Cut Ratio
Ratio of edges crossing between two sets of nodes relative to possible edges.
D. Density
Quality functions guide algorithms to ensure meaningful communities with high internal connectivity
and low external interaction.
Key Features
Iteratively swaps node pairs across partitions if doing so reduces edge-crossing cost.
Continues until no improvement is possible.
Algorithm Steps
Characteristics
5. Agglomerative Algorithms
Agglomerative algorithms follow a bottom-up hierarchical clustering approach.
Approach
1. Single-link clustering:
Merges communities with the smallest edge distance.
2. Complete-link clustering:
Ensures that merged communities are tightly connected.
3. Average-link clustering:
Considers average similarity between communities.
4. Modularity-based agglomeration (e.g., Louvain first phase):
Iteratively combines communities that increase modularity.
Advantages
Agglomerative algorithms are simple but can be computationally expensive for very large networks.
6. Spectral Algorithms
Spectral algorithms use eigenvalues and eigenvectors of matrices associated with graphs (such as the
Laplacian matrix) to detect community structure.
Working Principle
Key Insight
Applications
Spectral methods are mathematically elegant and effective but require eigen decomposition, which may
be costly for huge networks.
Three-Phase Process
1. Coarsening Phase:
Gradually reduce the graph size by merging nodes/edges.
Advantages
Produces globally good partitions.
Examples include Metis, Louvain, and Leiden methods.
This method is widely used in real-life applications that require fast community detection on massive
networks.
Principle
Core Operations
1. Expansion:
Simulates random walks by taking higher powers of the adjacency matrix.
2. Inflation:
Strengthens strong connections and weakens weak ones by applying a power-raising and
normalization process.
Advantages
MCL is considered robust and easy to control via the inflation parameter, which influences granularity.
9. Other Approaches
Various advanced or alternative community detection techniques include:
A. Label Propagation
Detects overlapping communities using k-cliques.
A community is formed by adjacent cliques sharing k–1 nodes.
These approaches extend the flexibility of community discovery to fit various network shapes, sizes, and
application domains.
Below is a broad, theory-rich, exam-oriented explanation of all topics from Social Influence in Social
Network Analysis (SNA). The content is written in a clean academic style suitable for 8–10 mark long
answers, with no filler content.
Key Characteristics
Social similarity
SNA treats influence as a quantifiable and measurable process, important for applications like viral
marketing, opinion mining, and behavioral prediction.
2. Influence-Related Statistics
To study social influence rigorously, various statistical measures and models are used. These help
quantify how much influence one node may exert on another.
A. Correlation of Actions
Measures similarity in behavior between connected nodes (e.g., number of times they adopt the same
action).
B. Conditional Probabilities
Probability that a node performs an action given that its neighbor performed it earlier.
C. Influence Probability
D. Exposure Count
Number of neighbors who have already adopted a behavior before a node adopts it.
Measures the time-lag between a neighbor performing an action and the node following it.
These statistics are used in influence models like Independent Cascade (IC) and Linear Threshold (LT).
Types of Similarity
4. Homophily
Homophily is the tendency of individuals to associate and bond with others who are similar to
themselves.
Types of Homophily
1. Status Homophily:
Based on sociodemographic attributes (age, gender, education, location).
2. Value Homophily:
Based on shared beliefs, interests, values.
Homophily complicates influence analysis because observed behavior similarity may arise from
similarity rather than influence.
Purpose
To determine whether social influence is actually present or whether the observed behavior is due to:
Homophily
Independent decisions
External factors
Outcome
6. Influence and Actions
This topic studies how social influence shapes individual actions, such as:
Joining a group
Purchasing a product
Sharing or liking content
Participating in events
Key Concepts
1. Behavioral Adoption
4. Exposure Effect
More exposures to an action → higher likelihood of adoption.
Actions are analyzed based on temporal sequences, exposure counts, and diffusion patterns.
1. Direct interactions:
Messages, replies, comments.
2. Indirect interactions:
Likes, views, retweets.
3. Repeated interactions:
Increase trust and tie strength.
4. Reciprocal interactions:
Two-way interactions indicate stronger influence potential.
Effects on Influence
Interaction frequency predicts likelihood of action adoption.
Network propagation depends heavily on active interaction links.
Influence is not solely based on network structure; interaction dynamics determine real diffusion.
Importance
Problem Definition
“Given a network, find k seed nodes that maximize expected spread of influence.”
Models Used
Heuristics
Outcome
Give me all numericals related to this subject, with formulas and example
Great — below are concise, exam-ready numericals covering the core quantitative computations you’ll
encounter in Social Network Analysis. For each item I give the formula, a small example network /
data, and the step-by-step worked solution so you can reproduce it in an exam. I focused on the most
important measures and algorithms students are usually tested on.
2. Graph Density
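Formula (undirected): $\text{Density} = \frac{2E}{N(N-1)}$, where $E$ is the number of edges and $N$ the number of nodes.
Example: Same graph above: $N = 4$, edges $E = 4$.
Solution: $\text{Density} = \frac{2 \times 4}{4 \times 3} = \frac{8}{12} = 0.6667$.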
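The density calculation can be checked with a short sketch. The edge list below is an assumption: it is the 4-node graph implied by the later worked examples (triangle A-B-C plus edge C-D), and the helper name `density` is illustrative only.

```python
def density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2E / (N(N-1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

# Assumed example graph: triangle A-B-C plus the edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
nodes = {n for edge in edges for n in edge}

example_density = density(len(nodes), len(edges))
print(round(example_density, 4))
```

For a directed graph the factor 2 would be dropped, since each ordered pair is a separate possible edge.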
4×3 12
Global (average) clustering: average of local $C(v)$, or transitivity: $\frac{3 \times \text{number of triangles}}{\text{number of connected triples}}$.
Example: Use previous graph. Triangles: only one triangle A-B-C → 1 triangle.
Local C(C): $k_C = 3$. Triangles through C = 1. So $C(C) = \frac{2 \times 1}{3 \times 2} = \frac{2}{6} = 0.3333$.
Local C(A): $k_A = 2$. Triangles through A = 1 (A-B-C). So $C(A) = \frac{2 \times 1}{2 \times 1} = 1.0$.
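A minimal sketch of the clustering calculations, assuming the same 4-node graph (triangle A-B-C plus edge C-D). The function names are illustrative; the local coefficient counts links among a node's neighbors, and transitivity counts closed triples over all connected triples.

```python
from itertools import combinations

# Assumed adjacency for the example graph.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def neighbor_links(adj, v):
    """Number of edges among the neighbors of v (= triangles through v)."""
    return sum(1 for u, w in combinations(sorted(adj[v]), 2) if w in adj[u])

def local_clustering(adj, v):
    """C(v) = 2 * triangles(v) / (k_v * (k_v - 1)); 0 if degree < 2."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    return 2 * neighbor_links(adj, v) / (k * (k - 1))

def transitivity(adj):
    """3 * triangles / connected triples, counted per center node."""
    closed = sum(neighbor_links(adj, v) for v in adj)  # each triangle counted 3 times
    triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return closed / triples if triples else 0.0

print(local_clustering(adj, "C"), local_clustering(adj, "A"), transitivity(adj))
```

Note that the average of the local coefficients and the transitivity generally differ; on this graph the average is (1 + 1 + 1/3 + 0)/4 while the transitivity is 3/5.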
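Formula: $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$ and $\sigma_{st}(v)$ the number of those that pass through $v$.
Example: Same graph. Consider node C. Pairs (A,D): shortest path A–C–D (unique) → counts for C. (B,D): B–C–D (unique) → counts for C. (A,B): path A–B (does not include C). (A,C), (B,C) are trivial because C is an endpoint. So $\sigma_{AD}(C) = 1$, $\sigma_{BD}(C) = 1$. Total unordered pairs excluding C: $\binom{3}{2} = 3$ pairs among A, B, D: (A,B), (A,D), (B,D). Only two use C, so the unnormalized value is 2. Normalized (for undirected graphs) by dividing by $\frac{(N-1)(N-2)}{2} = \frac{3 \times 2}{2} = 3$: $C_B(C) = 2/3 = 0.6667$.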
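The pair-counting argument for betweenness can be sketched in code. This is a simple illustration rather than an efficient algorithm: it runs a breadth-first search per source to get distances and shortest-path counts, then sums the fraction of s–t shortest paths through the chosen node. The graph is the assumed 4-node example (triangle A-B-C plus edge C-D).

```python
from collections import deque
from itertools import combinations

adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

def bfs_counts(adj, source):
    """Return (distance, number of shortest paths) from source to reachable nodes."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    """Unnormalized betweenness of v over unordered pairs (s, t), s != v != t."""
    dist_v, sigma_v = bfs_counts(adj, v)
    total = 0.0
    for s, t in combinations(sorted(adj), 2):
        if v in (s, t):
            continue
        dist_s, sigma_s = bfs_counts(adj, s)
        if t not in dist_s or v not in dist_s:
            continue
        if dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

n = len(adj)
raw_c = betweenness(adj, "C")
normalized_c = raw_c / ((n - 1) * (n - 2) / 2)
print(raw_c, round(normalized_c, 4))
```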
6. Closeness Centrality
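Formula: $C_{clo}(v) = \frac{N - 1}{\sum_{u \neq v} d(u, v)}$, where $d(u, v)$ is the shortest-path distance between $u$ and $v$.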
(You may be asked to compute power iteration once: start x=[1,1,1], multiply A, normalize → shows
convergence.)
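Formula: $PR(v) = \frac{1 - d}{N} + d \sum_{u \in \Gamma_{in}(v)} \frac{PR(u)}{\text{outdeg}(u)}$, where $d$ is the damping factor, $N$ the number of nodes, and $\Gamma_{in}(v)$ the set of nodes linking to $v$.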
Example: 3-node directed cycle A→B→C→A. Set d = 0.85. Start with uniform PR=1/3 ≈0.3333. One
iteration:
For A: PR(A) = 0.05 + 0.85*(PR(C)/outdeg(C)). Outdeg(C)=1, PR(C)=0.3333 →
PR(A)=0.05+0.85*0.3333=0.05+0.2833=0.3333 (stays same). So steady state uniform PR=1/3.
If graph had a sink, iterative steps would change values; exam questions often ask one/two iterations—
show calculations like above.
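One PageRank iteration on the 3-node cycle can be reproduced with the sketch below. It assumes the update PR(v) = (1 − d)/N + d · Σ PR(u)/outdeg(u) over in-neighbors u, and it simply skips sink nodes (a real implementation would redistribute their rank); the function name is illustrative.

```python
def pagerank_step(out_links, pr, d=0.85):
    """One synchronous PageRank update; sinks are ignored in this sketch."""
    n = len(out_links)
    new_pr = {v: (1 - d) / n for v in out_links}
    for u, targets in out_links.items():
        if not targets:
            continue
        share = d * pr[u] / len(targets)  # rank passed along each out-link
        for v in targets:
            new_pr[v] += share
    return new_pr

# Directed cycle A -> B -> C -> A, uniform start.
cycle = {"A": ["B"], "B": ["C"], "C": ["A"]}
pr0 = {v: 1 / 3 for v in cycle}
pr1 = pagerank_step(cycle, pr0)
print({v: round(x, 4) for v, x in pr1.items()})
```

Because every node in the cycle has one in-link and one out-link, the uniform vector is already the steady state, which is why the values do not change.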
Example: Nodes P and Q. N (P ) = {A, B, C}, N (Q) = {B, C, D, E}. Intersection {B, C} size 2.
Union {A, B, C, D, E} size 5.
J(P , Q) = 2/5 = 0.4.
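Formula: $\text{Katz}(u, v) = \sum_{l=1}^{\infty} \beta^{l} \, |\text{paths}^{(l)}_{u,v}|$, where $|\text{paths}^{(l)}_{u,v}|$ is the number of paths of length $l$ between $u$ and $v$.
Example (3-node chain 1–2–3): Paths of length 1 between 1 and 2 = 1; length 2 between 1 and 3 = 1.
Compute Katz(1,3) with $\beta = 0.1$: only length-2 path exists → $\text{Katz}(1,3) = \beta^{2} \cdot 1 = 0.01$.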
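A hedged sketch of a truncated Katz score using powers of the adjacency matrix. Matrix powers count walks (which may revisit nodes), so truncating at length 2 reproduces the 0.01 above, while longer truncations add small extra terms such as β⁴ · 2 for the two length-4 walks from node 1 to node 3. The function name and the `max_len` parameter are illustrative.

```python
def katz_truncated(A, i, j, beta, max_len):
    """Sum of beta**l * (A**l)[i][j] for l = 1..max_len."""
    n = len(A)
    power = [row[:] for row in A]  # holds A**l, starting with l = 1
    score = 0.0
    for l in range(1, max_len + 1):
        score += beta ** l * power[i][j]
        # Advance to A**(l+1).
        power = [
            [sum(power[r][k] * A[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)
        ]
    return score

# Chain 1-2-3, with nodes indexed 0, 1, 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

katz_len2 = katz_truncated(A, 0, 2, beta=0.1, max_len=2)
katz_len4 = katz_truncated(A, 0, 2, beta=0.1, max_len=4)
print(round(katz_len2, 6), round(katz_len4, 6))
```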
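Formula: $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$
Example: Take a 4-node graph with edges AB, AC, BC (triangle among A, B, C) and CD (edge C–D), so $m = 4$. Partition: community 1 = {A, B, C}, community 2 = {D}.
Equivalent per-community form: $Q = \sum_c \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, where $l_c$ is the number of internal edges and $d_c$ the sum of degrees in the community.
For community 1: internal edges $l_1 = 3$ (AB, BC, AC). $d_1 = k_A + k_B + k_C = 2 + 2 + 3 = 7$. For community 2: $l_2 = 0$, $d_2 = k_D = 1$.
So $Q = \left[ \frac{3}{4} - \left( \frac{7}{8} \right)^2 \right] + \left[ \frac{0}{4} - \left( \frac{1}{8} \right)^2 \right]$. Compute: 3/4 = 0.75. (7/8)^2 = 49/64 = 0.765625. First term: 0.75 − 0.765625 = −0.015625. Second: 0 − (1/64) = −0.015625. Total Q = −0.015625 − 0.015625 = −0.03125.
Negative modularity → this partition is worse than the random baseline (expected because D's only tie is to C).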
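The per-community shortcut Q = Σ_c [l_c/m − (d_c/2m)²] is easy to check in code. The sketch below assumes an unweighted undirected edge list and disjoint communities; the function name is illustrative.

```python
def modularity(edges, communities):
    """Modularity via the per-community form: sum of l_c/m - (d_c/(2m))**2."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for comm in communities:
        l_c = sum(1 for u, v in edges if u in comm and v in comm)  # internal edges
        d_c = sum(degree.get(node, 0) for node in comm)            # total degree
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
q_split = modularity(edges, [{"A", "B", "C"}, {"D"}])
q_single = modularity(edges, [{"A", "B", "C", "D"}])
print(q_split, q_single)
```

Putting all four nodes in one community gives Q = 0, so on this graph the split that isolates D scores below the trivial partition.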
14. Conductance of a Community
Formula: $\phi(S) = \frac{\text{cut}(S, \bar{S})}{\min(\text{vol}(S), \text{vol}(\bar{S}))}$, where cut = number of edges leaving $S$ and vol = sum of degrees in $S$.
Example: Use community S={A,B,C} from previous example. cut = edges from S to outside = edge C–D
=1. vol(S) = d1 =7. vol(bar S)=d2=1. min =1. So ϕ(S) = 1/1 = 1.0. Very high conductance (bad
community separation).
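A short sketch of the conductance calculation, assuming an unweighted undirected edge list; volumes are accumulated by counting edge endpoints on each side, and the function name is illustrative.

```python
def conductance(edges, S):
    """phi(S) = cut(S, rest) / min(vol(S), vol(rest))."""
    cut = vol_s = vol_rest = 0
    for u, v in edges:
        for node in (u, v):          # each endpoint adds 1 to its side's volume
            if node in S:
                vol_s += 1
            else:
                vol_rest += 1
        if (u in S) != (v in S):     # edge crosses the boundary
            cut += 1
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 0.0

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
phi = conductance(edges, {"A", "B", "C"})
print(phi)
```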
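Formula: gain of swapping $a$ and $b$: $g(a, b) = D(a) + D(b) - 2 w_{ab}$, where $D(v)$ = external cost − internal cost, and $w_{ab} = 1$ if there is an edge between $a$ and $b$, else 0.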
Example: Partition sets A={a1,a2}, B={b1,b2} on a small graph. Suppose D(a1)=2, D(b1)=1, w_{a1b1}=0.
Then gain = 2+1-0 = 3. So swapping a1 and b1 improves cut by 3 (units).
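The swap gain can be verified against the actual change in cut size. The sketch below uses a hypothetical weighted graph chosen only so that D(a1) = 2, D(b1) = 1 and w(a1, b1) = 0 as in the example; the edge weights and helper names are assumptions, not taken from the exam figure.

```python
# Hypothetical weighted edges: a1-b2 with weight 2, a2-b1 with weight 1.
weights = {frozenset(("a1", "b2")): 2, frozenset(("a2", "b1")): 1}

def w(weights, x, y):
    return weights.get(frozenset((x, y)), 0)

def d_value(weights, node, own, other):
    """D(node) = external cost - internal cost."""
    external = sum(w(weights, node, x) for x in other)
    internal = sum(w(weights, node, x) for x in own if x != node)
    return external - internal

def swap_gain(weights, a, b, part_a, part_b):
    """Kernighan-Lin gain D(a) + D(b) - 2 * w_ab for swapping a and b."""
    return (d_value(weights, a, part_a, part_b)
            + d_value(weights, b, part_b, part_a)
            - 2 * w(weights, a, b))

def cut_weight(weights, part_a):
    """Total weight of edges with exactly one endpoint in part_a."""
    return sum(wt for pair, wt in weights.items() if len(pair & part_a) == 1)

part_a, part_b = {"a1", "a2"}, {"b1", "b2"}
gain = swap_gain(weights, "a1", "b1", part_a, part_b)
cut_before = cut_weight(weights, part_a)
cut_after = cut_weight(weights, {"b1", "a2"})  # partition after swapping a1 and b1
print(gain, cut_before, cut_after)
```

The cut drops from 3 to 0, matching the computed gain of 3.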
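Example: 3-node chain 1–2–3 with Laplacian $L = D - A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}$. The Fiedler vector (eigenvector of the second-smallest eigenvalue, $\lambda_2 = 1$) is proportional to $(1, 0, -1)$. Splitting by signs → partition {1} and {3} with node 2 ambiguous. This shows the spectral split.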
Example: 3-node triangle A–B–C (fully connected). MCL usually adds self-loops, so the adjacency matrix is the 3×3 all-ones matrix. Each column sums to 3, so the column-stochastic matrix $M$ has every entry equal to 1/3.
Expansion: $M^2 = M$ because $M$ is uniform. Inflation with $r = 2$: raise each element to power 2 → $(1/3)^2 = 1/9$.
Normalize columns: each column sum = $3 \times (1/9) = 1/3$ → normalized entry = $(1/9)/(1/3) = 1/3$. So $M$ is a stable fixed point; MCL finds a single cluster for the triangle.
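One expansion-plus-inflation round of MCL on the triangle can be sketched with plain lists. This is a minimal illustration that assumes self-loops have already been added and uses matrix squaring as the expansion step; the helper names are illustrative.

```python
def normalize_columns(M):
    """Rescale each column so it sums to 1 (column-stochastic)."""
    n = len(M)
    sums = [sum(M[r][c] for r in range(n)) for c in range(n)]
    return [[M[r][c] / sums[c] for c in range(n)] for r in range(n)]

def expand(M):
    """Expansion: matrix square, i.e. two-step random walks."""
    n = len(M)
    return [[sum(M[r][k] * M[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def inflate(M, r):
    """Inflation: element-wise power r followed by column normalization."""
    return normalize_columns([[x ** r for x in row] for row in M])

# Triangle with self-loops: all-ones adjacency matrix.
adjacency = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
M = normalize_columns(adjacency)
M_next = inflate(expand(M), r=2)
print(M_next)
```

In practice the two steps are repeated until the matrix stops changing, and clusters are read from the non-zero rows of the result.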
Example: Graph A—B—C (chain). Seed = {A}. Uniform edge activation probability p=0.5.
Find expected number of activated nodes (expected cascade size).
Stepwise:
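Formula: each node $v$ has a threshold $\theta_v$ and incoming weights $w_{uv}$ with $\sum_u w_{uv} \le 1$. Node $v$ activates when the sum of its active neighbors' weights ≥ $\theta_v$.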
Example: Triangle A, B, C. Weights uniform: each incoming weight = 0.5 (every node has two neighbors). For node C: weights from A and B are 0.5 each. Threshold $\theta_C = 0.6$. Seed set {A}. Will C activate?
Process: Initially A is active. Sum of active neighbors' weights for C = weight from A = 0.5 < 0.6, so C does not activate. If B later activates, C sees total 1.0 ≥ 0.6 and activates. So the single seed A is insufficient to activate the others unless the threshold is ≤ 0.5.
(Exam often asks such calculations to show central nodes better for spread.)
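The Linear Threshold process on the triangle can be simulated with a small sketch. The example only states a threshold for C; the code assumes B has the same threshold, and the function name is illustrative.

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the Linear Threshold model until no further node activates."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, theta in thresholds.items():
            if v in active:
                continue
            # Total incoming weight from neighbors that are already active.
            influence = sum(wt for (u, t), wt in weights.items()
                            if t == v and u in active)
            if influence >= theta:
                active.add(v)
                changed = True
    return active

nodes = ["A", "B", "C"]
# Every ordered pair (u, v) in the triangle carries weight 0.5.
weights = {(u, v): 0.5 for u in nodes for v in nodes if u != v}

strict = {v: 0.6 for v in nodes}   # assumed: same threshold for every node
loose = {v: 0.5 for v in nodes}

print(linear_threshold(weights, strict, {"A"}))
print(linear_threshold(weights, strict, {"A", "B"}))
print(linear_threshold(weights, loose, {"A"}))
```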
Example: Suppose f = 10 messages, t = 30 days, mutual friends=5. Normalize values to [0,1]: max
observed f=20 → f_norm=0.5; max t=60 → t_norm=0.5; max mutual friends=10 → m_norm=0.5. Coeffs
α=0.5, β=0.3, γ=0.2. Then s = 0.5 ∗ 0.5 + 0.3 ∗ 0.5 + 0.2 ∗ 0.5 = 0.25 + 0.15 + 0.10 = 0.5.
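The weighted-sum tie strength can be written as a one-line function. This sketch assumes max-normalization of each feature, as in the example; the parameter names are illustrative.

```python
def tie_strength(freq, duration, mutual, max_freq, max_duration, max_mutual,
                 alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of max-normalized interaction features."""
    return (alpha * freq / max_freq
            + beta * duration / max_duration
            + gamma * mutual / max_mutual)

# 10 messages (max 20), 30 days (max 60), 5 mutual friends (max 10).
s = tie_strength(10, 30, 5, max_freq=20, max_duration=60, max_mutual=10)
print(round(s, 4))
```

Because the coefficients sum to 1 and each normalized feature lies in [0, 1], the score also stays in [0, 1].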
For matrix-based measures show adjacency matrix A and degree matrix D when needed.
For centrality counts (betweenness) list all shortest paths (pairwise) and count occurrences.
For probabilistic diffusion (IC), compute expected contributions node-by-node (linearity of
expectation).
For modularity use the $l_c/m - (d_c/2m)^2$ shortcut per community: faster and less error-prone.
If you want, I can now:
Convert these into 30+ practice numerical questions (with blanks for you to solve and separate
answer key), or
Produce a one-page formula sheet with the same worked examples compacted for quick revision.
Answer this question paper with all relevant answers and calculations for numericals
Nice — I’ll solve the whole question paper for you. For the diagram-based numericals I explicitly state
the small assumptions I made about the edges (so you can verify or correct them). If any diagram is
different from what I assumed, tell me which edge(s) differ and I’ll immediately recompute (I’ll re-
draw/compute only after you confirm).
I keep the style exam-ready: short answers for Group-A, clear steps for numericals and worked answers
for short/long questions.
GROUP A — Very short answers
(Answer any ten — I'll give 12 short answers; you can pick any 10 in your answer sheet.)
GROUP B — Short / Numerical questions (Q2–Q11)
I'll answer questions 2–11 in order as they appear. Where diagrams are involved I give the assumption I
used.
1. Disjoint (non-overlapping) communities: each node belongs to exactly one community. E.g.,
partition of students by department.
2. Overlapping communities: nodes belong to multiple groups (e.g., a person in family + workplace
groups).
3. Hierarchical communities: communities contain subcommunities (e.g., company → departments →
teams).
4. Core–periphery structure: dense core nodes, sparse periphery nodes (e.g., research group core vs
external collaborators).
5. Temporal/dynamic communities: community membership changes over time (e.g., event-based
groups on social media).
Triadic closure: tendency for two nodes with a common neighbor to become connected (closing of
triangles). Driven by network structure and opportunities to meet. Example: if A friends with B and
C, B and C likely to become friends.
Difference: Homophily is attribute-based similarity producing ties; triadic closure is structural (common
neighbors) producing ties. Homophily can produce assortative mixing by attributes; triadic closure
produces high clustering and triangles.
Degrees:
deg(C)=4 (A,B,D,E)
deg(A)=2 (C,B)
deg(B)=2 (C,A)
deg(D)=1 (C)
deg(E)=1 (C)
This is a typical exam figure where B and D share common neighbors {C} and possibly others. I will
compute Adamic-Adar(B,D) under the assumption that common neighbors = {C} only. (If your diagram
has additional common neighbors, replace accordingly.)
Step 1: find common neighbors: N (B) ∩ N (D) = {C}.
Step 2: compute Adamic–Adar:
$AA(B, D) = \sum_{w \in N(B) \cap N(D)} \frac{1}{\log |N(w)|}$.
With the single common neighbor C and $|N(C)| = 4$:
$AA(B, D) = \frac{1}{\log 4}$.
Using the natural logarithm:
$AA(B, D) \approx \frac{1}{1.3863} \approx 0.7213$.
If your diagram shows additional common neighbors for B and D (e.g., A or E), include each
term 1/ log(∣N (w)∣) for those w and sum.
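The Adamic–Adar sum can be computed directly from neighbor sets. The adjacency below is the assumed reconstruction of the figure (C linked to A, B, D, E, plus the edge A–B), and the natural logarithm is used, matching the value 1.3863 above.

```python
import math

# Assumed reconstruction of the exam figure.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C"},
    "E": {"C"},
}

def adamic_adar(adj, u, v):
    """Sum of 1 / ln(degree(w)) over common neighbors w of u and v."""
    common = adj[u] & adj[v]
    # A common neighbor always has degree >= 2, so the log is positive.
    return sum(1 / math.log(len(adj[w])) for w in common)

aa_bd = adamic_adar(adj, "B", "D")
print(round(aa_bd, 4))
```

If the real diagram has more common neighbors, adding the corresponding edges to `adj` extends the sum automatically.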
8. Implicit vs explicit ties: explicit (declared friendship), implicit (inferred from co-views or co-clicks).
9. Strong vs weak ties: strong = close friends (frequent, trusted), weak = acquaintances (rare contact)
— Granovetter’s concept.
For each type give one short example (above included). That’s the required coverage.
A connected to B, C, I
B connected to A, C, D
C connected to A, B, D, E
D connected to B, C, E, G
E connected to C, D, F
F connected to E, G
G connected to D, F, H, I
H connected to G, I
I connected to A, G, H
(This is one plausible reconstruction of the exam figure — it captures an almost-grid shape with triangles as
typical in exam sheets.)
For node v in an undirected network with N nodes, normalized degree centrality is:
$C_D(v) = \frac{\deg(v)}{N - 1}$.
A B C D E F G H I
A [ 0 1 1 0 0 0 0 0 1 ]
B [ 1 0 1 1 0 0 0 0 0 ]
C [ 1 1 0 1 1 0 0 0 0 ]
D [ 0 1 1 0 1 0 1 0 0 ]
E [ 0 0 1 1 0 1 0 0 0 ]
F [ 0 0 0 0 1 0 1 0 0 ]
G [ 0 0 0 1 0 1 0 1 1 ]
H [ 0 0 0 0 0 0 1 0 1 ]
I [ 1 0 0 0 0 0 1 1 0 ]
deg(A) = 3 (B,C,I)
deg(B) = 3 (A,C,D)
deg(C) = 4 (A,B,D,E)
deg(D) = 4 (B,C,E,G)
deg(E) = 3 (C,D,F)
deg(F) = 2 (E,G)
deg(G) = 4 (D,F,H,I)
deg(H) = 2 (G,I)
deg(I) = 3 (A,G,H)
$C_D(F) = 2/8 = 0.25$
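Normalized degree centrality for every node of the assumed 9-node network can be computed from the adjacency lists. This is a sketch under the same reconstructed edges; the function name is illustrative.

```python
# Assumed reconstruction of the 9-node exam network.
adj = {
    "A": ["B", "C", "I"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D", "E"],
    "D": ["B", "C", "E", "G"],
    "E": ["C", "D", "F"],
    "F": ["E", "G"],
    "G": ["D", "F", "H", "I"],
    "H": ["G", "I"],
    "I": ["A", "G", "H"],
}

def degree_centrality(adj):
    """Normalized degree centrality: deg(v) / (N - 1)."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}

centrality = degree_centrality(adj)
for node, value in centrality.items():
    print(node, value)
```

Under this reconstruction the degree-4 nodes C, D and G share the highest score (4/8), and the degree-2 nodes F and H the lowest (2/8).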
Q9. (a) What is the minimum cut problem in group-centric
community detection? (b) What measures are used to mitigate
min-cut problem and provide formulas? (c) Find ratio cut and
normalized cut for cut A and cut B in given network. [2+5+8 =15
marks]
This is three subparts.
The minimum cut problem partitions the graph into two (or more) groups while minimizing the number
(or total weight) of edges crossing between groups. In community detection, a raw min-cut often yields
unbalanced trivial solutions (e.g., isolate a single node) that are not meaningful communities.
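1. Ratio Cut: normalizes by community size: $\text{RatioCut}(S, \bar{S}) = \frac{\text{cut}(S, \bar{S})}{|S|} + \frac{\text{cut}(S, \bar{S})}{|\bar{S}|}$, or for a two-way cut sometimes written as $\frac{\text{cut}(S, \bar{S})}{|S|}$ (choose a consistent definition). Here $\text{cut}(S, \bar{S})$ is the number of edges between $S$ and its complement.
2. Normalized Cut (Ncut): normalizes by volumes (sum of degrees) to account for degree distribution: $\text{Ncut}(S, \bar{S}) = \frac{\text{cut}(S, \bar{S})}{\text{vol}(S)} + \frac{\text{cut}(S, \bar{S})}{\text{vol}(\bar{S})}$
3. Conductance: $\phi(S) = \frac{\text{cut}(S, \bar{S})}{\min(\text{vol}(S), \text{vol}(\bar{S}))}$
4. Modularity: $Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$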
These measures favor balanced or high-quality communities rather than trivial small cuts.
(c) Compute ratio cut and normalized cut for straight A and B cuts (8 marks)
Figure in the exam (I recall) shows a small graph with nodes labeled 1..9 arranged in two columns,
with a dashed line representing cut B and another representing cut A. I will assume the paper’s diagram
corresponds to the typical exam figure:
Nodes and edges (reconstructed assumption — matches the typical figure):
Because the diagram is ambiguous, I will instead explain the exact method to compute ratio cut and
normalized cut and demonstrate with a small concrete numeric example extracted from the exam-like
tiny graph. That gives full marks for method and a worked example.
Worked example (concrete): Consider a graph of 9 nodes where edges crossing between S and
complement are explicitly: suppose Cut A separates S={5,8,9} from rest and cut edges count = 3.
Suppose |S|=3 and |bar S|=6. Suppose degrees sum (volumes) vol(S)=10 and vol(bar S)=14. (These are
plausible numbers for a small exam figure.) Then:
$\text{cut}(S, \bar{S}) = 3$.
$\text{RatioCut} = \frac{3}{3} + \frac{3}{6} = 1 + 0.5 = 1.5$.
Normalized cut:
$\text{Ncut} = \frac{3}{10} + \frac{3}{14} = 0.3 + 0.2143 = 0.5143$.
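Both cut measures reduce to simple arithmetic once the cut size, set sizes and volumes are known. The sketch below plugs in the assumed numbers from the worked example (cut = 3, sizes 3 and 6, volumes 10 and 14); the function names are illustrative.

```python
def ratio_cut(cut, size_s, size_rest):
    """RatioCut = cut/|S| + cut/|S_bar|."""
    return cut / size_s + cut / size_rest

def normalized_cut(cut, vol_s, vol_rest):
    """Ncut = cut/vol(S) + cut/vol(S_bar)."""
    return cut / vol_s + cut / vol_rest

rc = ratio_cut(3, 3, 6)
nc = normalized_cut(3, 10, 14)
print(rc, round(nc, 4))
```

With the real figure, only the three inputs per measure need to be replaced; the cut with the lower value is the better-balanced partition.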
If you want a precise numeric answer for the exam figure: please confirm the exact node partition
sets S for cuts A and B and the degrees or provide the adjacency of the figure. I’ll compute ratio cut and
Ncut numerically in one step.
$J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$
Example: N (u) = {A, B, C}, N (v) = {B, C, D, E}. Intersection size = 2, union size = 5 → J =
2/5 = 0.4.
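Formula (Adamic–Adar):
$AA(u, v) = \sum_{w \in N(u) \cap N(v)} \frac{1}{\log |N(w)|}$
Formula (preferential attachment):
$PA(u, v) = |N(u)| \cdot |N(v)|$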
Example: if |N(u)|=4 and |N(v)|=6 → PA = 24. High PA suggests new link likely because nodes with
many neighbors have higher chance to connect.
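The Jaccard and preferential-attachment scores operate directly on neighbor sets. In this sketch the Jaccard sets come from the example above, while the two sets used for preferential attachment are hypothetical placeholders of sizes 4 and 6.

```python
def jaccard(n_u, n_v):
    """|N(u) & N(v)| / |N(u) | N(v)|, or 0 if both sets are empty."""
    union = n_u | n_v
    return len(n_u & n_v) / len(union) if union else 0.0

def preferential_attachment(n_u, n_v):
    """|N(u)| * |N(v)|."""
    return len(n_u) * len(n_v)

j = jaccard({"A", "B", "C"}, {"B", "C", "D", "E"})

# Hypothetical neighbor sets with 4 and 6 members.
pa = preferential_attachment(set(range(4)), set(range(10, 16)))
print(j, pa)
```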
$C_{clo}(v) = \frac{N - 1}{\sum_{u \neq v} d(v, u)}$
where $d(v, u)$ is the shortest-path distance and $N$ the number of nodes. Some texts use the reciprocal $\frac{1}{\sum_{u} d(v, u)}$.
(b) Distance matrix (5 marks) — using the same assumed network from Q8.
Recall edges assumption (restated briefly) and degrees. Now compute pairwise shortest path distances
(I'll compute carefully). Nodes ordered [A,B,C,D,E,F,G,H,I].
A: B,C,I
B: A,C,D
C: A,B,D,E
D: B,C,E,G
E: C,D,F
F: E,G
G: D,F,H,I
H: G,I
I: A,G,H
Now compute distances d(i,j) (I will produce the symmetric matrix). Use manual BFS from each node.
From A:
d(A,A)=0
neighbors: B(1), C(1), I(1)
From B: D reachable in 2 (A→B→D)
From C: E reachable in 2 (A→C→E)
From B:
From D: G at 2
From A: I at 2
H via G at 3, F via E at 3
So row B: [1,0,1,1,2,3,2,3,2]
From C:
neighbors A(1),B(1),D(1),E(1)
I via A at 2, G via D at 2, F via E at 2, H via G at 3
Row C: [1,1,0,1,1,2,2,3,2]
From D:
From E:
neighbors C(1),D(1),F(1)
B via C at 2, A via C at 2, G via D or F at 2, I via G at 3, H via G at 3
Row E: [2,2,1,1,0,1,2,3,3]
From F:
From G:
neighbors D(1),F(1),H(1),I(1)
C via D at 2, E via D or F at 2, A via I at 2, B via D at 2
Row G: [2,2,2,1,2,1,0,1,1]
From H:
neighbors G(1),I(1)
D via G at 2, F via G at 2, A via I at 2, C via D at 3, B via D at 3, E via D→E at 3
Row H: [2,3,3,2,3,2,1,0,1]
From I:
neighbors A(1),G(1),H(1)
from A: B, C at 2; from G: D, F at 2; E via G→D→E at 3
Row I: [1,2,2,2,3,2,1,1,0]
A B C D E F G H I
A [ 0 1 1 2 2 3 2 2 1 ]
B [ 1 0 1 1 2 3 2 3 2 ]
C [ 1 1 0 1 1 2 2 3 2 ]
D [ 2 1 1 0 1 2 1 2 2 ]
E [ 2 2 1 1 0 1 2 3 3 ]
F [ 3 3 2 2 1 0 1 2 2 ]
G [ 2 2 2 1 2 1 0 1 1 ]
H [ 2 3 3 2 3 2 1 0 1 ]
I [ 1 2 2 2 3 2 1 1 0 ]
sum_d(A) = 0+1+1+2+2+3+2+2+1 = 14
sum_d(B) = 1+0+1+1+2+3+2+3+2 = 15
sum_d(C) = 1+1+0+1+1+2+2+3+2 = 13
sum_d(D) = 2+1+1+0+1+2+1+2+2 = 12
sum_d(E) = 2+2+1+1+0+1+2+3+3 = 15
sum_d(F) = 3+3+2+2+1+0+1+2+2 = 16
sum_d(G) = 2+2+2+1+2+1+0+1+1 = 12
sum_d(H) = 2+3+3+2+3+2+1+0+1 = 17
sum_d(I) = 1+2+2+2+3+2+1+1+0 = 14
Interpretation: Nodes D and G have highest closeness (0.6667), meaning they are centrally located in
terms of distance to others.
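The distance sums and closeness values can be cross-checked with a breadth-first search from every node. This sketch uses the same assumed 9-node adjacency and assumes the graph is connected; the function names are illustrative.

```python
from collections import deque

# Assumed reconstruction of the 9-node exam network.
adj = {
    "A": ["B", "C", "I"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D", "E"],
    "D": ["B", "C", "E", "G"],
    "E": ["C", "D", "F"],
    "F": ["E", "G"],
    "G": ["D", "F", "H", "I"],
    "H": ["G", "I"],
    "I": ["A", "G", "H"],
}

def distances_from(adj, source):
    """Shortest-path (hop) distances from source via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """Normalized closeness: (N - 1) / sum of distances from v."""
    dist = distances_from(adj, v)
    return (len(adj) - 1) / sum(dist.values())

distance_sums = {v: sum(distances_from(adj, v).values()) for v in adj}
scores = {v: closeness(adj, v) for v in adj}
for v in adj:
    print(v, distance_sums[v], round(scores[v], 4))
```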
Transitivity (recomputed)
For small graph A–B–C–D (chain of 4), betweenness of B = number of shortest paths between other pairs going through B. Pairs: (A,C): path A–B–C uses B (1/1), (A,D): A–B–C–D uses B (1/1), (C,D): C–D (does not use B). Sum = 2. Normalize by $\frac{(N-1)(N-2)}{2} = 3$ → 2/3.
Independent Cascade expected spread example
Seed A in chain A–B–C with p=0.5: expected cascade size = 1 + 0.5 + 0.25 = 1.75 (method: linearity of
expectation).
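The expected cascade size under Independent Cascade can be computed exactly on small graphs by enumerating which edges are "live". The sketch below treats the chain as directed arcs A→B and B→C (an assumption that matches how the cascade travels from seed A) and weights each outcome by its probability; the function name is illustrative.

```python
from itertools import product

def expected_spread(arcs, seed, p):
    """Exact expected number of activated nodes via live-arc enumeration."""
    total = 0.0
    for states in product([True, False], repeat=len(arcs)):
        prob = 1.0
        live = []
        for arc, is_live in zip(arcs, states):
            prob *= p if is_live else 1 - p
            if is_live:
                live.append(arc)
        # Nodes reachable from the seed along live arcs get activated.
        reached = {seed}
        frontier = [seed]
        while frontier:
            u = frontier.pop()
            for a, b in live:
                if a == u and b not in reached:
                    reached.add(b)
                    frontier.append(b)
        total += prob * len(reached)
    return total

chain = [("A", "B"), ("B", "C")]
spread = expected_spread(chain, "A", 0.5)
print(spread)
```

Enumeration is exponential in the number of arcs, so for larger graphs the same quantity is usually estimated by Monte Carlo simulation instead.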
(A) Convert these into typed, printable answers formatted for your submission (PDF/Word), or
(B) Make a clean answer-key with just the final numeric results and short lines (one page), or
(C) Recompute any of the numericals after you confirm/adjust specific edges from the diagrams?