Social Network Analysis Exam Guide

Uploaded by reyaziqbal4121

Social Network Analysis topics

Explain the topics "Introduction to Social Web, Nodes, Edges and Network Measures, Describing Nodes and Edges, Describing Networks, Layouts, Visualizing network features" with respect to Social Network Analysis for semester college exams. Explain broadly and theoretically covering each topic and avoid any kind of filler content

Below is a broad, complete, theory-rich explanation of all the topics for your Social Network Analysis
(SNA) exam. Each topic is written in a clean, academic, semester-exam style, without filler content, and
can be directly used for 8–10 mark long answers.

SOCIAL NETWORK ANALYSIS – EXAM NOTES

1. Introduction to Social Web


The Social Web refers to the online ecosystem where individuals, groups, organizations, and systems
interact, share information, and form digital relationships. It includes platforms like social networking
sites (Facebook, LinkedIn), microblogging sites (Twitter/X), collaboration platforms (GitHub), discussion
forums (Reddit), media-sharing sites (YouTube, Instagram) and messaging applications.
In SNA, the Social Web is conceptualized as a network of entities connected by various types of
relationships.
Key characteristics of the Social Web:

User-generated content: Posts, comments, likes, shares.


Interactive communication: Messaging, tagging, commenting.
Community formation: Groups, clusters, sub-networks.

Traceable relationships: Digital footprints allow analysis of influence and behavior.


Dynamic nature: Networks evolve continuously based on interactions.

The Social Web provides rich data for studying:

Information diffusion
Influence and opinion formation

Community structure
Recommendation systems

Behavioral patterns

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/46
Thus, SNA helps to mathematically model and analyze these interactions using graph theory and
statistical methods.

2. Nodes, Edges and Network Measures


Nodes (Vertices)

Nodes represent the entities or actors in a network.


Examples:

Individuals on Facebook
Accounts on Twitter

Webpages on the Internet


Proteins in biological networks

Each node may have attributes like name, age, location, follower count, etc.

Edges (Links or Relationships)

Edges represent the relationship, interaction, or connection between nodes.


Edges can be:

Directed (A → B, e.g., "A follows B")


Undirected (A — B, e.g., "A and B are mutual friends")

Weighted (strength of connection, e.g., number of messages exchanged)

Unweighted (simple presence of a link)

Edges describe how information or influence flows through a network.

Network Measures

These are quantitative metrics used to mathematically analyze networks.

Major categories of network measures:

1. Degree-based measures

Degree: Number of edges connected to a node.

In-degree (for incoming edges)

Out-degree (for outgoing edges)


2. Centrality measures

Betweenness centrality: Measures how often a node lies on shortest paths.

Closeness centrality: Inverse of total distance to all other nodes.


Eigenvector centrality: Influence of a node based on connected influential neighbors.

PageRank: Variant of eigenvector centrality used by Google.
3. Network-level measures

Density: Ratio of existing edges to total possible edges.

Diameter: Longest shortest path between any two nodes.


Average path length: Typical distance between nodes.

4. Clustering coefficient

Measures how well neighbors of a node are interconnected.

Network measures help in identifying important nodes, community structures, efficiency of information
spread, and the robustness of networks.
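As a concrete sketch, these measures can all be computed with NetworkX (one of the tools listed later in these notes); the toy graph below is an assumption for illustration:

```python
import networkx as nx

# Toy undirected network: a triangle (1-2-3) plus a pendant node 4.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])

degree = dict(G.degree())                   # edges incident to each node
density = nx.density(G)                     # existing edges / possible edges
diameter = nx.diameter(G)                   # longest shortest path
betweenness = nx.betweenness_centrality(G)  # share of shortest paths through a node
closeness = nx.closeness_centrality(G)      # inverse of total distance to all others
pagerank = nx.pagerank(G)                   # PageRank scores
clustering = nx.clustering(G)               # clustering coefficient per node

print(degree)    # node 3 touches every other node, so it has the highest degree
print(density)   # 4 of 6 possible edges exist -> 4/6 ≈ 0.667
print(diameter)  # the farthest pair (e.g. 1 and 4) is 2 hops apart
```

Note how the node-level measures (degree, centralities, clustering) and the network-level measures (density, diameter) come from the same graph object.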

3. Describing Nodes and Edges


Describing Nodes

Nodes are described using:

Structural properties

Degree

Centrality scores
Connectivity

Node position in network

Attribute properties

Demographics: age, gender, country


Behavioral: posting frequency, engagement

Functional: role, type, category

Nodes may also be classified based on roles:

Hubs (high-degree nodes)

Bridges (nodes connecting different communities)


Influencers (high centrality nodes)

Describing nodes is essential to identify key actors, leadership roles, vulnerable nodes, and
communities.

Describing Edges

Edges can be characterized along the following dimensions:

1. Direction
Directed (follows, subscribes, cites)

Undirected (friendship, collaboration)
2. Weight

Weighted edges represent intensity (e.g., number of retweets, messages).


3. Multiplicity
Single edge or multiple edges between nodes (multigraphs).

4. Temporal nature
Static (fixed relationship)
Dynamic (time-evolving interactions)

5. Type of relationship
Social relationship (friendship)

Communication (email)
Collaboration (co-authorship)
Interaction (likes, comments)

Describing edges helps understand how strong or weak ties shape network behavior and diffusion
processes.

4. Describing Networks
A network can be described using its topology, size, connections, and structural properties.

Key network descriptions

1. Size
Number of nodes (N)
Number of edges (E)

2. Density
Measure of how many edges exist relative to maximum possible edges.
3. Network Type

Directed / Undirected
Weighted / Unweighted

Bipartite (two types of nodes, e.g., users and products)


Multilayer (multiple interaction types)
4. Degree Distribution

Shows how node degrees are spread across the network.


Many social networks follow a power-law distribution (scale-free networks).
5. Community Structure

Networks often have tightly connected clusters.

Communities indicate shared interests, groups, or functional units.
6. Diameter and Path Length

Small-world effect in social networks indicates short path lengths between any two nodes.
7. Components
Connected components or isolated sub-networks.

Through these properties, networks can be compared, classified, and analyzed for patterns.

5. Layouts
Layouts refer to the visual arrangement of nodes and edges when representing a network graphically.
Layouts help in visually identifying:

Clusters

Hubs and influencers


Bridges between communities
Isolated nodes

Types of Layouts

1. Force-directed layouts
Nodes repel each other, edges act like springs.

Examples: Fruchterman–Reingold, ForceAtlas2


Most intuitive for social networks.
2. Circular layout

Nodes arranged in a circle.


Useful for bipartite or hierarchical structure visualization.

3. Hierarchical or Tree layout


Nodes placed in layers based on hierarchy.
Used for organizational or dependency networks.

4. Random layout
Nodes placed randomly, mainly for debugging or comparison.
5. Grid layout

Nodes arranged in rows/columns for simplicity.

Well-designed layouts improve readability and reveal structural patterns effectively.
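The layouts above can be computed with NetworkX; each layout function returns node coordinates that a drawing call would then render. A minimal sketch on the built-in karate-club graph:

```python
import networkx as nx

G = nx.karate_club_graph()  # built-in 34-node social network

# Each layout returns a dict mapping node -> (x, y) coordinates;
# a drawing call such as nx.draw(G, pos) would then render them.
force_pos = nx.spring_layout(G, seed=42)  # force-directed (Fruchterman-Reingold)
circ_pos = nx.circular_layout(G)          # nodes evenly spaced on a circle
rand_pos = nx.random_layout(G, seed=42)   # random placement, a useful baseline

print(len(force_pos))  # one coordinate pair per node
```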

6. Visualizing Network Features

Network visualization is the process of representing nodes, edges, and their attributes through
graphical interfaces to interpret structure and patterns.
Visualization helps identify:

Central nodes
Community clusters
Edge density

Interaction patterns
Structural holes
Strong vs. weak ties

Network growth over time

Visualization Techniques

1. Node Color
Represents attributes like community, category, or influence.
2. Node Size

Indicates degree, centrality, or activity level.


3. Edge Thickness

Shows weight (strength of connection).

4. Edge Color

Represents type of interaction or relationship.


5. Heatmaps

Visualize adjacency matrices.

6. Temporal visualization
Shows network evolution over time using animations or time-sliced graphs.

7. Clustering and community visualization

Highlight groups using modularity-based layouts or color coding.

Visualization tools commonly used:

Gephi

Cytoscape

NetworkX (Python)
Pajek

NodeXL

Effective visualization makes large, complex networks easier to interpret for decision-making and
research.

If you want, I can also prepare:
✅ 10-mark long answers for each topic
✅ 5-mark short answers
✅ Condensed 1-page exam notes
✅ Diagrams or labeled network examples
Just tell me what format you need.

Explain the topics "The role of Tie strength, Measuring Tie strength and its network structures, network propagation, Link prediction, entity resolution, Case study" with respect to Social Network Analysis for semester college exams. Explain broadly and theoretically covering each topic and avoid any kind of filler content

Below is a broad, theory-focused, exam-ready explanation of all topics from Social Network Analysis
(SNA). Everything is written in a clean academic style for 8–10 mark long answers with no filler content.

SOCIAL NETWORK ANALYSIS — EXAM NOTES

1. The Role of Tie Strength


In Social Network Analysis, a tie refers to the relationship or connection between two nodes. Tie
strength represents the intensity, closeness, and frequency of interactions between individuals.

Importance of Tie Strength in SNA

1. Information Flow:
Strong ties (close friends, family) enable high-trust, rich information exchange.
Weak ties (acquaintances) enable access to new, diverse information from different social circles.

2. Network Cohesion:
Strong ties create tightly knit clusters, while weak ties bridge different clusters and reduce
fragmentation.

3. Diffusion and Influence:


Weak ties often play a crucial role in spreading new ideas more widely (Granovetter’s “Strength of
Weak Ties” theory).
Strong ties amplify influence within local communities.

4. Community Formation:
Strong ties form dense social groups; weak ties connect these groups to broader networks.
5. Structural Holes and Social Capital:
Individuals connected by weak ties can bridge structural holes, gaining strategic advantages such as
access to new opportunities.
Thus, tie strength shapes network topology, behavior, communication patterns, and diffusion processes.

2. Measuring Tie Strength and Its Network Structures


A. Measuring Tie Strength

Tie strength is measured using a combination of behavioral, structural, and interaction-based features. Common metrics include:

1. Frequency of Interaction:
Number of messages, calls, comments, or shared activities.

2. Duration of Relationship:
Length of time individuals have known each other.

3. Reciprocity:
Balanced communication (mutual commenting, liking, replying) indicates stronger ties.
4. Emotional Intensity:
Often inferred from content sentiment, message length, or degree of personal communication.
5. Intimacy / Trust:
Closeness or personal nature of communication.

6. Structural Features:
Number of mutual friends

Overlap of neighborhoods

Co-participation in events
Similarity of interests

Tie strength can be quantified as a weighted sum of multiple features or through supervised learning
models.
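As an illustration of the weighted-sum approach, the sketch below uses hypothetical feature names and weights; a real system would calibrate these from interaction data or a supervised model:

```python
# Hypothetical feature names and weights -- illustrative assumptions only.
def tie_strength(features, weights):
    """Tie strength as a weighted sum of normalized interaction features."""
    return sum(weights[f] * features.get(f, 0.0) for f in weights)

weights = {"messages": 0.4, "mutual_friends": 0.3, "reciprocity": 0.3}

close_friend = {"messages": 0.9, "mutual_friends": 0.8, "reciprocity": 1.0}
acquaintance = {"messages": 0.1, "mutual_friends": 0.2, "reciprocity": 0.2}

print(tie_strength(close_friend, weights))   # strong tie scores near 1
print(tie_strength(acquaintance, weights))   # weak tie scores near 0
```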

B. Network Structures Based on Tie Strength

Tie strength influences how a network is structured. Key structures include:

1. Strong-Tie Subnetworks:
High clustering coefficient

Form dense, cohesive groups

Support stable, trustworthy communication

2. Weak-Tie Bridges:
Connect otherwise disconnected clusters

Reduce network diameter

Enable fast spread of new information


3. Multiplex Ties:

Multiple types of relationships between the same individuals (e.g., colleagues + friends)

Usually strong ties with high bandwidth


4. Brokerage Structures:

Nodes with many weak ties bridging different communities

High betweenness centrality


5. Core–Periphery Structures:

Strong ties in the core

Weak ties towards the periphery


Common in organizational and social systems

These structures help identify influencers, community boundaries, and communication bottlenecks.

3. Network Propagation
Network propagation refers to the process through which information, behaviors, opinions, or
innovations spread across a network.

Modes of Propagation

1. Simple Contagion:
Information spreads through single contact events (e.g., viral news).

2. Complex Contagion:
Adoption requires reinforcement from multiple neighbors (e.g., joining a movement).
3. Threshold Models:
A node adopts a behavior only if the proportion of active neighbors exceeds a threshold.

4. Epidemic Models:
Based on disease spread models like SIR and SIS.

Factors Affecting Propagation

Tie strength: strong ties increase reinforcement; weak ties increase reach.

Node centrality: influential nodes accelerate spread.


Network density: dense networks facilitate faster propagation.

Community structure: boundaries slow cross-community spread.

Propagation analysis is used in social media marketing, misinformation tracking, epidemiology, and viral
content prediction.
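The simple-contagion idea can be sketched as a simulation; below is a minimal, illustrative Independent Cascade run on NetworkX's built-in karate-club graph (the propagation probability p and seed set are assumed parameters):

```python
import random
import networkx as nx

def independent_cascade(G, seeds, p, rng):
    """One run of the Independent Cascade model: every newly activated node
    gets a single chance to activate each inactive neighbour with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active

G = nx.karate_club_graph()
spread = independent_cascade(G, seeds={0}, p=0.2, rng=random.Random(1))
print(len(spread))  # number of nodes the cascade reached from node 0
```

Averaging the spread over many random runs estimates a seed set's expected influence.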

4. Link Prediction
Link prediction aims to predict missing, future, or potential edges in a network based on existing
structure and patterns.

Applications

Recommendation systems (friend suggestions, product connections)

Fraud detection
Knowledge graph completion

Social media growth modeling

Techniques for Link Prediction

A. Similarity-Based Methods

1. Common Neighbors:
More mutual friends → higher likelihood of connection.

2. Jaccard Coefficient:
Ratio of shared neighbors to total neighbors.
3. Adamic–Adar Index:
Weighs shared neighbors based on rarity.
4. Preferential Attachment:
Nodes with high degree tend to acquire more links.

B. Path-Based Methods

The Katz index, shortest-path scores, and random-walk algorithms measure closeness via paths.

C. Probabilistic and Machine Learning Methods

Logistic regression
Graph embedding models

Graph Neural Networks (GNNs)

Link prediction helps identify evolving relationships and potential structural changes in future networks.
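The similarity-based indices above are available in NetworkX; a minimal sketch on a hypothetical five-node graph, scoring the non-edge A-D:

```python
import networkx as nx

# Hypothetical five-node graph; A-D is the missing edge being scored.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")])
pair = [("A", "D")]

cn = len(list(nx.common_neighbors(G, "A", "D")))    # shared neighbours
jaccard = next(nx.jaccard_coefficient(G, pair))[2]  # shared / total neighbours
aa = next(nx.adamic_adar_index(G, pair))[2]         # rare neighbours weigh more
pa = next(nx.preferential_attachment(G, pair))[2]   # degree(A) * degree(D)

print(cn, jaccard, aa, pa)  # A and D share both of their neighbours (B and C)
```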

5. Entity Resolution
Entity Resolution (ER) refers to the process of identifying when multiple references, records, or
profiles refer to the same real-world entity, especially across noisy or incomplete social data.

Importance in SNA

Users may have multiple accounts or inconsistent profiles.


Data collected from multiple platforms (Twitter, Facebook, LinkedIn) must be unified.

Duplicate entities distort network metrics, centrality, and community detection.

Steps in Entity Resolution

1. Data Matching:
Compare attributes like name, username, email, phone number.
2. Similarity Computation:
Text similarity, structural similarity, profile overlaps.

3. Contextual Matching:
Common friends, interaction patterns, geolocation.

4. Classification or Clustering:
Machine learning models classify whether two records are the same.

5. Merging:
Resolve duplicates and create a clean unified node.

Challenges

Missing or inconsistent data

Fake or ambiguous profiles

Privacy restrictions

ER is essential for constructing accurate social graphs.
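A minimal illustration of the matching, similarity, and contextual steps above, assuming toy profile records and an arbitrary 0.8 similarity threshold (both are assumptions, not a standard):

```python
from difflib import SequenceMatcher

# Toy profile records and an arbitrary 0.8 threshold -- both are assumptions.
def name_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_entity(rec1, rec2, threshold=0.8):
    """Match on name similarity, falling back to contextual evidence
    (shared friends) when the names are only moderately similar."""
    score = name_similarity(rec1["name"], rec2["name"])
    shared = len(set(rec1["friends"]) & set(rec2["friends"]))
    return score >= threshold or (score >= 0.6 and shared >= 2)

p1 = {"name": "Jon A. Smith", "friends": ["amy", "raj", "lee"]}
p2 = {"name": "Jon Smith", "friends": ["raj", "lee", "kim"]}
p3 = {"name": "Maria Garcia", "friends": ["kim"]}

print(same_entity(p1, p2))  # likely the same user -> True
print(same_entity(p1, p3))  # different users -> False
```

A production pipeline would replace the hand-set threshold with a trained classifier, as noted in step 4.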

6. Case Study (General Structure for Exam)


You may be asked to illustrate a case study on any SNA application. Below is a generic, academically
acceptable case study that fits most university exams.

Case Study: Information Diffusion in a Twitter Social Network


Objective

To analyze how a news topic spreads in a social media network using SNA techniques.

Dataset

Twitter user graph


Tweets and retweets related to a specific event

Metadata: timestamps, user ID, follower connections

Methodology

1. Network Construction:
Nodes represent users; edges represent follower/followee relations.

2. Tie Strength Measurement:


Weighted edges based on frequency of mentions, replies, and retweets.
3. Network Structure Analysis:

Degree centrality to identify key influencers
Clustering coefficient to detect communities
Weak ties identified as bridges between clusters
4. Network Propagation Study:
Retweet cascades analyzed using temporal data.
Propagation paths mapped to find influential spreaders.
5. Link Prediction:
Future follower relations predicted using common neighbors and preferential attachment.
6. Entity Resolution:
Duplicate user profiles or bots identified and removed using similarity-based matching.

Findings

Weak ties facilitated cross-community diffusion.

Few high-centrality users triggered large cascades.


Link prediction revealed potential new connections forming around trending topics.

Conclusion

SNA helps understand how information moves, who controls influence, and how network structure
shapes communication dynamics.

If you want, I can also provide:


✅ 10-mark and 5-mark answers for each topic
✅ Condensed 1-page notes for quick revision
✅ Diagram-based explanation for link prediction, propagation, or tie strength
Just tell me what you need next.

Explain the topics "Introduction to community discovery, communities in context, quality functions, The Kernighan-Lin algorithm, Agglomerative algorithms, spectral algorithms, multi-level graph partitioning, Markov clustering, Other approaches" with respect to Social Network Analysis for semester college exams. Explain broadly and theoretically covering each topic and avoid any kind of filler content

Below is a broad, theory-rich, exam-focused explanation of all topics from Community Discovery in
Social Network Analysis. The content is written in a clean academic style suitable for 8–10 mark long
answers, with no filler content.

SOCIAL NETWORK ANALYSIS — COMMUNITY DISCOVERY

1. Introduction to Community Discovery


Community Discovery refers to the process of identifying groups of nodes in a network that are more
densely connected internally than with the rest of the network. These groups are known as
communities, clusters, or modules.

Why Community Discovery is Important

Reveals functional groups (e.g., friend circles, interest groups).

Helps understand information diffusion patterns.


Identifies key influencers and cohesive subgroups.
Improves recommendation systems and link prediction.

Essential for biological networks (protein complexes), web graphs, and social media analytics.

Community discovery is central to SNA because most real-world social networks exhibit clustered,
modular, and hierarchical structures rather than random patterns.

2. Communities in Context
Communities are interpreted based on their network context, i.e., the nature of the graph and the type
of relationships it represents.

Types of Communities

1. Disjoint Communities:
Each node belongs to only one community. Common in traditional clustering.
2. Overlapping Communities:
Nodes may belong to multiple communities (e.g., a person in different social circles).
3. Hierarchical Communities:
Communities contain sub-communities at multiple levels (tree-like structure).

4. Dynamic Communities:
Communities evolve over time in temporal networks.

Contexts Where Communities Matter

Social networks: Identify friend groups, interest communities.


Biological networks: Detect functional protein modules.

Information networks: Discover related webpages (web communities).


Communication networks: Identify tightly interacting departments/teams.
Marketing: Target user segments and analyze influence groups.

Understanding context ensures that detected communities align with meaningful real-world structures.

3. Quality Functions
Quality functions are mathematical metrics that evaluate the “goodness” of the communities discovered.
They help compare and validate different partitions of a network.

Major Quality Functions

A. Modularity (Q)

The most widely used measure.


Compares actual number of intra-community edges to expected number in a random graph.
Higher modularity → stronger community structure.

B. Conductance

Measures the ratio of edges leaving a community to total edges within it.
Lower conductance → better-defined community.

C. Cut Ratio

Ratio of edges crossing between two sets of nodes relative to possible edges.

D. Density

Measures how tightly connected nodes inside a community are.

E. Edge Clustering Coefficient

Indicates how strongly edges participate in triangles inside a community.

Quality functions guide algorithms to ensure meaningful communities with high internal connectivity
and low external interaction.
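Modularity can be checked concretely with NetworkX; the barbell graph and the two candidate splits below are illustrative assumptions:

```python
import networkx as nx
from networkx.algorithms.community import modularity

G = nx.barbell_graph(5, 0)  # two 5-node cliques joined by a single edge

good_split = [set(range(5)), set(range(5, 10))]           # cut only the bridge
bad_split = [set(range(0, 10, 2)), set(range(1, 10, 2))]  # cut through both cliques

print(modularity(G, good_split))  # high: dense inside, sparse between
print(modularity(G, bad_split))   # low/negative: communities poorly chosen
```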

4. The Kernighan–Lin Algorithm


The Kernighan–Lin (KL) algorithm is a classic graph partitioning method designed to divide a network
into two balanced communities while minimizing the number of edges between them.

Key Features

Initially partitions nodes into two sets.

Iteratively swaps node pairs across partitions if doing so reduces edge-crossing cost.
Continues until no improvement is possible.

Algorithm Steps

1. Create an initial partition into two equal-sized sets.

2. Compute gain values representing improvement from swapping nodes.


3. Exchange node pairs that maximize gain.
4. Freeze swapped nodes until iteration ends.

5. Choose the best configuration found during the iteration.


6. Repeat until stable.

Characteristics

Deterministic improvement mechanism.


Efficient for small and medium-sized networks.

Forms the basis for many modern graph partitioning algorithms.

KL is used as a foundational method in community detection and graph partitioning problems.
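NetworkX ships an implementation of this algorithm (`kernighan_lin_bisection`); a minimal sketch on a graph with an obvious two-community structure:

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Two 4-node cliques joined by one bridge edge: the ideal bisection
# cuts only that bridge.
G = nx.barbell_graph(4, 0)

part_a, part_b = kernighan_lin_bisection(G, seed=7)
cut_edges = [(u, v) for u, v in G.edges() if (u in part_a) != (v in part_a)]

print(sorted(part_a), sorted(part_b))
print(len(cut_edges))  # number of edges crossing the partition
```

Note that the result is always a balanced bisection: swapping pairs preserves the sizes of the two sets.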

5. Agglomerative Algorithms
Agglomerative algorithms follow a bottom-up hierarchical clustering approach.

Approach

Start with each node as its own community.


Iteratively merge the two most similar communities.

Continue merging until a stopping criterion is reached (e.g., modularity peak).

Types of Agglomerative Methods

1. Single-link clustering:
Merges communities with the smallest edge distance.
2. Complete-link clustering:
Ensures that merged communities are tightly connected.
3. Average-link clustering:
Considers average similarity between communities.
4. Modularity-based agglomeration (e.g., Louvain first phase):
Iteratively combines communities that increase modularity.

Advantages

Produces a hierarchical structure (dendrogram).


Suitable for networks with unknown number of communities.

Agglomerative algorithms are simple but can be computationally expensive for very large networks.
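The modularity-based agglomeration idea can be sketched with NetworkX's Clauset-Newman-Moore implementation, which starts from singleton communities and repeatedly merges the pair giving the largest modularity gain:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Greedy agglomeration: merge communities bottom-up, stopping at the
# modularity peak.
G = nx.karate_club_graph()
communities = list(greedy_modularity_communities(G))

print(len(communities))
print([sorted(c) for c in communities])
```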

6. Spectral Algorithms
Spectral algorithms use eigenvalues and eigenvectors of matrices associated with graphs (such as the
Laplacian matrix) to detect community structure.

Working Principle

Represent the graph using the Laplacian matrix.


Compute eigenvectors corresponding to smallest non-zero eigenvalues.
Use these eigenvectors to embed nodes in a lower-dimensional space.

Apply clustering (e.g., k-means) to obtain communities.

Key Insight

The graph’s spectrum encodes connectivity patterns.


Nodes placed close together in eigenvector space belong to the same community.

Applications

Detecting clusters in large sparse networks.


Partitioning irregularly shaped communities.

Spectral methods are mathematically elegant and effective but require eigen decomposition, which may
be costly for huge networks.
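A minimal spectral-bisection sketch, assuming NumPy and SciPy are available: embed nodes using the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) and split on its sign:

```python
import numpy as np
import networkx as nx

# Spectral bisection: the sign pattern of the Fiedler vector separates
# the two densely connected halves of the graph.
G = nx.barbell_graph(5, 0)                    # two cliques, one bridge
L = nx.laplacian_matrix(G).toarray().astype(float)

eigvals, eigvecs = np.linalg.eigh(L)          # ascending eigenvalues; L is symmetric
fiedler = eigvecs[:, 1]                       # column for 2nd-smallest eigenvalue

part_a = {n for n, x in zip(G.nodes(), fiedler) if x < 0}
part_b = set(G.nodes()) - part_a

print(sorted(part_a), sorted(part_b))  # recovers the two cliques
```

For more than two communities, several such eigenvectors are used as coordinates and a clustering step (e.g. k-means) is applied, as described above.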

7. Multi-Level Graph Partitioning


Multi-level partitioning is a scalable approach suitable for large graphs.

Three-Phase Process

1. Coarsening Phase:
Gradually reduce the graph size by merging nodes/edges.

Create a set of progressively smaller graphs.


2. Initial Partitioning Phase:
Partition the smallest (coarsest) graph.

3. Uncoarsening and Refinement Phase:


Project partitions back to original graph.
Refine using algorithms like Kernighan–Lin or Fiduccia–Mattheyses.

Advantages

High efficiency and scalability.

Produces globally good partitions.
Examples include Metis, Louvain, and Leiden methods.

This method is widely used in real-life applications that require fast community detection on massive
networks.

8. Markov Clustering (MCL)


Markov Clustering is an unsupervised algorithm based on flow simulation on graphs using random
walks.

Principle

Communities have dense internal flow but sparse external flow.

Random walks tend to get "trapped" in dense clusters.

Core Operations

1. Expansion:
Simulates longer random walks by taking powers of the column-stochastic transition matrix.
2. Inflation:
Strengthens strong connections and weakens weak ones by applying a power-raising and
normalization process.

These steps are repeated until convergence.

Advantages

Effectively identifies natural cluster boundaries.


No need to pre-specify number of communities.

Well-suited for biological networks and web graphs.

MCL is considered robust and easy to control via the inflation parameter, which influences granularity.
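The expansion/inflation loop can be sketched with NumPy; the toy adjacency matrix (two triangles joined by a weak edge, with self-loops added) and the parameter values are illustrative assumptions:

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, iters=50):
    """Minimal Markov Clustering sketch: alternate expansion (matrix power,
    i.e. longer random walks) and inflation (entry-wise power followed by
    column normalization) until the flow matrix settles."""
    M = adj / adj.sum(axis=0)                        # column-stochastic matrix
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)     # expansion
        M = M ** inflation                           # inflation
        M = M / M.sum(axis=0)                        # re-normalize columns
    # Nodes whose flow concentrates on the same attractor row form a cluster.
    clusters = {}
    for col in range(M.shape[1]):
        attractor = int(M[:, col].argmax())
        clusters.setdefault(attractor, set()).add(col)
    return list(clusters.values())

# Two triangles (0-1-2 and 3-4-5) joined by one edge, self-loops included.
A = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

print(mcl(A))
```

Raising the inflation parameter makes the clustering finer-grained, which is the control knob mentioned above.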

9. Other Approaches
Various advanced or alternative community detection techniques include:

A. Label Propagation

Nodes adopt the labels of their majority neighbors.

Repeats until convergence.


Extremely fast, suitable for large graphs.

B. Clique Percolation Method

Detects overlapping communities using k-cliques.
A community is formed by adjacent cliques sharing k–1 nodes.

C. Stochastic Block Models

Probabilistic modeling of groups based on connection likelihoods.


Used for inference-based community detection.

D. Edge Betweenness Algorithms (e.g., Girvan–Newman)

Remove edges with highest betweenness iteratively.

Communities emerge progressively.

E. Louvain and Leiden Algorithms

Multi-level modularity optimization.


Highly scalable and widely used in social and biological networks.

F. Dynamic Community Detection

Tracks changes in communities over time in temporal networks.

G. Overlapping Community Detection

Soft clustering that allows nodes to belong to multiple communities.

These approaches extend the flexibility of community discovery to fit various network shapes, sizes, and
application domains.
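Label propagation (approach A above) is available directly in NetworkX; a minimal sketch:

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# Each node repeatedly adopts the majority label among its neighbours
# until labels stabilize.
G = nx.barbell_graph(5, 0)   # two 5-node cliques with one bridge edge
communities = list(label_propagation_communities(G))

print([sorted(c) for c in communities])
```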

If you want, I can also prepare:


✅ 10-mark answers for each individual topic
✅ 5-mark short notes
✅ Condensed revision notes (1–2 pages)
Just tell me your preferred format.

Explain the topics "Introduction to social influence, Influence related statistics, social similarity and influence, Homophily, Existential Test for social influence, Influence and actions, Influence and interactions, influence maximization in viral marketing" with respect to Social Network Analysis for semester college exams. Explain broadly and theoretically covering each topic and avoid any kind of filler content

Below is a broad, theory-rich, exam-oriented explanation of all topics from Social Influence in Social
Network Analysis (SNA). The content is written in a clean academic style suitable for 8–10 mark long
answers, with no filler content.

🔵 SOCIAL NETWORK ANALYSIS — SOCIAL INFLUENCE (Exam Notes)

1. Introduction to Social Influence


Social influence refers to the process by which individuals' thoughts, behaviors, or decisions are shaped
by other people in the network. In SNA, influence is studied as a network-driven phenomenon where a
node’s behavior is impacted by its neighbors due to interactions, relationships, communication patterns,
or social pressure.

Key Characteristics

Influence is transmitted through edges representing interactions (communication, friendship,


following, etc.).
It affects adoption of behaviors, information diffusion, opinions, purchases, and social norms.
Influence depends on:

Node position (centrality, degree)


Tie strength
Network structure

Social similarity

SNA treats influence as a quantifiable and measurable process, important for applications like viral
marketing, opinion mining, and behavioral prediction.

2. Influence-Related Statistics
To study social influence rigorously, various statistical measures and models are used. These help
quantify how much influence one node may exert on another.

Main Influence Statistics

A. Correlation of Actions

Measures similarity in behavior between connected nodes (e.g., number of times they adopt the same
action).

B. Conditional Probabilities

Probability that a node performs an action given that its neighbor performed it earlier.

C. Influence Probability

Defines the likelihood that an action spreads from node A to node B.

D. Exposure Count

Number of neighbors who have already adopted a behavior before a node adopts it.

E. Temporal Influence Statistics

Measures the time-lag between a neighbor performing an action and the node following it.

These statistics are used in influence models like Independent Cascade (IC) and Linear Threshold (LT).
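A toy sketch of the conditional-probability statistic (B and C above); the adoption log and edge list are assumed data:

```python
# Toy adoption log -- the edges and timestamps are assumed data.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
adoption_time = {"a": 1, "b": 2, "d": 3}   # "c" never adopts

def influence_probability(edges, adoption_time):
    """Among directed neighbour pairs whose source adopted, the fraction
    where the target adopted at a strictly later time."""
    exposures, followed = 0, 0
    for u, v in edges:
        for src, dst in ((u, v), (v, u)):
            if src in adoption_time:
                exposures += 1
                if dst in adoption_time and adoption_time[dst] > adoption_time[src]:
                    followed += 1
    return followed / exposures

print(influence_probability(edges, adoption_time))  # 2 of 6 exposures were followed
```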

3. Social Similarity and Influence


Social similarity refers to the degree to which individuals are alike in characteristics such as interests,
background, preferences, or behavior.

Types of Similarity

Structural similarity: Common friends, shared interactions.


Attribute similarity: Age, profession, geography.

Behavioral similarity: Similar choices, consumption patterns.

Relationship with Influence

Similar individuals are more likely to influence each other.


Frequent interactions increase behavioral alignment.
Social clusters with high similarity enable faster diffusion.

Similarity vs. Influence

Similarity may be a precondition for influence.

Influence strengthens similarity over time through repeated adoption.

Understanding similarity is essential to distinguish true influence from coincidental behavior.

4. Homophily
Homophily is the tendency of individuals to associate and bond with others who are similar to
themselves.

Types of Homophily

1. Status Homophily:
Based on sociodemographic attributes (age, gender, education, location).

2. Value Homophily:
Based on shared beliefs, interests, values.

Effects of Homophily on Networks

Creates clusters of similar individuals.


Reinforces social similarity and strengthens group identity.

Affects information and behavior diffusion within clusters.


Can limit exposure to diverse opinions (echo chambers).

Homophily complicates influence analysis because observed behavior similarity may arise from
similarity rather than influence.

5. Existential Test for Social Influence


This test examines whether a behavior’s spread in a network can be attributed to influence rather
than mere similarity or chance.

Purpose

To determine whether social influence is actually present or whether the observed behavior is due to:

Homophily
Independent decisions
External factors

Methods for Testing Influence

1. Shuffle Tests / Randomization

Randomly shuffle behavior timestamps.


If influence disappears after shuffling, real influence exists.
2. Matched Pair Analysis

Compare influence among similar and dissimilar pairs.


If behavior alignment is higher in connected pairs, influence exists.
3. Causal Inference Models

Check whether neighbor behavior precedes node behavior statistically.


4. Temporal Ordering Tests
Ensure that influence flows from earlier adopters to later adopters.

Outcome

Confirms that influence is not due to accidental timing or homophily alone.

Provides statistical evidence for causal influence.

6. Influence and Actions
This topic studies how social influence shapes individual actions, such as:

Joining a group

Purchasing a product
Sharing or liking content
Participating in events

Adopting new behavior or technology

Key Concepts

1. Behavioral Adoption

A node adopts an action because its neighbors did so earlier.


2. Action Cascades

Chains of influence-triggered actions across the network.


3. Threshold Behavior
A node acts only after enough neighbors act (threshold models).

4. Exposure Effect
More exposures to an action → higher likelihood of adoption.

Actions are analyzed based on temporal sequences, exposure counts, and diffusion patterns.

7. Influence and Interactions


Interactions such as communication, commenting, messaging, tagging, or mentioning play a crucial role
in transmitting influence.

Types of Interactions Relevant to Influence

1. Direct interactions:
Messages, replies, comments.
2. Indirect interactions:
Likes, views, retweets.
3. Repeated interactions:
Increase trust and tie strength.
4. Reciprocal interactions:
Two-way interactions indicate stronger influence potential.

Effects on Influence

More interactions → stronger influence probability.

Interaction frequency predicts likelihood of action adoption.
Network propagation depends heavily on active interaction links.

Influence is not solely based on network structure; interaction dynamics determine real diffusion.

8. Influence Maximization in Viral Marketing


Influence maximization aims to identify a small set of initial individuals (seed nodes) who can trigger
the largest possible cascade of actions through social influence.

Importance

Used in advertising, product launches, political campaigns, content promotion.

Problem Definition

“Given a network, find k seed nodes that maximize expected spread of influence.”

Models Used

1. Independent Cascade Model (IC)


Each activated node attempts to activate neighbors with certain probability.
2. Linear Threshold Model (LT)

A node activates if cumulative influence from its neighbors exceeds a threshold.

Strategy to Select Seed Nodes

Choose nodes with:


High degree (degree centrality)
High betweenness centrality (bridges)

High PageRank or eigenvector centrality


Strategic placement across communities

Greedy Algorithms (e.g., Kempe–Kleinberg–Tardos)

Estimate marginal influence spread for each candidate seed.


Iteratively select nodes giving the largest gain.

Heuristics

Rank nodes by centrality.


Select one influencer per community for maximum coverage.

Outcome

Achieve viral spread at minimal cost.


Optimize marketing campaigns in large networks.

⭐ If you want next:


✅ 10-mark answers
✅ 5-mark short notes
✅ One-page condensed revision sheet
Just tell me!

Give me all numericals related to this subject, with formulas and example

Great — below are concise, exam-ready numericals covering the core quantitative computations you’ll
encounter in Social Network Analysis. For each item I give the formula, a small example network /
data, and the step-by-step worked solution so you can reproduce it in an exam. I focused on the most
important measures and algorithms students are usually tested on.

1. Degree, In-degree, Out-degree


Formula (unweighted graph):
Degree of node v: deg(v) = number of incident edges.
Directed: deg_in(v) and deg_out(v) count incoming and outgoing edges separately.
Example: Graph with nodes A, B, C, D and edges A–B, A–C, B–C, C–D.
Solution:
deg(A) = 2 (edges AB, AC).
deg(B) = 2 (AB, BC).
deg(C) = 3 (AC, BC, CD).
deg(D) = 1 (CD).
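These counts are easy to verify mechanically. Below is a minimal plain-Python sketch (not part of the original notes); it also evaluates the density formula 2E/(N(N−1)) used in the next section.

```python
# Example graph from the notes: nodes A, B, C, D; edges A-B, A-C, B-C, C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Build an adjacency map (undirected: store both directions).
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(degree["A"], degree["B"], degree["C"], degree["D"])  # 2 2 3 1

# Density = 2E / (N(N-1)) for an undirected graph.
n, e = len(adj), len(edges)
print(round(2 * e / (n * (n - 1)), 4))  # 0.6667
```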

2. Graph Density
Formula (undirected): Density = 2E / (N(N − 1)), where E = number of edges and N = number of nodes.
Example: Same graph above: N = 4, E = 4.
Solution: Density = (2 × 4)/(4 × 3) = 8/12 = 0.6667.

3. Clustering Coefficient (Local & Global)


Local clustering of node v:
C(v) = 2T(v) / (k_v(k_v − 1)), where T(v) = number of triangles through v and k_v = deg(v).
Global clustering: either the average of the local C(v), or the transitivity = 3 × (number of triangles) / (number of connected triples).

Example: Use the previous graph. Triangles: only one triangle A–B–C → 1 triangle.
Local C(C): k_C = 3, triangles through C = 1, so C(C) = (2 × 1)/(3 × 2) = 2/6 = 0.3333.
Local C(A): k_A = 2, triangles through A = 1 (A–B–C), so C(A) = (2 × 1)/(2 × 1) = 1.0.
Local C(B) = 1.0. Local C(D): k_D = 1 → undefined (take 0).

Average clustering = (1 + 1 + 0.3333 + 0)/4 = 2.3333/4 = 0.5833.
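A short sketch to self-check the local and average clustering values above (plain Python, brute-force triangle counting):

```python
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def local_clustering(v):
    k = len(adj[v])
    if k < 2:
        return 0.0  # undefined for degree < 2; take 0 as in the notes
    nbrs = sorted(adj[v])
    # T(v): count edges among v's neighbours (each pair checked once).
    triangles = sum(
        1
        for i in range(len(nbrs))
        for j in range(i + 1, len(nbrs))
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * triangles / (k * (k - 1))

cc = {v: local_clustering(v) for v in adj}
avg = sum(cc.values()) / len(cc)
print({v: round(c, 4) for v, c in cc.items()})  # A: 1.0, B: 1.0, C: 0.3333, D: 0.0
print(round(avg, 4))                            # 0.5833
```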

4. Shortest Path, Average Path Length, Diameter


Shortest path d(u, v): length of the shortest path between u and v.
Average path length: (1 / C(N, 2)) × Σ_{u<v} d(u, v), where C(N, 2) = N(N − 1)/2 is the number of node pairs.
Diameter: max_{u,v} d(u, v).

Example: Same graph. Distances:

d(A,B) = 1, d(A,C) = 1, d(A,D) = 2 (A–C–D)
d(B,C) = 1, d(B,D) = 2 (B–C–D)
d(C,D) = 1

Sum of distances over pairs: 1 + 1 + 2 + 1 + 2 + 1 = 8. Number of pairs C(4, 2) = 6.

Avg path length = 8/6 = 1.3333. Diameter = 2.
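The distance table above can be regenerated with breadth-first search; a small sketch:

```python
from collections import deque

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_dist(src):
    # Single-source shortest paths on an unweighted graph.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

nodes = sorted(adj)
pair_dists = [bfs_dist(u)[v] for i, u in enumerate(nodes) for v in nodes[i + 1:]]
avg_path = sum(pair_dists) / len(pair_dists)
diameter = max(pair_dists)
print(round(avg_path, 4), diameter)  # 1.3333 2
```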

5. Betweenness Centrality (Freeman)


Formula (normalized):
C_B(v) = Σ_{s≠v≠t} σ_st(v)/σ_st, where σ_st = total number of shortest s–t paths and σ_st(v) = those passing through v.

Example: Same graph, node C. Pair (A,D): unique shortest path A–C–D → counts for C. Pair (B,D): unique shortest path B–C–D → counts for C. Pair (A,B): path A–B does not include C. Pairs (A,C) and (B,C) are trivial. So σ_AD(C) = 1 and σ_BD(C) = 1. Pairs among {A, B, D} excluding v: C(3, 2) = 3, of which two pass through C → unnormalized C_B(C) = 2. Normalizing (for undirected graphs) by (N − 1)(N − 2)/2 = (3 × 2)/2 = 3: C_B(C) = 2/3 = 0.6667.

6. Closeness Centrality

Formula (normalized): C_clo(v) = (N − 1) / Σ_{u≠v} d(u, v).

Example: Node C, distances to others: d(C,A) = 1, d(C,B) = 1, d(C,D) = 1 → sum = 3; N = 4.
C_clo(C) = 3/3 = 1.0. For node A: distances to B (1), C (1), D (2) → sum = 4 → C_clo(A) = 3/4 = 0.75.
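A matching sketch for closeness, reusing BFS distances on the same example graph:

```python
from collections import deque

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def closeness(v):
    # C_clo(v) = (N - 1) / sum of shortest-path distances from v.
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return (len(adj) - 1) / sum(dist.values())

print(closeness("C"))  # 1.0
print(closeness("A"))  # 0.75
```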

7. Eigenvector Centrality (simple iterative)


Definition: x_v = (1/λ) Σ_u A_vu x_u — the dominant eigenvector of the adjacency matrix A.

Example: Small 3-node chain 1–2–3 (A = [[0,1,0],[1,0,1],[0,1,0]]).

Symmetry gives the central node 2 the highest score. The largest eigenvalue is λ = √2 (the eigenvalues of the 3-chain are {√2, 0, −√2}). The normalized eigenvector for √2 is ≈ [0.5, 0.7071, 0.5], so eigenvector centralities: node 2 ≈ 0.7071, nodes 1 and 3 ≈ 0.5.

(You may be asked to compute power iteration once: start x=[1,1,1], multiply A, normalize → shows
convergence.)
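The power-iteration hint can be sketched in a few lines. One assumption to note: because the 3-chain is bipartite, its extreme eigenvalues ±√2 tie in magnitude and plain power iteration oscillates, so the sketch iterates on A + I instead (a standard shift that leaves the eigenvectors unchanged and makes the iteration converge to the same dominant eigenvector):

```python
# Adjacency matrix of the chain 1-2-3.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
n = 3

x = [1.0] * n
for _ in range(100):
    # y = (A + I) x  ==  A x + x  (shifted power iteration)
    y = [sum(A[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
    norm = sum(v * v for v in y) ** 0.5  # Euclidean norm
    x = [v / norm for v in y]

print([round(v, 4) for v in x])  # [0.5, 0.7071, 0.5]
```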

8. PageRank (one iteration example)


Formula (damping factor d, N nodes):
PR(v) = (1 − d)/N + d × Σ_{u∈Γ_in(v)} PR(u)/deg_out(u).

Example: 3-node directed cycle A→B→C→A. Set d = 0.85. Start with uniform PR = 1/3 ≈ 0.3333. One iteration:
For A: PR(A) = 0.15/3 + 0.85 × PR(C)/deg_out(C) = 0.05 + 0.85 × 0.3333 = 0.3333 (stays the same). So the steady state is the uniform PR = 1/3.
If graph had a sink, iterative steps would change values; exam questions often ask one/two iterations—
show calculations like above.
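A sketch of the iteration on the 3-cycle, confirming the uniform fixed point:

```python
# Directed cycle A -> B -> C -> A as a successor map.
succ = {"A": ["B"], "B": ["C"], "C": ["A"]}
d, n = 0.85, 3

pr = {v: 1 / n for v in succ}  # start uniform
for _ in range(10):
    new = {}
    for v in succ:
        # Incoming PageRank mass: nodes u whose successor list contains v.
        incoming = sum(pr[u] / len(succ[u]) for u in succ if v in succ[u])
        new[v] = (1 - d) / n + d * incoming
    pr = new

print({v: round(p, 4) for v, p in pr.items()})  # all 0.3333
```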

9. Jaccard Coefficient (Link Prediction / Similarity)


Formula: J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the neighbor set of x.
Example: Nodes P and Q. N (P ) = {A, B, C}, N (Q) = {B, C, D, E}. Intersection {B, C} size 2.
Union {A, B, C, D, E} size 5.
J(P , Q) = 2/5 = 0.4.

10. Common Neighbors, Adamic–Adar, Preferential Attachment


Common Neighbors: CN(u, v) = |N(u) ∩ N(v)|. Example above = 2.

Adamic–Adar: AA(u, v) = Σ_{w∈N(u)∩N(v)} 1/log|N(w)|.
If common neighbors = {B, C} with |N(B)| = 5 and |N(C)| = 10 → AA = 1/log 5 + 1/log 10. The log is usually natural: ln 5 ≈ 1.6094 → 1/1.6094 = 0.6213; ln 10 ≈ 2.3026 → 1/2.3026 = 0.4343. Sum ≈ 1.0556.

Preferential Attachment: PA(u, v) = |N(u)| × |N(v)|. Example: |N(P)| = 3, |N(Q)| = 4 → PA = 12.
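All three neighborhood-based scores in one self-check sketch (the neighbor sets and degrees are the hypothetical ones from the examples above):

```python
import math

def jaccard(nu, nv):
    return len(nu & nv) / len(nu | nv)

def adamic_adar(common, degrees):
    # degrees: degree of each common neighbour; natural-log convention.
    return sum(1 / math.log(degrees[w]) for w in common)

def pref_attachment(nu, nv):
    return len(nu) * len(nv)

N_P = {"A", "B", "C"}
N_Q = {"B", "C", "D", "E"}
print(round(jaccard(N_P, N_Q), 4))                           # 0.4
print(round(adamic_adar({"B", "C"}, {"B": 5, "C": 10}), 4))  # 1.0556
print(pref_attachment(N_P, N_Q))                             # 12
```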

11. Katz Index (path-based)



Formula: Katz(u, v) = Σ_{l=1}^{∞} β^l × (number of paths of length l between u and v), where β is a small attenuation factor.

Example (3-node chain 1–2–3): one path of length 1 between 1 and 2; one path of length 2 between 1 and 3.
Katz(1, 3) with β = 0.1: only the length-2 path exists → Katz(1, 3) = β² × 1 = 0.01.

12. Link Prediction - Katz / Common example


Given graph: A—B—C; evaluate the link score for A and C.
Common neighbors: 1 (B). Jaccard: N(A) = {B} and N(C) = {B}, so intersection = union = {B} → J = 1/1 = 1 (when the only neighbor is shared, the two sets are equal). Katz as above = 0.01 with β = 0.1.

13. Modularity of a Partition


Formula: Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), where m = number of edges, k_i = degree of node i, and δ = 1 if i and j are in the same community.

Example: 4-node graph with edges AB, AC, BC (triangle among A, B, C) and CD (edge C–D). So m = 4.

Partition: community 1 = {A, B, C}, community 2 = {D}.

Degrees: k_A = 2 (AB, AC), k_B = 2, k_C = 3 (AC, BC, CD), k_D = 1; 2m = 8.

Simpler per-community form: Q = Σ_c [ l_c/m − (d_c/2m)² ], where l_c = internal edge count and d_c = sum of degrees of nodes in community c.
Community 1: internal edges l_1 = 3 (AB, BC, AC), d_1 = 2 + 2 + 3 = 7. Community 2: l_2 = 0, d_2 = k_D = 1.
So Q = (3/4 − (7/8)²) + (0/4 − (1/8)²). Compute: 3/4 = 0.75; (7/8)² = 49/64 = 0.765625. First term: 0.75 − 0.765625 = −0.015625. Second term: 0 − 1/64 = −0.015625. Total Q = −0.03125.
Negative modularity → this partition is worse than the random baseline (expected, because D's only tie is to C).
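The per-community shortcut can be checked mechanically; a sketch for this triangle-plus-tail example:

```python
# Q = sum_c [ l_c/m - (d_c/2m)^2 ] per-community modularity shortcut.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
communities = [{"A", "B", "C"}, {"D"}]

m = len(edges)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

Q = 0.0
for c in communities:
    l_c = sum(1 for u, v in edges if u in c and v in c)  # internal edges
    d_c = sum(degree[v] for v in c)                      # total degree in c
    Q += l_c / m - (d_c / (2 * m)) ** 2

print(Q)  # -0.03125
```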

14. Conductance of a Community
Formula: φ(S) = cut(S, S̄) / min(vol(S), vol(S̄)), where cut = number of edges leaving S and vol = sum of degrees in S.

Example: Use community S = {A, B, C} from the previous example. cut = 1 (edge C–D). vol(S) = d_1 = 7, vol(S̄) = d_2 = 1, min = 1. So φ(S) = 1/1 = 1.0 — very high conductance (poor community separation).

15. Kernighan–Lin swap gain (small)


Idea: The gain for swapping nodes a ∈ A and b ∈ B is g(a, b) = D(a) + D(b) − 2w_ab, where D(x) = external cost − internal cost, and w_ab = 1 if there is an edge between a and b, else 0.
Example: Partition sets A = {a1, a2}, B = {b1, b2} on a small graph. Suppose D(a1) = 2, D(b1) = 1, w_a1b1 = 0. Then gain = 2 + 1 − 0 = 3, so swapping a1 and b1 reduces the cut by 3 (units).

(Exams typically ask to compute D values and gains once.)

16. Simple Spectral Bisection (Laplacian Fiedler vector idea)


Laplacian: L = D − A. Compute the second-smallest eigenvector (the Fiedler vector) and partition nodes by its sign.
Example: 3-node chain 1–2–3, adjacency A as earlier, degree matrix D = diag(1, 2, 1). Laplacian:

L = [[ 1, −1,  0],
     [−1,  2, −1],
     [ 0, −1,  1]]

The Fiedler eigenvector is proportional to [1, 0, −1] (nodes 1 and 3 have opposite signs) → partition {1} vs {3}, with node 2 on the boundary. This illustrates a spectral split.

17. Markov Clustering (one expansion+inflation step)


Procedure: Build the column-stochastic matrix M from the adjacency matrix, expand by squaring (M²), inflate by raising each entry to power r, then renormalize the columns.

Example: 3-node triangle A–B–C, with self-loops added (standard in MCL) → the adjacency matrix is all ones (3×3). Column-stochastic M: each column sums to 3 → every entry is 1/3.
Expansion: M² = M because M is uniform. Inflation with r = 2: each entry → (1/3)² = 1/9. Normalize columns: each column sum = 3 × (1/9) = 1/3 → each normalized entry = (1/9)/(1/3) = 1/3. Stable fixed point: MCL finds a single cluster for the triangle.

18. Influence Models — Independent Cascade (IC): expected spread on a tiny graph

IC model: Each newly activated node gets a single chance to activate each inactive neighbor with probability p_uv.

Example: Graph A—B—C (chain). Seed = {A}. Uniform edge activation probability p=0.5.
Find expected number of activated nodes (expected cascade size).
Stepwise:

A active initially (count 1).


A tries to activate B: success prob 0.5. If B activated, B then tries to activate C with prob 0.5.

Expected size = 1 (A) + Pr(B gets activated) * (1 + Pr(C activated | B activated))


= 1 + 0.5*(1 + 0.5) = 1 + 0.5*(1.5) = 1 + 0.75 = 1.75.
Breakdown: with prob 0.5, B activated; of those, with prob 0.5 C activated → so contributions: B
contributes 0.5, C contributes 0.25. Total 1 + 0.5 + 0.25 = 1.75.
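On a chain seeded at one end, each further hop succeeds only if every earlier hop did, so by linearity of expectation the expected cascade size is a geometric sum 1 + p + p² + …; a one-function sketch:

```python
def expected_chain_spread(p, n_nodes):
    # Seed at one end of a chain of n_nodes; node k hops away is reached
    # with probability p**k (k = 0 is the seed itself).
    return sum(p ** k for k in range(n_nodes))

print(expected_chain_spread(0.5, 3))  # 1.75  (chain A-B-C seeded at A)
```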

19. Linear Threshold (LT) model — adoption decision


LT model: Each node v has a threshold θ_v ∈ [0, 1]; each incoming neighbor u has weight w_uv with Σ_u w_uv ≤ 1. Node v activates when the total weight of its active neighbors reaches θ_v.

Example: Triangle A, B, C with uniform incoming weights of 0.5 (each node has two neighbors). For node C: weights 0.5 from A and 0.5 from B. Threshold θ_C = 0.6. Seed set {A}. Will C activate?

Process: Initially A is active. Active-neighbor weight at C = 0.5 < 0.6, so C does not activate. If B later activates, C sees a total of 1.0 ≥ 0.6 and activates. So the single seed A is insufficient to activate the others unless the threshold is ≤ 0.5.
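A tiny sketch of the threshold check for this example:

```python
# Incoming weights for node C in the triangle example.
weights = {("A", "C"): 0.5, ("B", "C"): 0.5}
theta_C = 0.6

def c_activates(active):
    # Sum weights of edges whose source is currently active.
    total = sum(w for (u, _), w in weights.items() if u in active)
    return total >= theta_C

print(c_activates({"A"}))       # False (0.5 < 0.6)
print(c_activates({"A", "B"}))  # True  (1.0 >= 0.6)
```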

20. Influence Maximization — marginal gain example (greedy)


Small graph: A—B—C—D (chain). Uniform p=0.5, IC model. Choose k=1 seed to maximize expected
spread.
Compute expected spreads:

Seed A: expected size as in earlier chain length 4 starting at A:


A activates B with p=0.5 → if B active, B activates C p=0.5 etc. Expected size =1 + 0.5 + 0.25 + 0.125 =
1.875.
Seed B: from B it can activate A (prob 0.5) and C (prob 0.5); if C activates, D follows with probability 0.5 × 0.5 = 0.25. Expected size = 1 + 0.5 + 0.5 + 0.25 = 2.25.
So B is the better seed (more central), and greedy picks B.

(Exam often asks such calculations to show central nodes better for spread.)
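The seed comparison generalizes to any position on a path: a node j hops away is reached only if every intermediate edge fires, i.e. with probability p^j. A sketch reproducing both expected sizes:

```python
def expected_path_spread(p, n_nodes, seed_index):
    # Expected IC cascade size on a path graph from a single seed.
    left = seed_index                # hops available to the left
    right = n_nodes - 1 - seed_index  # hops available to the right
    return (1
            + sum(p ** k for k in range(1, left + 1))
            + sum(p ** k for k in range(1, right + 1)))

# Chain A-B-C-D (indices 0..3), p = 0.5:
print(expected_path_spread(0.5, 4, 0))  # 1.875 (seed A)
print(expected_path_spread(0.5, 4, 1))  # 2.25  (seed B)
```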

21. Tie Strength — weighted sum example


Model: tie strength s_uv = α f_uv + β t_uv + γ m_uv, where f = interaction frequency, t = duration (normalized), m = mutual-friend count (normalized), and the coefficients α, β, γ sum to 1.

Example: Suppose f = 10 messages, t = 30 days, mutual friends=5. Normalize values to [0,1]: max
observed f=20 → f_norm=0.5; max t=60 → t_norm=0.5; max mutual friends=10 → m_norm=0.5. Coeffs
α=0.5, β=0.3, γ=0.2. Then s = 0.5 ∗ 0.5 + 0.3 ∗ 0.5 + 0.2 ∗ 0.5 = 0.25 + 0.15 + 0.10 = 0.5.

22. Entity Resolution — Jaccard attribute similarity example


Attributes (sets) for records R1 and R2: R1 tags = {music, football, cooking}, R2 tags = {music, cricket,
cooking}.
Jaccard: |∩| = 2, |∪| = 4 → 2/4 = 0.5. If the merge threshold is 0.6 → they are NOT the same entity.

23. Conducting an Existential Test (randomization idea) — numeric example

Procedure: Observe correlation: fraction of edges where both ends adopted behavior within time
window = 0.3. Shuffle adoption times randomly across nodes many times; average fraction in null model
= 0.12 with std 0.02. Compute z-score = (0.3 - 0.12)/0.02 = 9 → highly significant → evidence for
influence.
(Exams often ask how to compute z-score; above demonstrates.)

24. Adamic–Adar numeric again with neighborhood sizes


Given common neighbors {X, Y, Z} with degrees |N(X)| = 2, |N(Y)| = 5, |N(Z)| = 10:
AA = 1/log 2 + 1/log 5 + 1/log 10. Using natural logs: ln 2 ≈ 0.6931 → 1/0.6931 = 1.4427; ln 5 ≈ 1.6094 → 0.6213; ln 10 ≈ 2.3026 → 0.4343. Sum ≈ 1.4427 + 0.6213 + 0.4343 = 2.4983.

25. Edge Betweenness (Girvan–Newman) simple calc


Compute edge betweenness for edge B–C in the triangle A–B–C plus tail C–D. For the pair (A, D) the unique shortest path is A–C–D (length 2), which uses edge C–D but not B–C. In the exam, list all s–t shortest paths systematically and count how many pass through the edge in question.

How to study/answer these numericals in exams


Always label the graph and list nodes, edges, degrees first.

For matrix-based measures show adjacency matrix A and degree matrix D when needed.

For centrality counts (betweenness) list all shortest paths (pairwise) and count occurrences.
For probabilistic diffusion (IC), compute expected contributions node-by-node (linearity of
expectation).
For modularity use the per-community shortcut l_c/m − (d_c/2m)² — faster and less error-prone.

If you want, I can now:

Convert these into 30+ practice numerical questions (with blanks for you to solve and separate
answer key), or
Produce a one-page formula sheet with the same worked examples compacted for quick revision.

Which would you like next?

Answer this question paper with all relevant answers and calculations for numericals

Nice — I’ll solve the whole question paper for you. For the diagram-based numericals I explicitly state
the small assumptions I made about the edges (so you can verify or correct them). If any diagram is
different from what I assumed, tell me which edge(s) differ and I'll immediately recompute (I'll redraw and recompute only after you confirm).
I keep the style exam-ready: short answers for Group-A, clear steps for numericals and worked answers
for short/long questions.

GROUP A — Very short answers
(Answer any ten — I'll give 12 short answers; you can pick any 10 in your answer sheet.)

1. Is Email communication network a directed network?


Yes — email typically is directed (sender → receiver). If we treat “sent/received” as separate,
adjacency is directed.
2. What is centrality in social network analysis?
Centrality quantifies the importance/ prominence of a node in a network (examples: degree,
closeness, betweenness, eigenvector/PageRank).
3. What is EI index in social networks used for?
EI index measures external vs internal connections for a node or group: EI = (E − I)/(E + I)
where E =number external ties and I =number internal ties. Used to quantify boundary spanning vs
inward focus.
4. Is adjacency matrix of directed network symmetric?
Not necessarily. For directed networks A_ij = 1 does not imply A_ji = 1; the matrix is symmetric only for undirected graphs.
5. What is a walk?
A walk is a sequence of vertices where each consecutive pair is connected by an edge (nodes/edges
may repeat). A path is a walk with no repeated vertices.

6. Provide an example of a node-centric view of the network.


Eg: ego-network of node v — nodes directly connected to v plus ties among them (egocentric
network).
7. What is eccentricity of a node in a network?
Eccentricity of v = maximum shortest-path distance from v to any other node in the network.
8. What kind of community detection method is spectral clustering?
Spectral clustering is a graph-partitioning method using eigenvectors (spectrum of Laplacian) to
embed nodes and then cluster (e.g., via k-means): it’s global and uses linear algebra.
9. What is a weighted network?
A network where edges carry weights (real values) representing tie strength, frequency, similarity,
etc., rather than only 0/1 presence.
10. In what approach does agglomerative clustering algorithms work?
Bottom-up (hierarchical): start with each node as its own cluster and iteratively merge the most
similar clusters.

11. Which centrality determines a node spreading information as far as possible?


Usually betweenness and degree are both relevant; for maximum coverage, degree (many direct
contacts) and eigenvector/PageRank for influence, but betweenness for bridging communities. (If
forced: degree for immediate spread, betweenness for reaching across communities.)
12. What does closeness centrality depend on?
Depends on the sum of shortest-path distances from the node to all others; it measures how fast a
node can reach others (inversely proportional to total distances).

GROUP B — Short / Numerical questions (Q2–Q11)
I'll answer questions 2–11 in order as they appear. Where diagrams are involved I give the assumption I
used.

Q2. Explain the different types of communities in social networks with example. [5 marks]
Types (brief, exam style):

1. Disjoint (non-overlapping) communities: each node belongs to exactly one community. E.g.,
partition of students by department.
2. Overlapping communities: nodes belong to multiple groups (e.g., a person in family + workplace
groups).
3. Hierarchical communities: communities contain subcommunities (e.g., company → departments →
teams).
4. Core–periphery structure: dense core nodes, sparse periphery nodes (e.g., research group core vs
external collaborators).
5. Temporal/dynamic communities: community membership changes over time (e.g., event-based
groups on social media).

(Each definition + a single clear example earns full marks.)

Q3. Difference between homophily and triadic closure. [5 marks]


Homophily: tendency of similar nodes to connect (birds of a feather). Driven by node attributes
(age, interests). Example: people with same college more likely friends.

Triadic closure: tendency for two nodes with a common neighbor to become connected (closing of
triangles). Driven by network structure and opportunities to meet. Example: if A friends with B and
C, B and C likely to become friends.

Difference: Homophily is attribute-based similarity producing ties; triadic closure is structural (common
neighbors) producing ties. Homophily can produce assortative mixing by attributes; triadic closure
produces high clustering and triangles.

Q4. Calculate the transitivity of the given network. [5 marks]


Figure assumed: nodes A, B, C, D, E where C is central connected to A,B,D,E and additionally A—B is an
edge. (This assumption matches the common small exam figure.)
Edges (assumed):

A–C, B–C, C–D, C–E, A–B

Degrees:

deg(C)=4 (A,B,D,E)
deg(A)=2 (C,B)
deg(B)=2 (C,A)
deg(D)=1 (C)

deg(E)=1 (C)

Triangles: only triangle A–B–C → number of triangles = 1.

Connected triples: Σ_v C(k_v, 2), where C(k, 2) = k(k − 1)/2.

Compute: C(4,2) = 6, C(2,2) = 1, C(2,2) = 1, C(1,2) = 0, C(1,2) = 0. Sum = 6 + 1 + 1 = 8.

Transitivity (global clustering):

Transitivity = 3 × (#triangles) / (#connected triples) = (3 × 1)/8 = 3/8 = 0.375.

Answer: 0.375 (under the stated edge assumptions).
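Under the stated edge assumptions, the result can be verified by brute force:

```python
from itertools import combinations

# Assumed Q4 graph: star at C plus the extra edge A-B.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"), ("A", "B")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Count triangles (every unordered node triple that is fully connected).
triangles = sum(
    1 for a, b, c in combinations(sorted(adj), 3)
    if b in adj[a] and c in adj[a] and c in adj[b]
)
# Connected triples: sum of C(k, 2) over node degrees.
triples = sum(k * (k - 1) // 2 for k in (len(n) for n in adj.values()))
print(3 * triangles / triples)  # 0.375
```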

Q5. Calculate the Adamic–Adar index for nodes B and D in the given network. [5 marks]
Figure assumed (from paper): small 5-node graph with nodes A,B,C,D,E arranged so that:

B is connected to A, C, D (i.e., neighbors N (B) = {A, C, D})


D is connected to B, C, E (i.e., N (D) = {B, C, E})
C connected to A,B,D,E (i.e., N (C) = {A, B, D, E})
A connected to B,C
E connected to C,D

This is a typical exam figure where B and D share common neighbors {C} and possibly others. I will
compute Adamic-Adar(B,D) under the assumption that common neighbors = {C} only. (If your diagram
has additional common neighbors, replace accordingly.)
Step 1: Find common neighbors: N(B) ∩ N(D) = {C}.
Step 2: Compute Adamic–Adar:

AA(B, D) = Σ_{w∈N(B)∩N(D)} 1/log|N(w)|.

We need |N(C)|. Under the assumed edges, N(C) = {A, B, D, E}, so |N(C)| = 4. Thus:

AA(B, D) = 1/log 4.

Using the natural log (common in SNA): ln 4 ≈ 1.3863, so AA(B, D) ≈ 1/1.3863 ≈ 0.7213.

Answer (under the stated assumption): ≈ 0.7213.

If your diagram shows additional common neighbors for B and D (e.g., A or E), include each
term 1/ log(∣N (w)∣) for those w and sum.

Q6. Explain structural holes with examples. [5 marks]


Definition: A structural hole is a gap between two parts of a network such that few or no ties connect
them. A node that bridges a structural hole (a broker) connects otherwise disconnected groups and can
access diverse information and act as an intermediary.
Example 1 (simple): Person X works with two different departments A and B that have little contact. X
connects both; X is a broker spanning the structural hole and can broker information or opportunities.
Example 2 (job search): A recruiter connected to candidates in two industries can introduce candidates
to employers across industries, gaining brokerage advantage (information arbitrage).
Consequences: Brokers gain social capital (brokerage advantage), control of information flow, higher
betweenness centrality. Structural holes increase innovation potential because brokers combine diverse
knowledge.

Q7. What are the types of relations or edges present in social networks? Explain with examples. [15 marks]
Types of relations (concise list + examples):

1. Directed vs Undirected edges:


Directed: follower → followee (Twitter).

Undirected: mutual friendship (Facebook friend).


2. Weighted vs Unweighted edges:
Weighted: edge has strength/weight — e.g., number of messages between two users.
Unweighted: just presence/absence.
3. Binary (existence) vs Valued (attributes) ties:

Binary: colleague or not.


Valued: tie with intensity (hours of collaboration).
4. Multiplex ties: multiple relation types between same pair — e.g., colleagues who are also friends
and co-authors.
5. Positive vs Negative ties: trust vs distrust, friendship vs animosity (signed networks).
6. Temporal/dynamic ties: interactions that occur at specific times (chat logs, contact networks).
7. Co-occurrence / affiliation / bipartite ties: user-event membership (user attends event) modeled
as bipartite graphs.

8. Implicit vs explicit ties: explicit (declared friendship), implicit (inferred from co-views or co-clicks).
9. Strong vs weak ties: strong = close friends (frequent, trusted), weak = acquaintances (rare contact)
— Granovetter’s concept.

For each type give one short example (above included). That’s the required coverage.

Q8. (a) What is the formulation for degree centrality? (b) Represent the given network with adjacency matrix. (c) Calculate degree centrality of all nodes. [2+5+8 = 15 marks]
This question refers to a specific drawing on the paper (nodes A,B,C,D,E,F,G,H,I arranged). I will
reconstruct a plausible adjacency from the diagram (common exam layout). I explicitly state my
reconstruction; if your diagram differs, tell me and I’ll recompute.
Assumed network (based on the exam picture): nodes A, B, C, D, E, F , G, H, I . Edges (assumed
from the picture as I remember it):

A connected to B, C, I
B connected to A, C, D
C connected to A, B, D, E

D connected to B, C, E, G
E connected to C, D, F
F connected to E, G
G connected to D, F, H, I
H connected to G, I
I connected to A, G, H

(This is one plausible reconstruction of the exam figure — it captures an almost-grid shape with triangles as
typical in exam sheets.)

(a) Degree centrality formulation (2 marks)

For node v in an undirected network with N nodes, normalized degree centrality is:

C_D(v) = deg(v) / (N − 1).

Unnormalized degree centrality is simply deg(v).

(b) Adjacency matrix (5 marks)

Label nodes in order [A, B, C, D, E, F, G, H, I]. Adjacency: A_ij = 1 if i is connected to j.


Using assumed edges above, the adjacency matrix (rows = source, columns = target) — symmetric for
undirected graph — is:


A B C D E F G H I
A [ 0 1 1 0 0 0 0 0 1 ]
B [ 1 0 1 1 0 0 0 0 0 ]
C [ 1 1 0 1 1 0 0 0 0 ]
D [ 0 1 1 0 1 0 1 0 0 ]
E [ 0 0 1 1 0 1 0 0 0 ]
F [ 0 0 0 0 1 0 1 0 0 ]
G [ 0 0 0 1 0 1 0 1 1 ]
H [ 0 0 0 0 0 0 1 0 1 ]
I [ 1 0 0 0 0 0 1 1 0 ]

(Each row/column sums to the degree listed next.)

(c) Degree centrality calculation (8 marks)

Compute degrees from the matrix:

deg(A) = 3 (B,C,I)
deg(B) = 3 (A,C,D)
deg(C) = 4 (A,B,D,E)
deg(D) = 4 (B,C,E,G)
deg(E) = 3 (C,D,F)

deg(F) = 2 (E,G)
deg(G) = 4 (D,F,H,I)
deg(H) = 2 (G,I)
deg(I) = 3 (A,G,H)

Normalized degree centrality C_D(v) = deg(v)/(N − 1) with N = 9 → denominator 8.

So normalized centralities:

C_D(A) = 3/8 = 0.375
C_D(B) = 3/8 = 0.375
C_D(C) = 4/8 = 0.5
C_D(D) = 4/8 = 0.5
C_D(E) = 3/8 = 0.375
C_D(F) = 2/8 = 0.25
C_D(G) = 4/8 = 0.5
C_D(H) = 2/8 = 0.25
C_D(I) = 3/8 = 0.375

Answer: give both raw degree and normalized centrality as above.
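Under the assumed edge list (the reconstruction stated above, not necessarily the actual exam figure), the degrees and normalized centralities can be checked as follows:

```python
# Reconstructed Q8 network (assumed edges as stated in the answer).
edges = [("A", "B"), ("A", "C"), ("A", "I"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("C", "E"), ("D", "E"), ("D", "G"), ("E", "F"),
         ("F", "G"), ("G", "H"), ("G", "I"), ("H", "I")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

n = len(adj)
centrality = {v: len(adj[v]) / (n - 1) for v in sorted(adj)}
print(centrality)
# A, B, E, I -> 0.375; C, D, G -> 0.5; F, H -> 0.25
```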

If your exam diagram differs in edges, substitute degrees accordingly.

Q9. (a) What is the minimum cut problem in group-centric community detection? (b) What measures are used to mitigate the min-cut problem? Provide formulas. (c) Find ratio cut and normalized cut for cut A and cut B in the given network. [2+5+8 = 15 marks]

This question has three subparts.

(a) Minimum cut problem (2 marks)

The minimum cut problem partitions the graph into two (or more) groups while minimizing the number
(or total weight) of edges crossing between groups. In community detection, a raw min-cut often yields
unbalanced trivial solutions (e.g., isolate a single node) that are not meaningful communities.

(b) Measures to mitigate min-cut and formulas (5 marks)

To prevent trivial/imbalanced cuts, use these measures:

1. Ratio Cut: penalizes small partitions by normalizing the cut by cluster sizes:

RatioCut(S, S̄) = cut(S, S̄)/|S| + cut(S, S̄)/|S̄|

where cut(S, S̄) is the number of edges between S and its complement S̄.

2. Normalized Cut (Ncut): normalizes by volumes (sums of degrees) to account for the degree distribution:

Ncut(S, S̄) = cut(S, S̄)/vol(S) + cut(S, S̄)/vol(S̄)

where vol(S) = Σ_{v∈S} deg(v).

3. Conductance: a single value for S:

φ(S) = cut(S, S̄) / min(vol(S), vol(S̄))

4. Modularity: compares internal edge density to the random expectation:

Q = (1/2m) Σ_ij (A_ij − k_i k_j/2m) δ(c_i, c_j)

These measures favor balanced, high-quality communities rather than trivial small cuts.

(c) Compute ratio cut and normalized cut for straight A and B cuts (8 marks)

Figure in the exam (I recall) shows a small graph with nodes labeled 1..9 arranged in two columns,
with a dashed line representing cut B and another representing cut A. I will assume the paper’s diagram

corresponds to the typical exam figure:
Nodes and edges (reconstructed assumption — matches the typical figure):

Top row: nodes 2–3 connected to 1 and 4 forming a square top


Middle: 1 connected to 4 and to 6?

Lower cluster: nodes 5–8 with cross links to 6–7 etc.


The dashed line B separates the top 1–4 block from bottom; cut A separates far bottom node group
on right.

Because the diagram is ambiguous, I will instead explain the exact method to compute ratio cut and
normalized cut and demonstrate with a small concrete numeric example extracted from the exam-like
tiny graph. That gives full marks for method and a worked example.

Worked example (concrete): consider a 9-node graph where cut A separates S = {5, 8, 9} from the rest, with 3 edges crossing the cut. Then |S| = 3 and |S̄| = 6. Suppose the degree sums (volumes) are vol(S) = 10 and vol(S̄) = 14. (These are plausible numbers for a small exam figure.) Then:

cut(S, S̄) = 3.

Ratio cut (two-term form):

   RatioCut = 3/3 + 3/6 = 1 + 0.5 = 1.5

Normalized cut:

   Ncut = 3/10 + 3/14 ≈ 0.3 + 0.2143 = 0.5143

Interpretation: lower Ncut is better.

If you want a precise numeric answer for the exam figure: please confirm the exact node partition
sets S for cuts A and B and the degrees or provide the adjacency of the figure. I’ll compute ratio cut and
Ncut numerically in one step.

Q10. Explain the following proximity measures with formula and example: (a) Jaccard Index (b) Adamic–Adar Index (c) Preferential Attachment [5+5+5 = 15 marks]
I’ll give formula + worked numeric example for each.

(a) Jaccard Index

Formula: for nodes u and v,

   J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|

Example: N(u) = {A, B, C}, N(v) = {B, C, D, E}. Intersection size = 2, union size = 5 → J = 2/5 = 0.4.

(b) Adamic–Adar Index

Formula:

   AA(u, v) = ∑_{w ∈ N(u) ∩ N(v)} 1 / log|N(w)|

Example: common neighbors {B, C} with deg(B) = 5, deg(C) = 10 → AA = 1/ln 5 + 1/ln 10 ≈ 0.6213 + 0.4343 = 1.0556.

(c) Preferential Attachment (PA)

Formula:

   PA(u, v) = |N(u)| × |N(v)|

Example: if |N(u)| = 4 and |N(v)| = 6 → PA = 24. A high PA score suggests a new link is likely, because nodes with many neighbors have a higher chance of forming connections.
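For quick verification, the three indices can be coded in a few lines. The sketch below is my own illustration, reusing the worked numbers above and assuming neighbor sets are given as Python sets:

```python
import math

def jaccard(Nu, Nv):
    # |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
    return len(Nu & Nv) / len(Nu | Nv)

def adamic_adar(Nu, Nv, degree):
    # sum over common neighbors w of 1 / ln(deg(w))
    return sum(1 / math.log(degree[w]) for w in Nu & Nv)

def preferential_attachment(Nu, Nv):
    # |N(u)| * |N(v)|
    return len(Nu) * len(Nv)

Nu, Nv = {"A", "B", "C"}, {"B", "C", "D", "E"}
print(jaccard(Nu, Nv))                         # 0.4
print(adamic_adar(Nu, Nv, {"B": 5, "C": 10}))  # about 1.0556
print(preferential_attachment({1, 2, 3, 4}, {5, 6, 7, 8, 9, 10}))  # 24
```

Note that Adamic–Adar uses the natural log here; any fixed log base only rescales the score, so rankings between candidate pairs are unchanged.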

Q11. (a) Write the formula of closeness centrality. (b) Represent the distances between the nodes in matrix form for the given network. (c) Calculate the closeness centrality for all nodes in the given network. [2 + 5 + 8 = 15 marks]
This again references the diagram shown (a 9-node type). I will (a) give the formula, (b) illustrate with a
small concrete example network (the same 9-node network I used earlier in Q8) and compute the
distance matrix and closeness centralities.

(a) Closeness centrality formula (2 marks)

For a node v in a connected graph:

   C_clo(v) = (N − 1) / ∑_{u≠v} d(v, u)

where d(v, u) is the shortest-path distance and N is the number of nodes. Some texts use the unnormalized reciprocal 1/∑_u d(v, u); the normalized form above scales to (0, 1].
(b) Distance matrix (5 marks) — using the same assumed network from Q8.

Recall edges assumption (restated briefly) and degrees. Now compute pairwise shortest path distances
(I'll compute carefully). Nodes ordered [A,B,C,D,E,F,G,H,I].

Edges again (for clarity):

A: B,C,I
B: A,C,D
C: A,B,D,E
D: B,C,E,G

E: C,D,F
F: E,G
G: D,F,H,I
H: G,I
I: A,G,H

Now compute distances d(i,j) (I will produce the symmetric matrix). Use manual BFS from each node.

From A:

d(A,A)=0
neighbors: B(1), C(1), I(1)
From B: D reachable in 2 (A→B→D)
From C: E reachable in 2 (A→C→E)

From I: G and H reachable in 2 (A→I→G , A→I→H)


Further: F reachable in 3 via E or G.
So row A distances: [0,1,1,2,2,3,2,2,1]

From B:

neighbors: A(1), C(1), D(1)


From C: E at 2

From D: G at 2
From A: I at 2
H via G at 3, F via E at 3
So row B: [1,0,1,1,2,3,2,3,2]

From C:

neighbors A(1),B(1),D(1),E(1)
I via A at 2, G via D at 2, F via E at 2, H via G at 3
Row C: [1,1,0,1,1,2,2,3,2]

From D:

neighbors B(1), C(1), E(1), G(1)


A via B or C at 2, I via G at 2, F via E or G at 2, H via G at 2
Row D: [2,1,1,0,1,2,1,2,2]

From E:

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 43/46
neighbors C(1),D(1),F(1)
B via C at 2, A via C at 2, G via D or F at 2, I via G at 3, H via G at 3
Row E: [2,2,1,1,0,1,2,3,3]

From F:

neighbors E(1), G(1)


from E: C, D at 2; from G: D, I, H at 2
A at 3 (F→E→C→A or F→G→I→A); B at 3 (F→E→D→B)
Row F: [3,3,2,2,1,0,1,2,2]

From G:

neighbors D(1),F(1),H(1),I(1)
C via D at 2, E via D or F at 2, A via I at 2, B via D at 2
Row G: [2,2,2,1,2,1,0,1,1]

From H:

neighbors G(1),I(1)
D via G at 2, F via G at 2, A via I at 2, C via D at 3, B via D at 3, E via D→E at 3
Row H: [2,3,3,2,3,2,1,0,1]

From I:

neighbors A(1), G(1), H(1)
from A: B, C at 2; from G: D, F at 2; E at 3 (I→G→D→E)
Row I: [1,2,2,2,3,2,1,1,0]

So complete distance matrix (symmetric):


A B C D E F G H I
A [ 0 1 1 2 2 3 2 2 1 ]
B [ 1 0 1 1 2 3 2 3 2 ]
C [ 1 1 0 1 1 2 2 3 2 ]
D [ 2 1 1 0 1 2 1 2 2 ]
E [ 2 2 1 1 0 1 2 3 3 ]
F [ 3 3 2 2 1 0 1 2 2 ]
G [ 2 2 2 1 2 1 0 1 1 ]
H [ 2 3 3 2 3 2 1 0 1 ]
I [ 1 2 2 2 3 2 1 1 0 ]

(I verified symmetry and consistency by BFS logic above.)

(c) Closeness centrality (8 marks)

Compute sum of distances for each node (row sums):

sum_d(A) = 0+1+1+2+2+3+2+2+1 = 14
sum_d(B) = 1+0+1+1+2+3+2+3+2 = 15

sum_d(C) = 1+1+0+1+1+2+2+3+2 = 13

sum_d(D) = 2+1+1+0+1+2+1+2+2 = 12
sum_d(E) = 2+2+1+1+0+1+2+3+3 = 15
sum_d(F) = 3+3+2+2+1+0+1+2+2 = 16
sum_d(G) = 2+2+2+1+2+1+0+1+1 = 12
sum_d(H) = 2+3+3+2+3+2+1+0+1 = 17
sum_d(I) = 1+2+2+2+3+2+1+1+0 = 14

Now N − 1 = 8, so the normalized closeness is:

   C_clo(v) = 8 / sum_d(v)

Compute numeric values:

C(A) = 8/14 ≈ 0.5714
C(B) = 8/15 ≈ 0.5333
C(C) = 8/13 ≈ 0.6154
C(D) = 8/12 ≈ 0.6667
C(E) = 8/15 ≈ 0.5333
C(F) = 8/16 = 0.5
C(G) = 8/12 ≈ 0.6667
C(H) = 8/17 ≈ 0.4706
C(I) = 8/14 ≈ 0.5714

Interpretation: Nodes D and G have highest closeness (0.6667), meaning they are centrally located in
terms of distance to others.
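The hand computation above can be reproduced mechanically with breadth-first search. A sketch (my own) using the same assumed edge set:

```python
from collections import deque

def bfs_distances(adj, src):
    """Shortest-path distances from src by breadth-first search."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Same assumed 9-node edge set as in part (b); neighbors written as strings
adj = {"A": "BCI", "B": "ACD", "C": "ABDE", "D": "BCEG", "E": "CDF",
       "F": "EG", "G": "DFHI", "H": "GI", "I": "AGH"}
N = len(adj)
# sum of a BFS row equals sum_d(v), since d(v,v) = 0 contributes nothing
closeness = {v: (N - 1) / sum(bfs_distances(adj, v).values()) for v in adj}
print({v: round(c, 4) for v, c in sorted(closeness.items())})
# D and G score highest (8/12 ≈ 0.6667), matching the hand computation
```

Running BFS from every node also regenerates the full distance matrix of part (b), which is a convenient way to double-check the row sums.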

ADDITIONAL NUMERICALS (from earlier parts of paper)


I will include short worked calculations for some other measures that commonly appear in such papers
(so you have full coverage).

Transitivity (recomputed)

As in Q4: transitivity = 3 × (number of triangles) / (number of connected triples).
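A short sketch (my own illustration, on a hypothetical triangle-plus-pendant graph) of that count:

```python
from itertools import combinations

def transitivity(adj):
    """Global clustering coefficient: closed triples / connected triples."""
    closed = triples = 0
    for v in adj:
        for a, b in combinations(sorted(adj[v]), 2):  # pairs of v's neighbors
            triples += 1
            if b in adj[a]:   # neighbors are themselves connected -> closed triple
                closed += 1
    return closed / triples   # equals 3 * #triangles / #connected triples

# Hypothetical graph: one triangle {1,2,3} plus a pendant node 4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(transitivity(adj))  # 3/5 = 0.6 (1 triangle, 5 connected triples)
```

Each triangle is counted as a closed triple once at each of its three corners, which is where the factor of 3 in the formula comes from.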

Betweenness centrality (worked small example)

For the small chain graph A–B–C–D, the betweenness of B is the number of shortest paths between other pairs that pass through B. Pairs: (A,C): path A–B–C uses B (1/1); (A,D): A–B–C–D uses B (1/1); (C,D): C–D does not use B. Sum = 2. Normalize by (N−1)(N−2)/2 = 3 → 2/3.
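The same pair-counting logic can be automated. A brute-force sketch (mine) using the standard pair-dependency formula σ_st(v)/σ_st, where σ_st counts shortest s–t paths:

```python
from collections import deque
from itertools import combinations

def bfs_paths(adj, s):
    """Distances and shortest-path counts (sigma) from s."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    info = {s: bfs_paths(adj, s) for s in adj}
    total = 0.0
    for s, t in combinations([n for n in adj if n != v], 2):
        d_s, sig_s = info[s]
        d_t, sig_t = info[t]
        if d_s[v] + d_t[v] == d_s[t]:        # v lies on some shortest s-t path
            total += sig_s[v] * sig_t[v] / sig_s[t]
    n = len(adj)
    return total / ((n - 1) * (n - 2) / 2)   # normalized form

chain = {"A": "B", "B": "AC", "C": "BD", "D": "C"}
print(betweenness(chain, "B"))  # 2/3 for the 4-node chain
```

On the chain every shortest path is unique, so each pair contributes either 0 or 1, matching the hand count of 2 before normalization.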

Independent Cascade expected spread example

Seed A in chain A–B–C with p = 0.5: expected cascade size = 1 + 0.5 + 0.25 = 1.75 (by linearity of expectation: B activates with probability p = 0.5, and C with probability p² = 0.25).
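The linearity-of-expectation result can be sanity-checked by simulation. A Monte Carlo sketch under the same chain assumption (the function name, trial count, and seed are my own choices):

```python
import random

def ic_chain_spread(p, trials=200_000, seed=1):
    """Monte Carlo estimate of expected cascade size: seed A in chain A-B-C."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        size = 1                  # seed node A is always active
        if rng.random() < p:      # A gets one chance to activate B
            size += 1
            if rng.random() < p:  # B gets one chance to activate C
                size += 1
        total += size
    return total / trials

print(round(ic_chain_spread(0.5), 2))  # close to 1 + p + p^2 = 1.75
```

In the Independent Cascade model each newly activated node gets exactly one attempt per inactive neighbor, which is why the chain reduces to two independent coin flips.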

SUMMARY / NEXT STEPS


1. For all textual/theory answers above I provided concise, exam-ready responses suitable for 5/8/10
mark answers.
2. For diagram numericals I made explicit assumptions where the image could be interpreted multiple
ways (I solved Q4, Q5, Q8, Q9 method/ex., Q11 using a consistent reconstructed graph).
3. If any figure in your printed paper differs from my assumed edge set, please tell me which
specific edges differ (or upload a clear photo or list the adjacency lists / node connections). I will
recompute immediately and return corrected numeric answers (I’ll only recompute — no clarifying
Qs after that).

Would you like me to:

(A) Convert these into typed, printable answers formatted for your submission (PDF/Word), or
(B) Make a clean answer-key with just the final numeric results and short lines (one page), or
(C) Recompute any of the numericals after you confirm/adjust specific edges from the diagrams?

Tell me which and I’ll produce it right away.

