Homophily: Bringing Surrounding
Contexts into Network Analysis
Examine processes, in addition to triadic closure, that
affect the formation of links in the network.
Surrounding contexts: factors that exist outside the nodes
and edges of a network.
Represent the contexts together with the network in a
common framework.
Homophily (i.e. “love of the same”)
Homophily principle: we tend to be similar to our friends.
People love those who are like themselves.
Similarity begets friendship.
“birds of a feather flock together”
Expression appears in the 16th century, a literal
translation of Plato's Republic.
People of similar character, background, or taste tend to
congregate or associate with one another.
Homophily in Networks
Links in a social network tend to connect people who are
similar to one another.
One of the most basic notions governing the structure of social networks
Its role in modern sociology was shaped by the influential work of
Lazarsfeld and Merton in the 1950s.
Paul Lazarsfeld and Robert K. Merton. 1954. Friendship as a
social process: A substantive and methodological analysis.
In Freedom and Control in Modern Society, pages 18–66.
Example
Social network from a town’s middle school and high school (students of
different races drawn as differently colored circles).
Two divisions:
• one based on race and
• the other based on age and school attendance (middle school versus high school)
Homophily vs. Triadic Closure
for Link Formation
With triadic closure:
–a new link is added for reasons that are intrinsic to the
network (need not look beyond the network)
–Ex: a friendship that forms because two people are
introduced through a common friend
With homophily:
a new link is added for reasons that lie beyond the
network (contextual factors)
Ex: a friendship that forms because two people attend the
same school or work for the same company
Homophily vs. Triadic Closure
for Link Formation
Strong interactions between intrinsic and contextual effects
Both operating concurrently
Triadic closure (intrinsic mechanism):
B and C have a common friend A
B and C have increased opportunities to meet
Homophily (contextual mechanism):
B and C are each likely to be similar to A in a number of
dimensions
and hence possibly similar to each other as well
Most links arise from a combination of several mechanisms
difficult to attribute any individual link to a single mechanism
Measuring Homophily
Given a characteristic (like race or age), how to test if a
network exhibits homophily according to it?
Example friendship network:
–Does it exhibit homophily by gender?
–Boys tend to be friends with boys, and girls tend to be friends with girls.
–Cross-gender edges exist.
Figure: friendship network of a (hypothetical) classroom; shaded nodes are girls and the six
unshaded nodes are boys.
Measuring Homophily
Q: What would it mean for
a network not to exhibit
homophily by gender?
A: The number of cross-gender
edges is not very different
from what we would see after
randomly assigning each node
a gender according to the
gender balance in the original
network.
Measuring Homophily
p, the probability (fraction) of males
q = 1-p, the probability (fraction) of females
For a given edge:
Homophily:
Prob(both ends male) = p*p
Prob(both ends female) = q*q
Cross gender:
Prob(ends male and female) = 2*p*q
Homophily Test: If the fraction of cross-gender edges is
significantly less than 2pq, then there is evidence for
homophily.
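The test above can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is given as an edge list plus a gender label per node. The function name `cross_gender_test` and the toy data are hypothetical and are not the classroom network from the slides.

```python
from fractions import Fraction

def cross_gender_test(edges, gender):
    """Return (observed cross-gender fraction, 2pq baseline) as exact fractions."""
    n = len(gender)
    p = Fraction(sum(1 for g in gender.values() if g == "M"), n)  # fraction of males
    q = 1 - p                                                      # fraction of females
    cross = sum(1 for u, v in edges if gender[u] != gender[v])
    return Fraction(cross, len(edges)), 2 * p * q

# Hypothetical toy network (assumption, not from the slides): 4 boys, 2 girls, 6 edges.
gender = {"a": "M", "b": "M", "c": "M", "d": "M", "e": "F", "f": "F"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("e", "f"), ("d", "e")]

observed, baseline = cross_gender_test(edges, gender)
print(observed, baseline, observed < baseline)
```

In this toy network one of the six edges is cross-gender, which is below the 2pq baseline of 4/9. Whether the gap is significant is a separate question.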
Measuring Homophily
p = 6/9 = 2/3
q = 1/3
2pq = 4/9 = 8/18
5 of the 18 edges are cross-gender (5/18)
Test: 5/18 < 8/18 => some evidence of
homophily
Need a definition of “significantly less than”:
use standard measures of statistical significance
What if the fraction of cross-gender edges is more than
2pq?
inverse homophily or heterophily (Ex:
network of romantic relationships)
How to extend to characteristics with
more than 2 states?
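One possible way to address both open questions is sketched below, under stated assumptions. For a characteristic with several states occurring with fractions p_i, the random-assignment baseline for an edge joining two different states is 1 - sum(p_i^2), which reduces to 2pq for two states. For significance, the sketch uses a permutation test (shuffling the labels over the nodes), which is one standard choice rather than the method prescribed by the slides. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def cross_fraction(edges, label):
    """Fraction of edges whose endpoints carry different labels."""
    return sum(1 for u, v in edges if label[u] != label[v]) / len(edges)

def expected_cross_fraction(label):
    """Baseline 1 - sum_i p_i^2; equals 2pq when there are two states."""
    n = len(label)
    counts = Counter(label.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def permutation_p_value(edges, label, trials=2000, seed=0):
    """Share of random relabelings with a cross fraction <= the observed one."""
    rng = random.Random(seed)
    observed = cross_fraction(edges, label)
    nodes = list(label)
    values = [label[n] for n in nodes]
    hits = 0
    for _ in range(trials):
        rng.shuffle(values)  # keeps the label balance, breaks any link to structure
        if cross_fraction(edges, dict(zip(nodes, values))) <= observed:
            hits += 1
    return hits / trials

# Hypothetical three-state example (assumption, not from the slides).
label = {"a": "red", "b": "red", "c": "red", "d": "blue",
         "e": "blue", "f": "green", "g": "green"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"),
         ("f", "g"), ("c", "d"), ("e", "f")]

observed = cross_fraction(edges, label)
baseline = expected_cross_fraction(label)
p_value = permutation_p_value(edges, label)
print(observed, baseline, p_value)
```

A small p-value suggests fewer mixed edges than chance would produce. Note that shuffling labels fixes the exact label counts, so its baseline differs slightly from the independent-assignment value 1 - sum(p_i^2) on small networks.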
Mechanisms Underlying Homophily
Homophily has two mechanisms for link creation
Selection: select friends with similar characteristics
Individual characteristics drive the formation of links
Involves immutable characteristics (determined at birth)
Social influence: people modify their behaviors to bring them closer to the
behaviors of their friends
the reverse of selection: existing links shape individual characteristics
involves mutable characteristics (behaviors, activities, interests,
beliefs, and opinions)
The Interplay of Selection and Social
Influence
Q: When homophily is observed, is it a result of selection
or social influence?
–Have people adapted their behaviors to become more like
their friends, or have they selected friends who were
already like them?
Track the network and monitor the results of the two
mechanisms
The Interplay of Selection and Social
Influence
Most of the time, both mechanisms apply and interact
with each other
Studies show that teenage friends are similar to each
other in their behaviors, and both selection and social
influence apply:
Teenagers seek social circles of people like them, and peer
pressure causes them to conform to behavioral patterns within
these circles
Q: How do the two mechanisms interact, and is one
more strongly at work than the other?
Affiliation
Story so far:
Homophily groups together similar nodes
Selection and social influence determine the formation of
links in a network
Similarity of nodes based on characteristics
How to model these characteristics?
They represent surrounding contexts of networks
They exist “outside” the network
How to put these contexts into the network itself?
Affiliation
Represent the set of activities a person takes part in (a
general view of “activity”)
E.g. part of a particular company, organization, frequenting
a particular place, hobby
Refer to activities as foci: “focal points” of social
interaction
Affiliation Networks
Affiliation network:
bipartite graph (two-mode network)
nodes divided into 2 sets
no edges joining a pair of nodes
that belong to the same set
people affiliated with foci
Example
Anna participates in both of
the social foci on the right
Daniel participates in only one
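A minimal sketch of such an affiliation network as a bipartite edge set. The focus names are illustrative assumptions (the slide's figure did not survive), and the helper names are hypothetical.

```python
# People and foci are the two node sets; every edge joins a person to a focus.
people = {"Anna", "Daniel"}
foci = {"Literacy Volunteers", "Karate Club"}  # assumed focus names
edges = {
    ("Anna", "Literacy Volunteers"),
    ("Anna", "Karate Club"),
    ("Daniel", "Karate Club"),
}

def is_affiliation_network(people, foci, edges):
    """Check the bipartite property: no edge stays inside one node set."""
    return people.isdisjoint(foci) and all(p in people and f in foci for p, f in edges)

def foci_of(person, edges):
    """Set of foci the given person participates in."""
    return {f for p, f in edges if p == person}

print(is_affiliation_network(people, foci, edges))
print(foci_of("Anna", edges), foci_of("Daniel", edges))
```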
Co-Evolution of Social and Affiliation
Networks
Social networks change over time
New friendship links are formed
Affiliation networks change over time
People become associated with new foci
Co-evolution reflects interplay between selection and
social influence.
Two people who participate in a shared focus have an opportunity to become friends
If two people are friends, they can influence each other's choice of foci
How to represent co-evolution with a single network?
Social-affiliation networks
Social-affiliation
network contains:
a social network on
the people and
an affiliation network
on the people and
foci
Social-affiliation networks
In social-affiliation networks, link
formation can be viewed as a closure process
Several options for “closing” B-C
triadic closure: A, B, and C
are all people (already
examined)
focal closure: B and C are people, A
is a focus
selection: B links to similar C
(common focus)
membership closure: A and B
are people, C is a focus
social influence: B links to C
influenced by A
Example
Bob introduces
Anna to Claire.
Karate “introduces”
Anna to Daniel.
Anna introduces
Bob to Karate.
Edges in bold are the newly formed links.
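The three closure types in this example can be told apart mechanically. The sketch below assumes the starting network implied by the example (Anna and Bob are friends, Bob and Claire are friends, Anna and Daniel both belong to Karate). The data layout and function names are hypothetical.

```python
# Social network before the new links form (assumed from the example).
friends = {
    "Anna": {"Bob"},
    "Bob": {"Anna", "Claire"},
    "Claire": {"Bob"},
    "Daniel": set(),
}
# Affiliation network before the new links form.
member_of = {"Anna": {"Karate"}, "Bob": set(), "Claire": set(), "Daniel": {"Karate"}}

def explain_friendship(b, c):
    """Closure mechanisms that could account for a new person-person link b-c."""
    reasons = []
    if friends[b] & friends[c]:
        reasons.append("triadic closure")  # a common friend A
    if member_of[b] & member_of[c]:
        reasons.append("focal closure")    # a common focus A
    return reasons

def explain_membership(b, focus):
    """Closure mechanisms that could account for a new person-focus link."""
    if any(focus in member_of[a] for a in friends[b]):
        return ["membership closure"]      # a friend A already in the focus
    return []

print(explain_friendship("Anna", "Claire"))
print(explain_friendship("Anna", "Daniel"))
print(explain_membership("Bob", "Karate"))
```

A single new link can match more than one mechanism, which is why the functions return lists.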
Tracking Link Formation in On-Line Data
What we’ve discussed so far: a set of mechanisms that
lead to the formation of links
triadic closure
focal closure
membership closure
Tracking these mechanisms in large populations
Their accumulation observable in the aggregate
Tracking Triadic Closure
Likelihood of a link forming as a function of the number of common friends?
1. Two snapshots of the network
2. For each k, find all pairs of nodes with k common friends
in the first snapshot, but not directly connected
3. T(k): fraction of these pairs connected in the second
snapshot
empirical estimate of probability that a link will form between
two people with k common friends
4. Plot T(k) as a function of k
T(0) is the rate of link formation when it does not close a triangle
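Steps 1 to 3 can be sketched as follows, assuming each snapshot is an undirected graph stored as a dict from node to its set of neighbors. The function name and the toy snapshots are hypothetical, and the all-pairs loop is only practical for small networks.

```python
from collections import defaultdict
from itertools import combinations

def triadic_closure_rates(snapshot1, snapshot2):
    """Return {k: T(k)} for pairs unconnected in snapshot1 with k common friends."""
    candidates = defaultdict(int)  # pairs with k common friends, not yet linked
    formed = defaultdict(int)      # those pairs that are linked in snapshot2
    for u, v in combinations(sorted(snapshot1), 2):
        if v in snapshot1[u]:
            continue               # already connected in the first snapshot
        k = len(snapshot1[u] & snapshot1[v])
        candidates[k] += 1
        if v in snapshot2.get(u, set()):
            formed[k] += 1
    return {k: formed[k] / candidates[k] for k in sorted(candidates)}

# Hypothetical toy snapshots: b and c share friend a, then become friends.
snapshot1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
snapshot2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}

rates = triadic_closure_rates(snapshot1, snapshot2)
print(rates)
```

Plotting the returned values against k gives the curve of step 4.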
Tracking Triadic Closure
Kossinets and Watts computed T(k)
full history of e-mail communication (“who-talks-to-whom”)
a one-year period
22,000 students at a large U.S. university
snapshots taken one day apart (results averaged
over multiple pairs of snapshots)
Gueorgi Kossinets and Duncan Watts. Empirical analysis of an
evolving social network. Science, 311:88–90, 2006.
Tracking Triadic Closure
Interpret the result compared to a baseline
Assume that each common friend that two people have
gives them an independent probability p of forming a link
2 people have k friends in common => the probability they
fail to form a link is (1-p)^k
2 people have k friends in common => probability that they
form a link is 1-(1-p)^k
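The baseline is easy to tabulate. In the sketch below the value of p is an arbitrary assumption chosen only for illustration. For small p the curve is close to the straight line k*p.

```python
def baseline_link_probability(k, p):
    """Probability that at least one of k independent chances (each p) succeeds."""
    return 1 - (1 - p) ** k

p = 0.01  # hypothetical per-common-friend probability, not an estimate from data
for k in range(6):
    print(k, round(baseline_link_probability(k, p), 6), round(k * p, 6))
```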
Tracking Triadic Closure
Tracking Focal Closure
Likelihood of link formation as a function of the number
of common foci?
Kossinets and Watts supplemented their university e-mail
dataset with information about the class schedules.
Each class became a focus.
Students shared a focus if they had taken a class together.
Tracking Focal Closure
Tracking Membership Closure
Blogging site LiveJournal
social network (friendship links)
foci correspond to membership in user-defined communities
Figure: probability of joining a LiveJournal community as a function of the number of
friends who are already members
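A curve of this kind could be estimated from two snapshots roughly as follows. This is a sketch under assumptions: friendships are treated as fixed between snapshots, community membership is given as a dict from community to member set, and all names and the toy data are hypothetical.

```python
from collections import defaultdict

def membership_closure_rates(friends, members1, members2):
    """Return {k: fraction of non-members with k member friends who later join}."""
    candidates = defaultdict(int)
    joined = defaultdict(int)
    for community, before in members1.items():
        after = members2.get(community, set())
        for person in friends:
            if person in before:
                continue                      # already a member
            k = len(friends[person] & before)  # friends who are already members
            candidates[k] += 1
            if person in after:
                joined[k] += 1
    return {k: joined[k] / candidates[k] for k in sorted(candidates)}

# Hypothetical toy data: u has two friends in community "c" and then joins.
friends = {"u": {"v", "w"}, "v": {"u"}, "w": {"u"}, "x": set()}
members1 = {"c": {"v", "w"}}
members2 = {"c": {"v", "w", "u"}}

join_rates = membership_closure_rates(friends, members1, members2)
print(join_rates)
```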
Tracking Membership Closure
Wikipedia editors
link two editors when they have communicated (user talk page)
each Wikipedia article defines a focus (an editor is associated with the
articles he/she has edited)
Figure: probability of editing a Wikipedia article as a function of the number of
friends who have already done so
Quantifying the Interplay Between
Selection and Social Influence
How do selection and social influence work together to produce
homophily?
How do similarities in behavior between two Wikipedia
editors relate to their pattern of social interaction over
time?
Similarity between 2 Wikipedia editors A, B:
Is homophily (similarity) due to editors talking with
those who have edited the same articles (selection), or because editors
are led to edit articles by those they talk to (social influence)?
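The similarity formula itself did not survive on this slide. A natural choice, assumed here, is the ratio of the number of articles edited by both A and B to the number edited by at least one of them (a Jaccard-style overlap). The function name and the sample article sets are hypothetical.

```python
def editor_similarity(articles_a, articles_b):
    """Articles edited by both, divided by articles edited by at least one."""
    union = articles_a | articles_b
    if not union:
        return 0.0  # convention chosen here for two editors with no edits
    return len(articles_a & articles_b) / len(union)

# Hypothetical edit histories.
a = {"Graph theory", "Homophily"}
b = {"Homophily", "Segregation"}
print(editor_similarity(a, b))
```

Under this definition, two editors with identical edit histories have similarity 1 and editors with no article in common have similarity 0.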
Quantifying the Interplay Between
Selection and Social Influence
Time advances by one “tick” whenever either A or B performs an action (editing or
talking). Time 0 is the point at which they first talked
Friendship Network on Facebook
Wimmer and Lewis (2010) studied the friendship network of
an entire college cohort of 1,640 students.
Three measures of friendship: Facebook friends, picture
friends, housing groups.
Exponential random graph modeling (ERGM) techniques
are used to disentangle the effects of the various
tie-generating mechanisms and identify the (multiple) levels
of ethnoracial categorization on which homophily actually
occurs.
Mechanisms leading to homogeneity
Friendship Network on Facebook
Main findings
Some racial categories matter for social network formation
(black), but not others (Asian). Some ethnic categories (e.g.
South Asian, Jewish, Chinese, British) matter a lot, but not
others (e.g. Italian).
Failing to take into account opportunity structure and
differences in sociality leads to an overestimation of the
homophily rate of large groups and of groups of more sociable
individuals.
Ignoring triadic closure leads to an overestimation
of the tendency towards homophily.
A Spatial Model of Segregation
One of the strongest
effects of homophily is
in the formation of
ethnically and racially
homogeneous
neighborhoods in cities
a process with a
dynamic aspect
what mechanisms?
In blocks colored yellow and orange the percentage of African-
Americans is below 25, while in blocks colored brown and black the
percentage is above 75
The Schelling Model
How global patterns of spatial segregation can arise from
the effect of homophily operating at a local level (Thomas
Schelling, 1969)
an intentionally simplified mechanism
works even when no one individual explicitly wants a
segregated outcome
The Schelling Model
Model assumptions:
Population of individuals called
agents
Each agent of type X or type O
The two types represent some
characteristic as basis for
homophily (race, ethnicity, country
of origin, or native language)
Agents reside in cells of a grid
(simple model of a 2-D city map)
Some cells contain agents while
others are unpopulated
Cell’s neighbors: cells that touch it
(including diagonal contact)
The Schelling Model
Cells are the nodes and edges connect neighboring cells.
We will continue with the geometric grid rather than the graph.
The Schelling Model
Local mechanism:
Each agent wants to
have at least some t
other agents of its own
type as neighbors (t
the same for all)
An unsatisfied agent
has fewer than t
neighbors of the same
type as itself and moves
to a new cell
E.g. t = 3
The Dynamics of Movement
Unsatisfied agents move in rounds
consider unsatisfied agents in some
order
random or row-sweep
each unsatisfied agent moves to an
unoccupied cell where it will be satisfied
chosen at random, or the nearest cell that
satisfies it
a move may cause other agents to become
unsatisfied
deadlocks may appear (no unoccupied cell
satisfies the agent)
the agent then stays or moves randomly
All variations have similar results
E.g. t=3, one round, row-sweep, move to
nearest cell, stay when deadlocks
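One variant of these dynamics is sketched below: row-sweep order, a move to a random satisfying empty cell, and staying put on deadlock. The grid size, agent counts, and seed are arbitrary assumptions, and all function names are hypothetical. This is an illustration of the mechanism, not a reproduction of any particular published run.

```python
import random

EMPTY = None

def random_grid(rows, cols, n_x, n_o, rng):
    """Place n_x agents of type X and n_o of type O uniformly at random."""
    cells = ["X"] * n_x + ["O"] * n_o + [EMPTY] * (rows * cols - n_x - n_o)
    rng.shuffle(cells)
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]

def same_type_neighbors(grid, r, c, kind):
    """Count the up-to-8 touching cells (diagonals included) holding `kind`."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                count += grid[rr][cc] == kind
    return count

def run_round(grid, t, rng):
    """One row-sweep round; returns how many agents moved."""
    moved = 0
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            kind = grid[r][c]
            if kind is EMPTY or same_type_neighbors(grid, r, c, kind) >= t:
                continue
            grid[r][c] = EMPTY  # lift the agent so it never counts itself
            options = [(rr, cc) for rr in range(rows) for cc in range(cols)
                       if grid[rr][cc] is EMPTY
                       and same_type_neighbors(grid, rr, cc, kind) >= t]
            if options:
                rr, cc = rng.choice(options)
                grid[rr][cc] = kind
                moved += 1
            else:
                grid[r][c] = kind  # deadlock: no satisfying cell, so stay
    return moved

rng = random.Random(0)
grid = random_grid(20, 20, 160, 160, rng)
for round_number in range(50):
    if run_round(grid, t=3, rng=rng) == 0:
        break  # every agent is satisfied or deadlocked

for row in grid:
    print("".join(cell or "." for cell in row))
```

Swapping `rng.choice(options)` for a nearest-cell rule, or moving deadlocked agents at random, gives the other variants listed above.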
Larger examples
Two runs (50 rounds) of the Schelling model with unsatisfied agents moving
to a random location. Threshold t=3, 150-by-150 grid with 10,000 agents.
Each cell of first type is red, of second type blue, or black if unoccupied.
Interpretations of the Model
Spatial segregation is taking
place even though no individual
agent is seeking it
Agents just want to be near t
others like them
When t=3, an agent is satisfied
even as a minority among its
neighbors (e.g. 3 neighbors of its own type and 5 of the
opposite type)
See the figure on the right:
A checkerboard 4x4 pattern can
make all agents satisfied (even for
large grids).
We don’t see this result in
simulations.
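That an integrated arrangement can satisfy everyone at t=3 can be checked directly. The figure did not survive, so the pattern below (2-by-2 blocks of one type alternating like a checkerboard, on a fully occupied grid with even dimensions) is an assumption and may differ from the one drawn on the slide. Function names are hypothetical.

```python
def block_checkerboard(rows, cols):
    """Fully occupied grid of 2-by-2 blocks alternating between X and O."""
    return [["X" if (r // 2 + c // 2) % 2 == 0 else "O" for c in range(cols)]
            for r in range(rows)]

def same_type_counts(grid):
    """For every cell, the number of touching cells holding the same type."""
    rows, cols = len(grid), len(grid[0])
    counts = []
    for r in range(rows):
        for c in range(cols):
            counts.append(sum(
                grid[r + dr][c + dc] == grid[r][c]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows and 0 <= c + dc < cols))
    return counts

counts = same_type_counts(block_checkerboard(6, 6))
print(min(counts), max(counts))
```

Every agent always touches the three other members of its own block, so the minimum never drops below 3 however large the (even-sized) grid is, even though each interior agent is outnumbered by neighbors of the other type.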
Some other simulation results
t=4, 150-by-150
grid, 10,000
agents, after varying
numbers of rounds
(steps); the run is not
shown to the end
Interpretations of the Results
More typically, agents form larger clusters
Agents become unsatisfied and attach to larger clusters
(where they have a higher probability of being satisfied)
The overall effect:
Local preferences of individual agents have produced a
global pattern that none of them necessarily intended
The Schelling model illustrates how, as homophily draws
people together along immutable characteristics (race or
ethnicity), it creates a natural tendency for mutable
characteristics (decision about where to live) to change in
accordance with the network structure
Summary
Homophily
Underlying Mechanisms: Selection and Influence
Affiliation
Tracking Link Formation Using On-Line Data
Email
Wikipedia editing
A Spatial Model of Segregation
Schelling Model