Yoshimichi Sato
Hiroki Takikawa
Editors

Sociological Foundations of Computational Social Science
Translational Systems Sciences
Volume 40
Editors-in-Chief
Kyoichi Kijima, School of Business Management, Bandung Institute of Technology, Tokyo, Japan
Hiroshi Deguchi, Faculty of Commerce and Economics, Chiba University of Commerce, Tokyo, Japan
Editorial Board
• Shingo Takahashi (Waseda University)
• Hajime Kita (Kyoto University)
• Toshiyuki Kaneda (Nagoya Institute of Technology)
• Akira Tokuyasu (Hosei University)
• Koichiro Hioki (Shujitsu University)
• Yuji Aruka (Chuo University)
• Kenneth Bausch (Institute for 21st Century Agoras)
• Jim Spohrer (IBM Almaden Research Center)
• Wolfgang Hofkirchner (Vienna University of Technology)
• John Pourdehnad (University of Pennsylvania)
• Mike C. Jackson (University of Hull)
• Gary S. Metcalf (InterConnections, LLC)
• Marja Toivonen (VTT Technical Research Centre of Finland)
• Sachihiko Harashina (Chiba University of Commerce)
• Keiko Yamaki (Shujitsu University)
Yoshimichi Sato • Hiroki Takikawa
Editors
Sociological Foundations of Computational Social Science
Editors
Yoshimichi Sato
Faculty of Humanities, Kyoto University of Advanced Science, Kyoto, Japan

Hiroki Takikawa
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

Computational social science has not fully shown its power in sociology. This is the
motivation for us to publish this book. It is true that intriguing articles using
computational social science have been published in top journals of sociology, but
it is another story whether computational social science has a dominant influence in
sociology. To the best of our knowledge, it has not occupied a central position in
sociology yet. Why not? We tried to answer this question in this book.
Our answer is that computational social science has not attacked central issues in sociology, meaning and interpretation in particular. If it answered research questions that most sociologists deem important but that conventional sociological methods, such as statistical analysis of social survey data, have been unable to answer, computational social science would be more influential in sociology. However, this has not happened yet.
This issue stems from two causes. First, computational social scientists are not necessarily familiar with important sociological concepts and theories, so they tend to begin with available digital data, such as mobile data, without seriously considering how analyzing the data contributes to the advancement of those concepts and theories. Second, sociologists have not formalized important concepts and theories clearly enough for computational social scientists to connect them with their analyses and contribute substantively to elaborating them.
In a sense, computational social science and sociology are in an unhappy relationship. If they collaborated efficiently with each other, sociology would reach a higher level. This book proposes ways to promote that collaboration. In particular, several chapters focus on meaning and interpretation and show how to incorporate them into analysis using computational social science. This is because, as mentioned above, they have been central issues in sociology. Thus, if they are properly incorporated into computational social scientific analysis, the results of the analysis will represent a quantum leap for sociology, and sociologists will realize the true power of computational social science. As a result, computational social science and sociology will have a happy marriage.
We hope that readers of this book will find it important to realize the collaboration between computational social science and sociology, by the ways proposed in the book, so that sociology can jump to a higher stage with the help of computational social science.
Contents
1 Introduction
Yoshimichi Sato and Hiroki Takikawa
2 Sociological Foundations of Computational Social Science
Yoshimichi Sato
3 Methodological Contributions of Computational Social Science to Sociology
Hiroki Takikawa and Sho Fujihara
4 Computational Social Science: A Complex Contagion
Michael W. Macy
5 Model of Meaning
Hiroki Takikawa and Atsushi Ueshima
6 Sociological Meaning of Contagion
Yoshimichi Sato
7 Polarization of Opinion
Zeyu Lyu, Kikuko Nagayoshi, and Hiroki Takikawa
8 Coda
Yoshimichi Sato and Hiroki Takikawa
Chapter 1
Introduction
Y. Sato (✉)
Faculty of Humanities, Kyoto University of Advanced Science, Kyoto, Japan
e-mail: [email protected]
H. Takikawa
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo,
Japan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social
Science, Translational Systems Sciences 40,
https://doi.org/10.1007/978-981-99-9432-8_1
Macy and Sato (2002) assumed that the level of mobility of agents has an inverted U-shaped effect on the emergence of trust, cooperation, and the
global market. To check the validity of their assumption, they set up a model in
which some of the agents are assumed to move among local societies, become
newcomers, and interact with local people. Then, local people learn to trust and
cooperate with the newcomers and decide to leave their local societies and enter the global market. If the level of mobility is very low, local people do not have an
opportunity to interact with newcomers and, therefore, cannot learn to trust and
cooperate with them. If the level of mobility is very high, local societies become
unstable because most of the agents leave their local societies and become newcomers; local people then lose the stable local societies in which they learn to trust and cooperate with newcomers. Only if the level of mobility is moderate do local people learn to trust and cooperate with newcomers and enter the global market, which leads
to the propagation of trust and cooperation in society and the emergence of the global
market.
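To make this macro-micro-macro loop concrete, the following is a minimal Python sketch of a mobility-and-trust dynamic. It is inspired by, not a reproduction of, Macy and Sato's (2002) model: the learning rule, the fixed cooperation probability of newcomers, the stability cutoff, and the market-entry threshold are toy assumptions introduced only for illustration.

```python
import random

# A minimal sketch of the mobility-and-trust dynamic described above.
# NOT Macy and Sato's (2002) actual model: the learning rule, the
# cooperation probability of newcomers, and the thresholds are toy choices.

N, STEPS = 200, 300

def share_in_global_market(mobility, seed=0):
    rng = random.Random(seed)
    trust = [0.5] * N                  # micro-level: propensity to trust strangers
    for _ in range(STEPS):
        movers = {i for i in range(N) if rng.random() < mobility}
        stable = len(movers) < N // 2  # too much mobility destabilizes local societies
        for i in range(N):
            j = rng.randrange(N)
            if j in movers and stable:
                # Interacting with a newcomer in a stable local society lets
                # agent i learn to trust (or distrust) strangers.
                cooperated = rng.random() < 0.6  # newcomers assumed mostly cooperative
                delta = 0.05 if cooperated else -0.05
                trust[i] = min(1.0, max(0.0, trust[i] + delta))
    # Macro-level outcome: agents who trust strangers enter the global market.
    return sum(t > 0.7 for t in trust) / N

for m in (0.02, 0.2, 0.8):
    print(f"mobility={m:.2f} -> share in global market={share_in_global_market(m):.2f}")
```

Under these arbitrary parameters, very low mobility yields too few encounters with newcomers for learning, and very high mobility disables learning by destabilizing local societies, echoing the inverted U-shaped pattern described in the text.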
In their model, the external environment is the level of mobility of agents, and the
social phenomena to be explained are the propagation of trust and cooperation and
the emergence of the global market at the societal level. Agents decide whether to
trust and cooperate with other agents and to enter the global market at the micro-
level. Then, their trusting behavior, cooperating behavior, and entering the global
market accumulate to the macro-level. As a result of the accumulation, trust and
cooperation propagate at the macro-level, and the global market emerges. In this
sense, Macy and Sato’s (2002) study is a textbook example of the micro–macro
linkage.
Big data (or digital data) analysis is the other main pillar of computational social
science.1 This technique has radically changed the way social scientific studies are
conducted. In conventional social scientific studies using empirical data, researchers
conduct social surveys to collect data they need for their studies. Social surveys have
various types: interview surveys, mail surveys, and web surveys. What is common to any
type is that researchers design questionnaires to collect necessary information on
randomly selected respondents. For example, if a researcher wants to know the
relationship between social class and life satisfaction, he or she includes questions on respondents' occupation, education, income, and life satisfaction in the questionnaire and checks whether a positive relationship exists between class variables (occupation, education, and income) and life satisfaction using correlation analysis and cross-tabulation analysis. In a sense, social scientists actively collect necessary
information on respondents in social surveys.
In contrast, social scientists using big data collect information with limited
availability in most cases.2 For example, if we use mobility data of smartphone
users before and after a lockdown caused by the pandemic of COVID-19, we know
how the lockdown changed mobility patterns of the users. However, because we
1 Big data and digital trace data are used interchangeably in this chapter.
2 Online experiments can collect necessary information on participants. Thus, the discussion here is not applicable to online experiments.
cannot collect the users’ opinions about the lockdown from the mobility data, we do
not know whether their attitudes toward the lockdown affect their mobility patterns.
Furthermore, because we cannot collect information on social classes of the users
from the mobility data, we do not know differential effects of the lockdown on the
users by social class.3
This incompleteness of available data (Salganik, 2018: Chapter 2) is one of the
serious problems of big data compared with social survey data. However, big data
has advantages compensating for this disadvantage. As Salganik (2018: Chapter 2)
points out, big data is literally big, always-on, and nonreactive. The bigness of big
data has several advantages. Compared with social survey data, the most important
advantage is that results of analysis are robust and stable even if the data is divided
into many categories because each category has enough samples for analysis. For
example, when we create multiway tables using a social survey dataset, it often
happens that many cells do not have enough samples for robust analysis. Big data
analysis is exempt from this problem because we can collect as many cases as we need. Theoretically, of course, there is an upper limit to the size of big data, but
our usual use of big data does not face this problem.
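As a toy illustration of this point, the following snippet cross-classifies simulated respondents by four hypothetical variables (the variables, category counts, and sample sizes are invented, not real data) and counts how many cells of the resulting multiway table fall below a common rule-of-thumb minimum of 30 cases.

```python
import numpy as np
import pandas as pd

# Toy illustration with invented variables: survey-sized versus
# big-data-sized samples in a four-way cross-classification.
rng = np.random.default_rng(0)

def count_sparse_cells(n):
    df = pd.DataFrame({
        "occupation": rng.integers(0, 8, n),   # 8 hypothetical occupational classes
        "education": rng.integers(0, 5, n),    # 5 education levels
        "age_group": rng.integers(0, 6, n),    # 6 age bands
        "region": rng.integers(0, 10, n),      # 10 regions
    })
    counts = df.value_counts()                 # cases per occupied cell
    n_cells = 8 * 5 * 6 * 10                   # 2,400 possible cells
    sparse = n_cells - int((counts >= 30).sum())
    print(f"n={n:>9,}: {sparse} of {n_cells} cells have fewer than 30 cases")

count_sparse_cells(2_000)      # a typical social survey
count_sparse_cells(1_000_000)  # big (digital trace) data
```

With a survey-sized sample, nearly every cell is too sparse for robust analysis; with the larger sample, essentially none is.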
Always-on is also a strong advantage of big data. When social scientists want to
collect information on temporal change of characteristics of respondents such as
income, marital status, and life satisfaction using social surveys, they conduct
longitudinal data analysis. In the analysis they usually collect data on a regular basis, for example, every month or every year. Thus, in a sense, social survey data is sporadically on, so we do not know the temporal change of characteristics of respondents between two survey waves. In contrast, big data such as mobile data and Twitter data stream without a break, so we can collect continuous time-series data on our target
population.
Nonreactivity of big data means that collecting big data does not affect behaviors
of the target population. For example, even if people know that their mobile data is collected by their mobile phone company, they do not change their behaviors. They
commute to their firms, go to restaurants and bars, and exercise at fitness clubs as
usual. They tweet what they want to tweet. In contrast, respondents in social surveys
tend to give socially desirable answers to questions asked by interviewers in an
interview survey. For example, if a male respondent who supports sexual division of
labor is asked whether he supports it or not by a female interviewer, he probably
answers that he does not support it. This social desirability distorts distributions of
variables used in the social survey, but big data does not suffer from this problem.
So far, we have observed advantageous characteristics of computational social science, focusing on agent-based modeling and big data analysis. Because of these characteristics, computational social science has become influential and indispensable in
social science including sociology. However, it is another story whether it has
3 The study of the spread of the coronavirus by Chang et al. (2021) is innovative because they succeeded in combining mobility data and data on socioeconomic status and race to predict that infection rates are higher among disadvantaged racial and socioeconomic groups.
answered core research questions in sociology. We argue that it has not necessarily
answered them yet because most of the studies using it do not properly include
meaning and interpretation of actors in their analysis. Meaning and interpretation
have been seriously studied by major sociologists since the early days of sociology.
For example, Max Weber, a founding father of sociology, argued that social action is
different from behavior (Weber, 1921–22). According to his conceptualization, a
behavior becomes a social action if the former has a meaning. In other words, a
behavior has to be interpreted and given a subjective meaning by the actor and other
actors (and the observer, that is, a sociologist observing the behavior) in order to
become a social action.
Since Max Weber, many giants in sociology focused on meaning and interpreta-
tion as the main theme of their sociological studies. They are Alfred Schütz (1932),
Peter Berger and Thomas Luckmann (1966), Herbert Blumer (1969), Erving
Goffman (1959), and George Herbert Mead (1934), to name a few. Their sociological studies have attracted countless sociologists around the world. Thus, if it properly deals with meaning and interpretation, computational social science will enter the core of sociology and truly become indispensable to it. Furthermore, including meaning and interpretation in computational social science will enhance its analytical power. Herein lies the purpose of this book. That is, we try to explore
how computational social science and sociology should collaborate to advance
sociological studies as well as computational social science. Each chapter in this
book implicitly or explicitly shares this purpose from different perspectives.
Homophily is a critical factor for wide bridges to be built, but it may also create
polarization of beliefs, preferences, and opinions among people. People tend to be
clustered with those with similar beliefs, preferences, and opinions, which could lead
to polarization. To explore the mechanism creating polarization, Macy and his
collaborators conducted computer simulation using an agent-based model
(DellaPosta et al., 2015; Macy et al., 2021), laboratory experiments (Willer et al.,
2009), and online experiments (Macy et al., 2019) to study detailed mechanisms of
polarization.
Then, the second wave of computational social science came into the picture: Big
data analysis. Macy’s first study using big data was to examine the empirical validity
of Burt's (1992) theory of structural holes at the population level. Burt himself tested the
theory with data of entrepreneurs. In contrast, Macy and his collaborators used
telephone communication records in the UK to report that, as suggested by the
theory of structural holes, social network diversity calculated using the records is
strongly associated with socioeconomic advantages (Eagle et al., 2010). Another
study with big data analysis by Golder and Macy (2014) was to analyze tweets to
find what factors affect people's emotions. Their study showed that the level of happiness, measured by positive and negative words, is highest when people wake up, but the level declines as the day passes. In other words, the level is affected by sleep
cycles.
The most impressive aspect of Macy's work in computational social science is that he starts his research with sociological theories and uses techniques of computational
social science—agent-based modeling and big data analysis—to test them. That is,
theory comes first, and the techniques come second. Therefore, his research has
substantively advanced sociological theories.
Chapter 5 “Model of Meaning” by Hiroki Takikawa and Atsushi Ueshima
discusses the potential contribution of computational social science methods to
models of meaning and the elucidation of meaning-making mechanisms, which
are central issues in sociological theory. The authors point out that in conventional
sociology, issues of meaning have been examined almost exclusively through
qualitative approaches, and that the theoretical development of sociology as a
whole has been hindered by the lack of a quantitative model of meaning. In contrast,
the methods of computational social science, coupled with the availability of large-
scale textual data, have the potential to help build quantitative models of meaning.
With these prospects in mind, Takikawa and Ueshima first formulate meaning-making as a computational problem and characterize it with three key points: the predictive function of meaning, the relationality of meaning, and semantic learning.
With this formulation, they discuss the utility of two computational linguistic
models—the topic model and the word-embedding model—in terms of theories of
meaning. It is argued that the topic model and the word-embedding model have their
own advantages and disadvantages as models of meaning, and that integration of the
two is necessary. It is then pointed out that, in order to function as an effective model
of meaning for sociology, it is necessary to go beyond merely a computational
linguistics model for the representation of meaning and to clarify the mechanism that
links semantic representations to human action. They propose a model for this
sociological theory, derive hypotheses from it, and check their empirical validity
using techniques of computational social science. The difference between this way
and conventional sociological research is that, in the former, computational social
science techniques can use data that conventional sociological methods were unable
to access and, therefore, analyze. The second way is that a sociologist should find
new patterns with the help of computational social science, generalize them into hypotheses, and create a new theory to explain them. The key point of the second
way is that computational social science techniques find new patterns that could not
be found by conventional sociological methods, and such new patterns lead to new
theories. The chapter concludes that proper use of computational social science
opens a new door to upgrading sociology.
References
Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and
social media engagement. Social Science & Medicine, 165, 280–288.
Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the
sociology of knowledge. Doubleday.
Blumer, H. G. (1969). Symbolic interactionism: Perspective and method. University of California
Press.
Burt, R. (1992). Structural holes: The social structure of competition. Harvard University Press.
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113, 702–734.
Chang, S., et al. (2021). Mobility network models of COVID-19 explain inequities and inform
reopening. Nature, 589, 82–87.
Coleman, J. S. (1990). Foundations of social theory. Belknap Press of Harvard University Press.
DellaPosta, D., Shi, Y., & Macy, M. (2015). Why do liberals drink lattes? American Journal of
Sociology, 120, 1473–1511.
Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science,
328, 1029–1031.
Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Macy, M. W. (1989). Walking out of social traps: A stochastic learning model for the Prisoner’s
dilemma. Rationality and Society, 1, 197–219.
Macy, M. W., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan.
Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl. 3),
7214–7220.
Macy, M. W., & Skvoretz, J. (1998). The evolution of trust and cooperation between strangers: A
computational model. American Sociological Review, 63, 638–660.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-
based modeling. Annual Review of Sociology, 28, 143–166.
Macy, M. W., Deri, S., Ruch, A., & Tong, N. (2019). Opinion cascades and the unpredictability of
partisan polarization. Science Advances, 5, eaax0754. https://doi.org/10.1126/sciadv.aax0754
Macy, M. W., Ma, M., Tabin, D. R., Gao, J., & Szymanski, B. K. (2021). Polarization and tipping
points. Proceedings of the National Academy of Sciences, 118(50), e2102144118.
Mead, G. H. (1934). Mind, self, and society: From the standpoint of a social behaviorist. University
of Chicago Press.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.
Salganik, M. J. (2018). Bit by bit. Princeton University Press.
Schütz, A. (1932). Der sinnhafte Aufbau der sozialen Welt: Eine Einleitung in die Verstehende
Soziologie. Springer.
Snow, D. A., Rochford, E. B., Jr., Worden, S. K., & Benford, R. D. (1986). Frame alignment
processes, micromobilization, and movement participation. American Sociological Review,
51(4), 464–481.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393,
440–442.
Weber, M. (1921–22). Grundriß der Sozialökonomik, III. Abteilung, Wirtschaft und Gesellschaft,
Erster Teil, Kap. I. Verlag von J.C.B. Mohr.
Willer, R., Kuwabara, K., & Macy, M. W. (2009). The false enforcement of unpopular norms.
American Journal of Sociology, 115, 451–490.
Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic
and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling
study. Lancet, 395, 689–697.
Chapter 2
Sociological Foundations of Computational
Social Science
Yoshimichi Sato
Computational social science consisting of digital (big) data analysis and agent-
based modeling has become popular and influential in social science. Take Chang
et al. (2021), for example. They analyzed mobile phone data to simulate geograph-
ical mobility of 98 million people. One of their major findings is that social
inequality affects the infection rate. Their model predicts “higher infection rates
among disadvantaged racial and socioeconomic groups solely as the result of
differences in mobility” (Chang et al., 2021, p. 82). This finding is important and
meaningful to sociologists because one of the most important research topics in
sociology, social inequality, is studied from a new perspective with the help of
computational social science.
This chapter shows the gap between sociology and computational social science
and how to fill the gap. Chang et al. (2021) give us a good clue for it. I will get back
to this point later.
Y. Sato (✉)
Faculty of Humanities, Kyoto University of Advanced Science, Kyoto, Japan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social
Science, Translational Systems Sciences 40,
https://doi.org/10.1007/978-981-99-9432-8_2
learning. Then they manipulate the mobility rate to study its effect on the formation
of trust, cooperation, and the global market. The mobility rate at the macro-level
affects the decision-making and behavior of agents at the micro-level, and then their
behavior accumulates and determines the level of trust, cooperation, and the global
market at the macro-level.
So far, we have observed the strength of computational social science. It provides
powerful tools to social scientists, with which they can study new aspects of social
phenomena that cannot be analyzed by conventional methods. However, computa-
tional social science, I would argue, finds it difficult to properly study central
questions in sociology such as the emergence of social order and inequality because
it does not include meaning in its models (Goldberg & Stein, 2018).
Take the emergence of social order, for example. Social order has been a central
research topic in sociology [e.g., Parsons (1951) and Seiyama (1995)]. Scholars in
this research area would agree that coordinated behaviors among actors are a necessary condition for social order but not a sufficient one. Coordinated behaviors must be justified by actors' beliefs about them. Through their belief systems, actors observing the coordinated behavior of other actors interpret that the behavior is socially desired and that they should also behave in the same way. Here, as Weber (1921) points out, behavior becomes social action.
The abovementioned model of trust by Macy and Sato (2002) studies the
emergence of social order in a sense. The high level of trusting and cooperating
behavior and the emergence of the global market seem to be an example of the
emergence of social order. However, they are coordinated behavior of agents. For
the behavior to become social action and for social order to emerge, agents must
interpret the coordinated behavior as just, that is, legitimate. However, agents in the model are
not equipped with a belief system through which they interpret behavior of other
agents. There is a gap between social order and coordinated behavior, and the model
cannot fill the gap.
Social inequality is also a central research topic in sociology, and one of the major
research questions in the field is why social inequality exists and persists. The
abovementioned study by Salganik et al. (2006) clearly demonstrates how social
inequality among songs emerges. However, it does not explain why the inequality
persists. In general, social inequality cannot be sustained without people’s belief that
it is just. It is true that a powerful ruling class sustains inequality by suppressing people in other classes, but such an asymmetric, unjust system leads to social unrest, making society unstable and inefficient. Thus, for social inequality to persist, it is necessary
that people believe that it is just and accept it. Although it is an excellent study of
inequality, the study by Salganik et al. (2006) does not clearly show how and why
the inequality persists. Here again, we observe a gap between social inequality at the
behavioral level and people’s belief about it.
I do not mean to criticize Macy and Sato (2002) and Salganik et al. (2006) in
particular. I picked them as examples of studies in computational social science to show that most studies in the field focus on behaviors, not on beliefs.
This creates the abovementioned gap. There would be no problem if we were interested only in behavioral patterns and conducted studies within computational
social science. However, I argue that this strategy does not fully exploit the power
and potential of computational social science when it is applied in sociology. To exploit them fully, we need to fill the gap.
How can we fill the gap? As mentioned above, we need to assume that actors have a belief system through which they interpret reality and add meaning to it. This is not a new assumption, though. Rather, it stands on sociological tradition. Berger and Luckmann (1966), a classic book of social theory, argued that social “reality” does not exist by itself. Rather, it is socially constructed. My interpretation of their theory of the emergence of social reality is as follows. In the beginning, an actor performs a behavior. Then, another actor observes it. However, he/she does not observe it as it is: he/she interprets it, adds meaning to it, and decides whether to adopt it or not. If he/she adopts it, a third actor follows the same procedure. Then, if many actors follow the same procedure, the behavior becomes a social action and a social reality.
Goldberg and Stein (2018) aptly build an agent-based model based on
Berger and Luckmann’s (1966) theory of social construction of reality. Because
their model is an excellent example filling the gap, I will explain its details to show
how we can fill the gap.
Goldberg and Stein (2018) call their theory a theory of associative diffusion and
assume a two-stage transmission, which is different from simple contagion. In the
beginning of the two-stage transmission process, actor B observes that actor A
smokes. Then, actor B interprets the meaning of smoking and evaluates it in his/her
cognition system. Finally, actor B decides whether to smoke or not. If he/she decides
to smoke, actor A’s behavior (smoking) is transmitted to actor B.
Based on their theory, Goldberg and Stein (2018, pp. 907–908) propose the
following proposition:
Associative diffusion leads to the emergence of cultural differentiation even when agents
have unobstructed opportunity to observe one another. Social contagion does not lead to
cultural differentiation unless agents are structurally segregated.
Then, to check the validity of the proposition, they built an agent-based model
and conducted simulation to show that the proposition is valid. What makes their
model different from other models of contagion is that an agent in their model is
equipped with an associative matrix. The associative matrix shows the strength of
association between practices, which are exhibited by agents. In the model two
agents—agents A and B—are randomly selected. Agent B observes agent A
exhibiting practices i and j at certain probabilities that are proportional to agent
A’s preferences for them. Then, B updates his/her associative matrix. In the update
process, the association between practices i and j becomes stronger. This is because
he/she observed that agent A exhibited practices i and j simultaneously. Then, agent
B updates his/her preference over practices. Note that the update does not automat-
ically occur. Agent B calculates constraint satisfaction using the updated preference
and the associative matrix. If it is larger than constraint satisfaction using the old
preference and the associative matrix, he/she keeps the updated preference. Other-
wise, he/she keeps the old preference. Constraint satisfaction is a concept developed
in cognitive science and shows that an actor (agent) is satisfied if he/she resolves
cognitive dissonance.
Conducting simulation with this agent-based model, Goldberg and Stein (2018)
show that cultural differentiation emerges even if agents are not clustered in different
networks.
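The two-stage update can be sketched in code. The following is a schematic rendering of the logic just described, not Goldberg and Stein's (2018) actual implementation: their exact update equations and constraint-satisfaction function differ from the simplified stand-ins used here.

```python
import numpy as np

# Schematic sketch of associative diffusion. The update sizes and the
# constraint-satisfaction function are simplified stand-ins, NOT the
# equations in Goldberg and Stein (2018).
rng = np.random.default_rng(1)
N_AGENTS, K = 50, 6                        # agents and cultural practices

pref = rng.random((N_AGENTS, K))            # preferences over practices
assoc = np.full((N_AGENTS, K, K), 0.1)      # each agent's associative matrix

def constraint_satisfaction(p, R):
    # High when strongly associated practices are either both liked or
    # both disliked (a cognitive-consistency proxy).
    z = p - p.mean()
    return float(z @ R @ z)

for _ in range(10_000):
    a, b = rng.choice(N_AGENTS, size=2, replace=False)
    # Agent a exhibits two practices with probability proportional to preference.
    probs = pref[a] / pref[a].sum()
    i, j = rng.choice(K, size=2, replace=False, p=probs)
    # Observing i and j together strengthens their association for agent b...
    assoc[b, i, j] += 0.05
    assoc[b, j, i] += 0.05
    # ...and b provisionally nudges preferences toward the observed practices.
    new_pref = pref[b].copy()
    new_pref[[i, j]] = np.clip(new_pref[[i, j]] + 0.1, 0, 1)
    # The nudge is kept only if it raises constraint satisfaction.
    if constraint_satisfaction(new_pref, assoc[b]) >= \
       constraint_satisfaction(pref[b], assoc[b]):
        pref[b] = new_pref

# Cultural differentiation: correlation structure of preferences across agents.
print(np.corrcoef(pref.T).round(2))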
I rate their study highly because they started with important sociological
concepts such as meaning and interpretation. Agent-based modeling did not come
first. Rather, they started with some anecdotes, argued that conventional contagion
models cannot explain them, showed the importance of meaning and interpretation
to explain them, and built an agent-based model in which agents interpret other
agents’ behaviors and add meaning to them.
This is what I want to emphasize in this chapter. Many sociological studies using agent-based models, to my knowledge, apply the models to social phenomena while focusing on the behavioral aspects of agents, not on their interpretive aspects.
Take Schelling's (1971) model of residential segregation, for example. As mentioned above, his model demonstrates how agents' decision-making at the micro-level generates residential segregation at the macro-level. This is one of the strengths of agent-based modeling. However, an agent in the model does not interpret the behaviors of his/her neighbors. He/she reacts to the proportion of neighbors of the same type as his/her own. He/she does not ask himself/herself why a neighbor
moves or stays; he/she does not add meanings to the behavior of the neighbor. In
other words, agents in the model are a kind of automated machine just reacting to the
composition of neighbors.
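For concreteness, here is a bare-bones one-dimensional variant of such a model: agents on a ring check only the share of like-type neighbors and move to a random vacancy when unhappy. This is a simplified sketch, not Schelling's original two-dimensional checkerboard.

```python
import random

# A bare-bones Schelling-style dynamic on a ring. The agent below consults
# only the composition of its neighborhood, never the meaning of a move.
random.seed(0)
SIZE, THRESHOLD = 100, 0.5                            # ring size; tolerance
cells = [random.choice("AB.") for _ in range(SIZE)]   # '.' marks a vacancy

def unhappy(idx):
    me = cells[idx]
    neighbors = [cells[(idx + d) % SIZE] for d in (-2, -1, 1, 2)]
    same = sum(n == me for n in neighbors)
    occupied = sum(n != "." for n in neighbors)
    return occupied > 0 and same / occupied < THRESHOLD

for _ in range(10_000):
    i = random.randrange(SIZE)
    if cells[i] != "." and unhappy(i):
        vacancies = [j for j, c in enumerate(cells) if c == "."]
        j = random.choice(vacancies)                  # move to a random vacancy
        cells[i], cells[j] = ".", cells[i]

print("".join(cells))   # clusters of A's and B's emerge from purely local rules
```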
Even such a simple model explains the emergence of residential segregation.
However, an agent's move can be interpreted in several ways [see also Gilbert (2005)]. If one of his/her neighbors interprets the move as a kind of "white flight," that neighbor might follow suit. In contrast, if the neighbor interprets the move differently, he/she might stay, which would not lead to residential segregation. Thus, different interpretations of agents' behavior result in different outcomes at the macro-level.
Take power for another example. Suppose that agent A does something, say
smoking, in front of agent B. If agent B thinks that he/she is implicitly forced to
smoke, too, he/she interprets that agent A exercises power over him/her. However, if
agent B interprets that agent A smokes just for his/her fun, he/she does not feel
forced to smoke. Thus, it depends on agent B’s interpretation of agent A’s behavior
whether power relation between agents A and B is realized or not (Lukes, 1974,
2005).
These examples clearly show that agents in agent-based modeling should be
equipped with an interpretation system. As pointed out, Goldberg and Stein (2018)
propose an agent-based model with such a system and report interesting findings based on the simulation of the model.
Fig. 2.1 Relationship between following convention, choosing a new action, reflecting on the goal, and discovering a new goal. (Source: Sato, 2017, p. 42, Fig. 3.2)
Squazzoni (2012, pp. 109–122) also argues for the importance of social reflexivity, which is closely related to the concepts of interpretation and meaning, and cites two studies done by him and his colleagues (Boero et al., 2004a, b, 2008). In the first study (Boero et al., 2004a, b), firms in an industrial district are assumed to have one of four behavioral “attitudes” and to replace one with another under certain conditions. In conventional agent-based models, agents are assumed to perform a behavior. In a model in which agents play a prisoner’s dilemma game, for example, they cooperate or defect. In contrast, an “attitude” is a bundle of strategies or behaviors. An agent reflects on his/her attitude and replaces it with another attitude if necessary. In the second study (Boero et al., 2008), agents are placed in a toroidal 80 × 80 cell space and are assumed to stay or move based on their happiness. One of the scenarios in their model is that agents have one of four “heuristics” and randomly change them under a certain condition. A “heuristic” is a behavioral rule, like an attitude in their first study. Here again, agents reflect on their heuristics and change them if necessary.
An important characteristic of these models is that agents choose not a behavior but a rule of behavior, which is called an attitude or a heuristic. An agent reflects on the rule it chose under certain conditions and replaces it with another rule if necessary. There is no interpretation process in the models. However, if such a process were added to the models, agents could be assumed to interpret the conditions in which they are placed, to reflect on the current rule they chose, and to replace it with another rule if necessary.
Sato (2017) also points to the importance of meaning and reflexivity to fill the gap
between agent-based modeling and social theory. This is because social theories
often point to their importance in the study of modern and postmodern society
(Giddens, 1984; Imada, 1986; Luhmann, 1984). Thus, for agent-based modeling to
contribute to the advance of social theory, it needs to incorporate meaning and reflex-
ivity in models. Sato (2017) revisits Imada’s (1986) triangular framework of action
(Fig. 2.1) and proposes a framework for agent-based modeling to incorporate
meaning and reflection. Imada’s framework consists of three types of action: Fol-
lowing convention, choosing a new action, and reflecting on the goal and
discovering a new goal. Actors follow a convention as long as it does not cause a
problem. Borrowing the terminology of rational choice theory, I argue that
backward-looking rationality dominates in society to lighten the cognitive burden
of actors. However, if an external change occurs and the convention cannot solve
problems caused by the change, actors apply forward-looking rationality to the
change and find a new action which they believe will solve the problems. If the
new action actually solves the problem, it becomes a convention. In contrast, if it
cannot, actors reflect on the current goal and try to discover a new goal that they
believe would result in a better outcome. Then, if actors can find such a new goal and
an action that achieves it, the action becomes a new convention.
The key point of Imada's framework, when we incorporate it in agent-based modeling, is that actors discover a goal that is new to them. How can this process be modeled? Logically, it is impossible, for the following reason. In Imada's
framework actors find a goal that has not been found. Take the concept of “sustain-
able development,” for example. When advanced countries enjoyed economic
growth, their goal was only development without considering sustainability. As
people, governments, and other organizations realized the social problems caused by development, they invented the concept of “sustainable development,” focusing both
on economic development and on sustainability. This example shows that the set of
goals is infinite. A goal is new because it has not been invented. Then, theoretically,
the set of goals must be infinite. However, creating an infinite set in agent-based
modeling is logically impossible. Then, how can we incorporate reflexivity in agent-
based modeling?
Sato (2017) proposes an assumption that agents have limited cognitive capacity;
they consider only a limited number of goals, which they have known (Set A in
Fig. 2.2). Then, set A is assumed to be included in a larger set, set B in Fig. 2.2.
Agents do not know the goals in B-A. If a goal in B-A enters set A and a known goal leaves set A, the goal entering set A is new to the agents. If agents interpret the new goal as better than the goals in set A, they will choose the new goal. If set B is large enough for simulation, set A always has goals new to the agents. This could be a second-best solution to the abovementioned problem.
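The following sketch renders this second-best solution in Python. The numeric values attached to goals and the one-in-one-out swap rule are illustrative assumptions, not part of Sato's (2017) formulation.

```python
import random

# Sketch of the limited-cognitive-capacity assumption: set B is a large but
# finite pool of goals, and the agent attends only to a small known subset A.
# Goal "values" and the swap rule are illustrative assumptions, not Sato's.
random.seed(0)
B = {f"goal_{k}": random.random() for k in range(1_000)}  # large goal pool
A = set(random.sample(sorted(B), 5))                      # goals known so far
chosen = max(A, key=B.get)                                # current best-known goal

for _ in range(100):
    entering = random.choice(sorted(set(B) - A))   # a goal drifts into awareness...
    leaving = random.choice(sorted(A - {chosen}))  # ...and a known goal drops out
    A = (A - {leaving}) | {entering}
    if B[entering] > B[chosen]:                    # the new goal is interpreted as better
        chosen = entering

print("goal chosen after reflection:", chosen, round(B[chosen], 3))
```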
In this subsection, I referred to studies by Goldberg and Stein (2018), Boero et al.
(2004a, b, 2008), and Sato (2017). Although their approaches are different from each
other, all of them try to incorporate meaning, interpretation, and reflexivity in agent-
based modeling. This line of research contributes to filling the gap between compu-
tational social science and sociology and advancing sociological studies with the
help of computational social science, a powerful tool in social science.
Digital data analysis is a rapidly growing body of research in sociology. For example, Chang
et al. (2021), which was mentioned in the beginning of this chapter, is an excellent
example of how digital data analysis advances sociological study of inequality. If the
sociological aspect of inequality among social groups had not been included in their
study, the study would not have been attractive to sociologists conducting conven-
tional research on social inequality. My point on their paper is that the sociological
research question on inequality among social groups led to digital data analysis
suitable for answering the question.
Another intriguing example of digital data analysis in sociology is the study on
newspaper coverage of U.S. government arts funding done by DiMaggio et al.
(2013). According to the authors, after two decades of a good relationship between the National Endowment for the Arts (NEA) and artists, government support for artists became contentious between the mid-1980s and mid-1990s. A good
indicator of the contention is a decline in NEA appropriations from 1986 through
1997. To explain the decline, the authors investigate press coverage of government
arts support. Their research question is “how did the press respond to, participate in
or contribute to the NEA’s political woes?” (DiMaggio et al., 2013, p. 573).
To answer the question, they applied Latent Dirichlet Allocation, a topic model of
text analysis, to newspaper articles in the Houston Chronicle, the New York Times, the
Seattle Times, the Wall Street Journal, and the Washington Post published in the
abovementioned period. The topic model extracted 12 topics, and the authors
grouped them into three categories: (1) social or political conflict, (2) local projects
and revenues, and (3) specific arts genres, types of grant, or event information. After
detailed analysis of the topics and newspaper articles, the authors obtained the following findings (DiMaggio et al., 2013, p. 602). (1) Press coverage of arts funding suddenly changed from celebratory to contentious in 1989, and the contention continued into the 1990s.
(2) Negative coverage of the NEA emerged when George H.W. Bush was elected
president. (3) Press coverage reflected three frames for the controversy. (4) Press
coverage of government arts patronage differed from newspaper to newspaper.
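For readers who want to try this kind of analysis, the snippet below shows the general shape of an LDA topic model in Python with scikit-learn. It is only schematic: DiMaggio et al. (2013) used their own corpus and implementation, and the three "articles" here are invented stand-ins.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented stand-ins for newspaper articles; NOT DiMaggio et al.'s corpus.
articles = [
    "congress cuts nea appropriations amid controversy over obscene art",
    "local museum opens new exhibit funded by a federal arts grant",
    "senators debate government funding for controversial performances",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(articles)               # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Print the top words of each extracted topic.
vocab = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}:", ", ".join(top_words))
```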
As the authors emphasize, topic modeling is suitable for the study of culture
because it can clearly capture the relationality of meaning. Moreover, one of the
strong points of the paper is that it has a clear sociological research question with
which they conducted topic modeling analysis. This became possible perhaps because Paul DiMaggio, one of the authors, has deep expertise in the sociology of
culture.
These two exemplars of digital data analysis suggest that excellent sociological
study using digital data analysis should start with good research questions, not with
available data. Digital data analysis is a social telescope (Golder & Macy, 2014) with
much higher resolution than conventional social surveys and experiments. However,
it is sociological expertise, not data itself, that finds order and pattern in data obtained
through the social telescope. Without such expertise and research questions based on
it, digital data analysis would not contribute to the advance of sociological inquiries.
In addition to sociological expertise, including meaning in digital data analysis
would make the analysis substantively contribute to sociological studies. As pointed
out in the previous subsection, meaning and interpretation are important concepts
when we conduct sociological inquiries. The abovementioned study of newspaper
articles by DiMaggio et al. (2013) is an excellent example of this approach. Take
Twitter data, for example. Suppose that actor A tweets a message supporting
politician X. It is not always the case that actor A expresses his/her true opinion in
the message. He/she may do so, but he/she may not because he/she hides his/her true
opinion expecting negative reactions of his/her Twitter followers. Then, actor B, a
follower of actor A, does not receive actor A’s message as it is. He/she interprets it
and tries to understand its meaning. Then, he/she tweets his/her message about actor
A’s original message based on his/her interpretation of it. He/she also may express
his/her opinion or may not. Other actors including actor A, in turn, read the message,
interpret it, and tweet a message, and so on.
To the best of my knowledge, most of the digital data analysis of Twitter data or
other text data lacks this interpretation/expression process. However, Twitter data
analysis with this process could unveil the relationship between actions (expressing
messages) and the inner world of actors, which would attract sociologists studying in
main domains in sociology. This is because interpretation and meaning have been
central concepts in sociology. Analysis of only behavioral digital data would not enter the central domain of sociology. In contrast, analysis with the interpretation/
expression process would promote collaboration between sociologists and analysts
of digital data and contribute to the advance of sociology.
Radford and Joseph (2020) emphasize critical roles of social theory when we use
machine learning models to analyze digital data. Their coverage of topics is wide,
but, to summarize their argument, machine learning models are useless unless researchers start with social theory. A study using machine learning models without social theory is like a voyage without
a chart. Such a study could not create hypotheses important to social science, and we would not know whether its findings are new. The crux of Radford and Joseph's (2020) argument is that social theory, not the data available for machine learning models, comes first.
The main message of this chapter is in line with theirs. For studies using
computational social science methods such as agent-based modeling and digital
data analysis to be fruitful and contribute to the advance of sociology, we should
start our research with sociological theories and concepts and create hypotheses
based on them. Then, computational social science methods help us rigorously test
their validity. And, most importantly, the methods may lead to new findings that
could not have been found using conventional sociological methodology. This is
highly plausible because the methods are new social telescopes with much higher
resolution than that of conventional methodology. Then, we must bring the findings
back to the original sociological theories and concepts, find their problems, and
invent new theories and concepts by fixing them. This is a way to advance sociology
with the help of computational social science and to have computational social
science substantively contribute to the advance of sociology. Furthermore, in this
way, computational social scientists could improve their methods so that improved
methods could be more suitable for sociological analysis. This means that sociology
contributes to the advance of computational social science. This collaboration would
advance both sociology and computational social science and open a door to exciting new interdisciplinary fields.
References
Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology
of knowledge. Doubleday.
Boero, R., Castellani, M., & Squazzoni, F. (2004a). Cognitive identity and social reflexivity of the
industrial district firms: Going beyond the ‘complexity effect’ with agent-based simulations. In
G. Lindemann, D. Moldt, & M. Paolucci (Eds.), Regulated agent-based social systems
(pp. 48–69). Springer.
Boero, R., Castellani, M., & Squazzoni, F. (2004b). Micro behavioural attitudes and macro
technological adaptation in industrial districts: An agent-based prototype. Journal of Artificial
Societies and Social Simulation, 7(2), 1.
Boero, R., Castellani, M., & Squazzoni, F. (2008). Individual behavior and macro social properties:
An agent-based model. Computational and Mathematical Organization Theory, 14, 156–174.
Cederman, L.-E. (2005). Computational models of social forms: Advancing generative process
theory. American Journal of Sociology, 110(4), 864–893.
Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021).
Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589,
82–87.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of U.S. government arts
funding. Poetics, 41, 570–606.
Giddens, A. (1984). The constitution of society: Outline of the theory of structuration. Polity Press.
Gilbert, N. (2005). When does social simulation need cognitive models? In R. Sun (Ed.), Cognition
and multi-agent interaction: From cognitive modeling to social simulation (pp. 428–432).
Cambridge University Press.
Gilbert, N. (2019). Agent-based models (2nd ed.). Sage Publications.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Imada, T. (1986). Self-organization: Revival of social theory. Keiso Shobo. (In Japanese).
Luhmann, N. (1984). Soziale Systeme: Grundriß einer allgemeinen Theorie. Suhrkamp.
Lukes, S. (1974). Power: A radical view. Macmillan Education.
Lukes, S. (2005). Power: A radical view (2nd ed.). Palgrave Macmillan.
Macy, M. W., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan.
Proceedings of the National Academy of Sciences, 99(Suppl. 3), 7214–7220.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-
based modeling. Annual Review of Sociology, 28, 143–166.
Parsons, T. (1951). The social system. Free Press.
Radford, J., & Joseph, K. (2020). Theory in, theory out: The uses of social theory in machine
learning for social science. Frontiers in Big Data, 3, 18. https://doi.org/10.3389/fdata.2020.
00018
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and
unpredictability in an artificial cultural market. Science, 311, 854–856.
Sato, Y. (2017). Does agent-based modeling flourish in sociology? Mind the gap between social
theory and agent-based models. In K. Endo, S. Kurihara, T. Kamihigashi, & F. Toriumi (Eds.),
Reconstruction of the public sphere in the socially mediated age (pp. 37–46). Springer Nature
Singapore Pte.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1,
143–186.
Seiyama, K. (1995). Perspectives of theory of institution. Sobunsha. (In Japanese).
Squazzoni, F. (2012). Agent-based computational sociology. Wiley.
Weber, M. (1921). Soziologische Grundbegriffe. In Grundriß der Sozialökonomik, III. Abteilung,
Wirtschaft und Gesellschaft. J.C.B. Mohr.
Chapter 3
Methodological Contributions of Computational Social Science to Sociology
3.1 Introduction
H. Takikawa (✉)
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo,
Japan
e-mail: [email protected]
S. Fujihara
Institute of Social Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
Y. Sato, H. Takikawa (eds.), Sociological Foundations of Computational Social
Science, Translational Systems Sciences 40,
https://doi.org/10.1007/978-981-99-9432-8_3
There are several possibilities for processing and analyzing such new data, but at
the center of these methods are machine learning methods (sometimes more broadly
referred to as “data science”) (Hastie et al., 2009; James et al., 2013; Jordan &
Mitchell, 2015). Machine learning differs substantially from traditional quantitative
methods used in the era of scarce data in its assumptions and “culture” (Grimmer
et al., 2021). Sometimes it is argued that machine learning techniques are not useful
for theory-oriented social sciences because they are not explanation-oriented and
often produce uninterpretable black boxes. One pundit even argued that theory was
unnecessary in the era of big data (cf. Anderson, 2008). In contrast, this chapter
argues that these techniques are potentially very useful, not only for conventional
sociological quantitative analysis but also for sociological theory building. To date,
various machine learning approaches have been used in computational social sci-
ence, but they have not necessarily been systematically utilized to contribute to
sociological (social science) theory. Conversely, many sociologists underestimate
the potential contribution of machine learning and computational social science to
sociological theory. To change this situation, we aim to identify how machine
learning methods can be used to contribute to theory building in sociology.
In the next section, after noting the differences between the culture of machine
learning and that of traditional statistics, we define the concept of machine learning
as used in this chapter. We then explain the basic ideas and logic of machine
learning, such as sample splitting and regularization. In Sect. 3.3, we summarize
three potential applications of machine learning in sociology: breaking away from
deductive models, establishing a normative cognitive paradigm, and using machine
learning as a model of human decision making and cognition. Section 3.4 summa-
rizes the statistical methods conventionally used in sociology, including description,
prediction, and causal inference, and describes their challenges, focusing on the use
of regression analysis. Sections 3.5 and 3.6 introduce current applications of
machine learning in sociology and discuss future directions.
Machine learning differs from the methodologies traditionally used in the social
sciences in terms of its assumptions, its conception of the desirability and value of
research objectives and methods, and, more broadly, its cognitive culture (Grimmer
et al., 2021), but that does not mean that its similarities with traditional statistics
should be underestimated (cf. Lundberg et al., 2021). Breiman (2001) described the
culture of traditional statistics as a data modeling culture and that of machine learning as an algorithmic modeling culture. The distinction between the two can be explained
as follows (Breiman, 2001). Consider that data are produced as a response y given an
input x. How x is transformed into y is initially a black box. In the data modeling
culture, we consider probabilistic models that model the contents of the black box,
Machine learning is an extremely broad term that covers a wide range of ideas and is
difficult to define because it is used with slightly different meanings in
different fields. Molina and Garip (2019, p. 28) describe machine learning succinctly
as “machine learning (ML) seeks to automate discovery from data”; Grimmer et al.
(2021, p. 396) state that “machine learning is a class of flexible algorithmic and
statistical techniques for prediction and dimension reduction.” Athey (2018, p. 509)
states that machine learning is “a field that develops algorithms designed to be
applied to data sets, with the main areas of focus being prediction (regression),
classification, and clustering or grouping tasks.” Jordan and Mitchell (2015, p. 255)
describe machine learning as a field that addresses the following two questions.
1. “How can one construct computer systems that automatically improve through
experience?”
2. “What are the fundamental statistical-computational-information-theoretic laws
that govern all learning systems, including computers, humans, and
organizations?”
The latter theoretical question is not explicitly asked in the context of applications in
the social sciences, but it is sometimes necessary to ask such questions to think
deeply about why machine learning applications succeed or fail in any given
situation. Identifying the mechanisms by which humans and organizations learn is
also a challenge for sociological theory itself.
Setting aside the last theoretical question, these definitions can be summarized in the following points:
1. Learning from data and experience, i.e., data-driven,
2. Aiming to improve the performance of task resolution, such as prediction and
classification, and
3. Aiming to develop algorithms that automate task resolution.
Overall, machine learning can be defined as a procedure for learning from data and
creating the best algorithm for automating the solution of tasks such as prediction
and classification.
Machine learning methods are often classified as supervised or unsupervised
(Bishop & Nasrabadi, 2006). In this section, we first introduce supervised machine
learning and then address unsupervised machine learning.
How, then, can we discover models with good generalization performance? For this
purpose, a procedure called sample splitting, which is the most important procedure
in supervised machine learning, is used (Hastie et al., 2009). In sample splitting, data
are divided into training data and test data. The training data consist of pairs (x, y) on which the model is trained; because the correct output y for each input x is provided in advance, this is called supervised learning. The test data, in contrast, are used to evaluate the trained model.
Testing the predictive performance using test data is called out-of-sample testing. In
this way, models with good generalization performance can be selected. We consider
this process in more detail below.
In the training data, the goal is to minimize the loss defined by a loss function. For
continuous variables, the loss function can be, for example, the mean of the squared difference between the correct answer $y_i$ and the predicted value $\hat{y}_i$, i.e., the mean squared error (MSE). The formula for MSE is as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$
where n is the sample size in the data set. The smaller the loss is, the better the
model’s fit to the training data. The error defined on the training data is called the
training error. However, improving the fit to the training data is not in itself the goal
of machine learning models. The goal is to improve the prediction performance
given new data. For this purpose, test data other than the training data are prepared to
evaluate the prediction performance. The evaluation of test data is also based on the
MSE (for continuous variables), but it is called the test MSE to distinguish it from the training MSE. The error in the test data is called the test error or generalization error.
Thus, in supervised machine learning, the goal is to minimize the generalization
error, not the training error. Why not simply use all the data at hand as training data
and adopt the model with the best fit? A good fit to the training data does not
necessarily equate to good prediction performance for new, unseen data; a model that fits the training data well but predicts poorly for new data is said to overfit, i.e., to overreact to random fluctuations or noise in the training data set. When overfitting occurs, the variability of the model
increases due to differences in the training data, resulting in poor predictive perfor-
mance. Therefore, avoiding overfitting is of great practical importance in machine
learning (Domingos, 2012).
The expected test MSE, which is the quantity to be minimized in supervised machine learning, can be decomposed into the variance of the prediction $\mathrm{Var}(\hat{f}(x))$, the square of the bias of the prediction $[\mathrm{Bias}(\hat{f}(x))]^2$, and the variance of the error $\mathrm{Var}(\varepsilon)$, where $\hat{f}(x)$ represents the model prediction (James et al., 2013). The equation is as follows:

$$E\left[\left(y - \hat{f}(x)\right)^2\right] = \mathrm{Var}(\hat{f}(x)) + \left[\mathrm{Bias}(\hat{f}(x))\right]^2 + \mathrm{Var}(\varepsilon).$$
The variance of the error $\mathrm{Var}(\varepsilon)$ is a characteristic of the true “world” and cannot be reduced by any model. Therefore, the goal of optimization is to simultaneously reduce the variance $\mathrm{Var}(\hat{f}(x))$ and the bias $\mathrm{Bias}(\hat{f}(x))$ of the predictions.
Bias is the difference between the true model and the learned model. For example,
if a simple linear regression model is used to approximate a complex nonlinear
relationship, the bias will be large. In contrast, variance is the amount by which $\hat{f}(x)$ varies when trained on different data sets. A high variance would result in large
changes in the parameters of the model depending on the data set.
A tradeoff exists between bias and variance (Bishop & Nasrabadi, 2006; Yarkoni
& Westfall, 2017). In general, bias can be reduced by using complex models. For
example, suppose we use a polynomial regression model to improve the goodness of
fit for a curve. Compared to that of a first-order polynomial (linear regression) model, the goodness of fit to the training data should increase monotonically as the degree of the polynomial is increased to the second and third order. However, overly complex
models have large variances because they are overfitted to a particular data set. In
other words, a small change in the data set causes a large fluctuation in the estimates.
Conversely, a simple model is less affected by differences in data sets, but it cannot
adequately fit the data at hand, resulting in a large bias. Therefore, a model with a
moderate complexity is preferred to avoid overfitting and reduce the variance while
still providing a certain degree of goodness of fit and lowering the bias.
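To make the tradeoff concrete, the following minimal Python sketch fits polynomials of increasing degree to synthetic data and compares training and test MSE. The sine-shaped "true" curve, sample sizes, and degrees are purely illustrative assumptions, not an analysis from the literature discussed here.

```python
# Minimal sketch of the bias-variance tradeoff with polynomial regression.
# Synthetic data: a sine "true" curve plus Gaussian noise (illustrative only).
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)

# Sample splitting: 40 observations for training, 20 held out for testing.
x_tr, y_tr, x_te, y_te = x[:40], y[:40], x[40:], y[40:]

for degree in (1, 3, 9):
    coefs = P.polyfit(x_tr, y_tr, degree)                    # least-squares fit
    mse_tr = np.mean((y_tr - P.polyval(x_tr, coefs)) ** 2)   # training error
    mse_te = np.mean((y_te - P.polyval(x_te, coefs)) ** 2)   # generalization error
    print(f"degree {degree}: train MSE {mse_tr:.3f}, test MSE {mse_te:.3f}")
# Training MSE falls as the degree rises (lower bias), while test MSE
# eventually rises again as the model overfits (higher variance).
```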
Traditional social science has not been concerned with the predictive performance of
models and has preferred models with low bias [although multilevel analysis, which
is heavily used in sociology, can actually be interpreted as an attempt to improve
model performance by introducing bias (Gelman & Hill, 2006)] (Yarkoni &
Westfall, 2017). In traditional methods, procedures such as sample splitting are almost never used; in effect, only training data are used, aiming
solely to increase the goodness of fit of the model or to test the significance of a
particular regression coefficient. Improving only the fit to the training data in this
way may lower bias but cause overfitting, resulting in poor predictive performance
for unseen data (or the test data). In such procedures, there is no explicit step of
evaluating the performance of multiple models and making a choice. Some may note
that even in sociology, there are cases where performance is compared among
several different models and a selection is made. However, analytical methods
traditionally used for model selection, such as likelihood ratio tests, AIC, and BIC,
have various limitations and restrictions and may not sufficiently prevent overfitting.
For example, in regard to AIC, as the data size increases, the penalty is not sufficient,
and complex models are almost always preferred (Grimmer et al., 2021). In contrast,
the model selection procedures with sample splitting and
out-of-sample testing used in machine learning are empirical, i.e., data-based and
general-purpose selection procedures that can be applied to any model. This method
enables the evaluation of the predictive performance of any supervised machine
learning model.
Two points should be noted here. First, the training and testing phases must be
strictly separated. For example, if an out-of-sample test is conducted and the
prediction performance is low, the model is modified, and the out-of-sample test is
conducted again with the same test data, the model will be overfitted to the test data. Instead, a portion of the training data should be set aside as validation data to
verify the prediction performance. The model’s hyperparameters are adjusted using
these data; then, the model trained using the training and validation data is tested on
the test data. In this case, the data are divided into training, validation, and test data
(Hastie et al., 2009).
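As a minimal sketch of such a three-way split, assuming scikit-learn and synthetic data (the split proportions, the tree-based learner, and the depth grid are all illustrative choices, not prescriptions):

```python
# Minimal sketch of a training/validation/test split for hyperparameter tuning.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.5, size=500)

# Split off test data first; then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# The hyperparameter (here, tree depth) is tuned on the validation data only.
best_depth, best_mse = None, float("inf")
for depth in (2, 4, 8, 16):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_depth, best_mse = depth, mse

# The test data are touched exactly once, after all tuning is finished.
final = DecisionTreeRegressor(max_depth=best_depth, random_state=0)
final.fit(np.vstack([X_train, X_val]), np.concatenate([y_train, y_val]))
print("test MSE:", mean_squared_error(y_test, final.predict(X_test)))
```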
Second, while the division into training and test data is a means of avoiding
overfitting, it also carries the risk of underfitting since saving data for testing reduces
the size of the data for training. An alternative way to maximize the use of the data at
hand is k-fold cross-validation (Hastie et al., 2009; Yarkoni & Westfall, 2017). It is common to use k = 5 or k = 10, although this number can vary depending on the complexity of the model. In this method, the data are randomly divided into k subsets (folds); each fold is used once as test data while the model is trained on the remaining k − 1 folds. Thus, k out-of-sample tests are performed, and the predictive performance of the model is evaluated based on the average of the k tests. In this method, all data are eventually used for both training and testing, which is more efficient than a
one-time split. However, it also has the problem that the computation time is k times
longer.
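A minimal sketch of k-fold cross-validation, again assuming scikit-learn and synthetic data, with the common choice k = 5:

```python
# Minimal sketch of k-fold cross-validation (k = 5).
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(size=200)

# Each fold serves once as test data; the k scores are then averaged.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv,
                         scoring="neg_mean_squared_error")
print("mean cross-validated MSE:", -scores.mean())
```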
3.2.3.4 Regularization
While sample splitting is a means of detecting overfitting, it does not tell us what
kind of model to build to prevent overfitting itself. Regularization is a tool for
building models to avoid overfitting (Hastie et al., 2009; Yarkoni & Westfall, 2017).
Overfitting is more likely to occur when a model is complex. Regularization is
therefore a means to improve the predictive performance of a model by introducing
constraints on model complexity. The most commonly used approach is the idea of
penalizing for complexity. Specifically, a penalty term for model complexity is
introduced into the function that the model should optimize, in addition to loss
minimization. For example, in the case of the Lasso regression model (Tibshirani,
1996, 2011), the training is set up to minimize the squared error plus a penalty term
proportional to the sum of the absolute values of all regression coefficients. Gener-
ally, the more complex a model is, i.e., the more variables are used, the greater the
extent to which the squared error can be reduced; however, the more complex a
model is, the larger the sum of the absolute values of the regression coefficients
becomes. In other words, there is a tradeoff between minimizing the squared error
and minimizing the sum of the absolute values of the regression coefficients. The
objective of the Lasso regression model is to achieve an optimal balance between the
two. From the perspective of the bias–variance tradeoff, the model can be positioned
as an attempt to improve predictive performance by introducing a bias called a
penalty term. The sum of the absolute values of the regression coefficients used as the penalty in Lasso regression is called the L1 norm. In contrast, the ridge regression model penalizes the sum of the squares of the regression coefficients, i.e., the square of the Euclidean (L2) norm. In general, the Lasso regression model tends to set some regression coefficients exactly to zero and thus has a strong regularizing effect. Penalized regression is effective when the number of
predictors (explanatory variables) is large relative to the sample size. However, the
coefficient of the penalty term (hyperparameter), which determines how the penalty
works, must be adjusted appropriately, and for this purpose, the sample should be
divided into three parts.
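The contrast between the two penalties can be seen in a minimal scikit-learn sketch; the data are synthetic, with only three of fifty predictors truly relevant, and the penalty strengths are illustrative rather than tuned:

```python
# Minimal sketch contrasting Lasso (L1) and ridge (L2) regularization.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
n, p = 100, 50                      # many predictors relative to the sample size
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = (3.0, -2.0, 1.5)         # only the first three predictors matter
y = X @ beta + rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)  # alpha is the penalty hyperparameter
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso sets many coefficients exactly to zero; ridge only shrinks them.
print("nonzero Lasso coefficients:", int(np.sum(lasso.coef_ != 0)))
print("nonzero ridge coefficients:", int(np.sum(ridge.coef_ != 0)))
```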
The concept of regularization itself is more general and is not limited to the
addition of penalty terms in regression models. There are various other regulariza-
tion devices in supervised machine learning, such as early stopping, which halts the learning process early in artificial neural networks and deep learning, and dropout, which randomly removes certain nodes during the learning process (Goodfellow et al., 2016).
Unsupervised machine learning aims to discover the latent structure of data x without
the correct answer label y. In other words, it involves reducing high-dimensional
data x to low-dimensional, interpretable quantities or categories. Unsupervised
machine learning includes cluster analysis such as k-means and hierarchical cluster-
ing, principal component analysis, and latent class analysis. In this sense, it is a
technique that has been used relatively often in the traditional social sciences. If we
reposition it in the context of machine learning, the following two points are
important.
First, the importance of ensuring the validity of classifications and patterns
discovered by unsupervised machine learning is emphasized. Although it is difficult
to assess the validity of unsupervised machine learning models, various methods
have been proposed, including human verification (Grimmer et al., 2022). Second,
applications to extremely high-dimensional data, such as text data, have been
advanced. In the past, survey data were not high-dimensional, but with ultrahigh-
dimensional data, such as text data, which contain tens or hundreds of thousands of
variables (e.g., the vocabulary appearing in a corpus), the discovery of patterns by unsupervised machine learning has become indispensable.
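As a minimal illustration of unsupervised pattern discovery, the following sketch applies k-means to synthetic high-dimensional data containing three latent groups; the number of clusters is an assumption that, as noted above, the analyst must justify and validate:

```python
# Minimal sketch of unsupervised learning: k-means on high-dimensional data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Three latent groups in 100-dimensional data, unknown to the algorithm.
centers = rng.normal(scale=3.0, size=(3, 100))
X = np.vstack([c + rng.normal(size=(50, 100)) for c in centers])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_   # a low-dimensional, interpretable summary of X
print("recovered cluster sizes:", np.bincount(labels))
```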
Machine learning has great potential to transform the traditional way of conducting
social science research on various social science issues (Grimmer et al., 2021). Here,
we summarize the potential of machine learning in three key areas.
Traditional social science has taken the deductive model approach as its scientific
methodology (Grimmer et al., 2021; cf. McFarland et al., 2016), which, according to
Grimmer et al., is a suitable method for efficiently testing theories based on scarce
data. Deductive models require a carefully crafted theory prior to observing data.
Assuming the existence of such a theory, a hypothesis that is derived from the theory
and can be tested by the data is then established. Data are then obtained, and the
hypothesis is tested only once using the data. Deductive models are highly compat-
ible with data modeling because they assume a theoretical model of how the data
were generated in advance. However, such a deductive model has various
drawbacks.
First, deductive models often use data modeling approaches that are compatible
with hypothesis testing frameworks, especially linear regression models. This leads
to the exclusion of various other, more realistic, possible modeling possibilities. In
sociology, this inflexibility of linear regression models has traditionally been the
cause of the divide between sociological theory and empirical analysis
(Abbott, 1988).
Second, the deductive model assumes that the theory is available before the data
are observed and thus cannot address issues such as how to create and elaborate
concepts from data and how to discover new hypotheses from data (Grimmer et al.,
2021). Compared to qualitative research in sociology (Glaser & Strauss, 1967;
Tavory & Timmermans, 2014), traditional quantitative research does not explicitly
incorporate a theory building phase.
Third, although the deductive model also serves as a normative scientific meth-
odology, it is difficult to rigorously follow such procedures in actual social science
practice. As a result, social science hypotheses face major problems in terms of
replicability and generalizability. In practice, in the social sciences, it is inevitable to
reformulate theories and explore hypotheses by analyzing data. However, presenting hypotheses explored in this way as if they had been formulated in advance invites problems such as HARKing (Kerr, 1998) and inflated false positives (Simmons et al., 2011).
Watts (2014) suggests that machine learning should be the normative model for
social science methodology in place of the traditional hypothetico-deductive model.
Specifically, he argues that successful prediction should be the standard by which
social science is evaluated.
Many, if not most, sociologists agree that elucidating the causal mechanisms of
social phenomena is the ultimate goal of sociology. However, the normative episte-
mic model for elucidating causal mechanisms has traditionally been assumed to be
based on a hypothetico-deductive approach. In this model, the ideal is an experi-
mental method conducted on the basis of testable hypotheses about causation.
However, although the methods of computational social science have greatly
expanded the range of application of large-scale digital experiments, the types of
social phenomena that can be tested remain limited. Therefore, sociologists have largely had to rely on observational data rather than experiments to elucidate causal mechanisms.
The third point is related to a view different from the previous two. Machine learning
and artificial intelligence provide algorithms to discover and recognize patterns from
data. Therefore, in a sense, machine learning can be said to simulate human
judgment and cognition.
The most obvious application of this aspect is automatic coding (Nelson et al.,
2021). Sociology requires conceptualizing, classifying, and coding a variety of data
and underlying events. When the number of data and events is small, it is possible for
humans to classify and code them all manually. However, when the amount of data
is large, this is impossible. Therefore, it is very beneficial for machines to recognize,
classify, and code patterns in the data instead of humans.
However, the simulation of judgment and cognition by machines and the classi-
fication and prediction based on that judgment and cognition have implications for
sociology that go beyond mere practical purposes. Sociology is the study of human
action, and it is essential to model how people judge, perceive, and categorize
situations to understand the mechanisms of action (Schutz & Luckmann, 1973). In
contrast, traditional analytical tools do not directly model people’s judgments and
cognition but rather model the consequences of actions and the aggregate distribu-
tion of outcomes. Therefore, the connection to sociological theory is only indirect.
In contrast, machine learning is more than just an analytical tool; it can relate
directly to sociological theory by modeling people’s judgments and cognitions
(cf. Foster, 2018). For example, the models of natural language processing discussed
in detail in Chap. 5 can be used not merely for the practical purpose of automatically
summarizing and labeling textual content but also to formalize the way humans
process textual meaning. This will be discussed thematically in Chap. 5.
Furthermore, comparing machine learning-based classification and pattern recognition with human classification and pattern recognition may reveal features and
biases of human judgment (Kleinberg et al., 2015). This identification of features and
biases in comparison to machine judgments may lead to further theorizing of human
judgments, cognition, and decision making. Alternatively, machine learning can be
used to address the possibility that people are reading and acting on certain signals
from certain situations. For example, fictitious prediction methods based on machine learning
(Grimmer et al., 2021) are useful in considering the question of whether people can
read the social class or socioeconomic status of tweeters from their daily tweets
(Mizuno & Takikawa, 2022). That is, information related to the socioeconomic
status of tweeters is collected in advance through surveys and other means, and
the question of whether this information can be predicted from tweets alone is
examined. If machines can successfully predict socioeconomic attributes, it is highly
likely that people are reading such signals in their daily interactions, and if advanced
machine learning models cannot predict such signals at all, they may not exist for
humans either, or if they do, they may be very weak. Thus, machine learning models
can be used to examine how people understand and behave in terms of the actions
(tweets) of others. This is an analytical method that has intrinsic relevance to theory.
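The logic of such a fictitious prediction test can be sketched as follows. The toy corpus, labels, and model are entirely hypothetical and are not the data or method of Mizuno and Takikawa (2022); the point is only the workflow of linking survey-measured labels to text and evaluating out-of-sample predictability:

```python
# Hypothetical sketch of a fictitious prediction test: can survey-measured
# SES be predicted from users' texts alone? Corpus and labels are toy stand-ins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

docs = ["golf brunch stocks", "night shift overtime bus", "wine gallery opera",
        "rent overdue payday loan", "yacht sushi tennis", "lotto pawn shop",
        "ski resort latte", "food bank coupons"]   # one "user" per string
ses = [1, 0, 1, 0, 1, 0, 1, 0]                     # survey-measured SES labels

X_tr, X_te, y_tr, y_te = train_test_split(docs, ses, test_size=0.5,
                                          stratify=ses, random_state=0)
vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X_tr), y_tr)

# AUC near 0.5 suggests the signal is weak or absent; AUC well above 0.5
# suggests that SES signals are in principle readable from the texts.
auc = roc_auc_score(y_te, clf.predict_proba(vec.transform(X_te))[:, 1])
print("out-of-sample AUC:", round(auc, 2))
```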
Sociology has utilized various statistical analysis methods to test sociological theo-
ries and hypotheses and to discover patterns and regularity in social phenomena. In
this section, we introduce the conventional statistical analysis of sociological studies
and the related problems. It is, however, difficult to cover all statistical methods in
sociology. To contrast the conventional approach with the methods of machine
learning, especially supervised learning, we narrow our focus to statistical methods
that analyze data from social surveys using regression models. These methods are
commonly used in sociology and other social sciences, such as economics and
political science.
Quantitative sociology often uses a variety of research methods, such as small
local surveys, large national representative surveys, and panel surveys, to gather data
about individuals and groups in societies. Although some sociological studies use
experimental methods, the primary approach is to observe and gather data about
individuals, groups, and societies through social surveys. Using these data, sociol-
ogists can quantify social phenomena, compare groups, estimate the size of the
association between variables or the “effect” under a particular model, and make
causal inferences. Through this process, sociologists aim to understand and interpret
social phenomena and explain the underlying mechanisms that produce patterns and
regularities in society. Data analysis can be divided into three main tasks: descrip-
tion, prediction, and causal inference (or causal prediction) (Berk, 2004; Hernán
et al., 2019). To investigate patterns and regularities and test theories and hypothe-
ses, sociologists often use generalized linear models, particularly linear regression
models, to describe the association among variables, predict a variable of interest,
and estimate the causal effect of a treatment on an outcome (Berk, 2004).
3.4.2 Description
3.4.3 Prediction
The second type of sociological data analysis involves constructing a model for
purely predicting y from x. Although similar to the first method of description above,
the predictive task is more concerned with the dependent variable y or algorithms for
predictions than with the independent variables x (Salganik et al., 2020). The focus is
not on the estimation and interpretation of the coefficients of the independent
variables x but instead on the predictive accuracy of y under the model with these
independent variables (Molina & Garip, 2019). For example, a model can be
constructed to predict which individuals are more likely to drop out of high school
or experience poverty. Machine learning, particularly supervised learning, has
played a significant role in this type of prediction. However, the application of prediction to sociological studies has not been fully explored. This is one reason machine learning methods have been exploited less in sociological research than in other social sciences and why they have not yet made substantial contributions to sociology, especially in studies using data from social surveys.
However, predictive tasks are considered essential and have significant implica-
tions for sociological research. We introduce sociological studies using prediction
and discuss its implications in Sect. 3.5.
Regression models are also standard tools in causal inference. While randomized
experiments are the gold-standard method for determining causal effects, causal
inference in sociology usually has to rely on observational data, especially for ethical
reasons. In causal inference, we typically establish a treatment variable (A) and an
outcome variable of interest (Y ), use the potential outcome framework to derive
causal estimands, and use a directed acyclic graph (DAG) to explain the data
generation process and how confounding may occur (Morgan & Winship, 2014).
Then, we specify confounding variables (L ) sufficient to identify the causal effects,
use confounding variables as controls in the regression model, and estimate the
coefficient of the treatment variable. Of course, there are many other methods for
estimating causal effects in addition to conventional regression analysis (matching,
inverse probability of treatment weighting, g-estimation, g-formula, etc.). For the
selection of covariates to be used as control variables, it is recommended that
(1) variables that affect either or both of the treatment and the outcome are included,
(2) those that can be considered instrumental variables are excluded, and (3) those
that serve as proxies of unobserved variables that are common causes of the
treatment and the outcome are entered into the model as covariates (VanderWeele,
2019). To avoid discrepancy between the estimated regression coefficients and the
intended effect, we must consider what is a good control variable and what is a bad
control variable (Cinelli et al., 2021). The selection of variables does not aim to
increase the predictive or explanatory power of the model but to reduce or not
amplify bias. In this respect, regression analysis for causal inference differs signif-
icantly from that for description and prediction. The determination of which vari-
ables are sufficient to identify a causal effect must be made by a human based on
theory, prior research, domain knowledge, etc. Even if the data-generating process is
clarified and confounding variables to identify the causal effect are obtained, linear
regression analysis may not be appropriate for causal inference. This is because the linear functional form may be misspecified, and effect heterogeneity can bias the estimates obtained from main-effects-only regression models (Elwert & Winship, 2010).
There are still few applications of machine learning in sociology (see also Molina &
Garip, 2019 for applications in sociology). In this section, we review specific
applications of machine learning by introducing the application of the prediction
framework and automated coding in sociology. In the next section, we address a
more advanced topic: the deployment potential of machine learning in relation to the
elucidation of causal mechanisms.
An example is the Fragile Families Challenge, a scientific mass collaboration in which many teams attempted to predict children's life outcomes from rich survey data (Salganik et al., 2020). The overall tendency is that the predictions work reasonably well for many subjects, with a few cases that cannot be predicted using any model. This may
stimulate the development of sociological theory by asking questions such as why
some cases are not well predicted, what theory can explain the poor predictions, and
what information not included in the survey should be focused on if an explanation is
to be attempted (cf. Garip, 2020).
Additionally, although different from the present case, suppose that a complex
model involving a variety of higher-order interactions is somewhat successful in
making predictions. This model has good predictive performance but does not have
the easily interpretable structure of an uncomplicated regression model. Therefore,
the sociologist must reinterpret and understand the structure of the model to under-
stand why this complex model can predict social reality to some extent. This will
require the development of new sociological theories, which opens the possibility of
developing sociological theory in a different way through the task of prediction by
machine learning. In other words, sociological theory can be constructed by improving the interpretability of moderately complex models that capture the predictable aspects of the real world, rather than by modeling the overly complex, noise-filled real world itself. This is also an idea that leads to the scientific regret
minimization method, which will be introduced later.
For example, Blumenstock et al. (2015) surveyed a sample of cell phone users about their income and wealth. By doing so, they obtained data (X, y) where X is the cell phone
log data and y is the income or wealth level. This can be regarded as supervised data
with label y for X. By training the model with this labeled data (X, y) and applying it
to the remaining unlabeled cell phone log data, they assigned income and
wealth information to all data. Of course, the accuracy of the model depends on how
it is constructed since the information is extrapolated from the cell phone log data,
except for the information actually asked in the survey.
In addition, the problem of construct validity (Lazer, 2015) arises when linking the given found data to the construct y. For example, if one compares measuring the theoretical construct of intelligence with an established instrument such as the Raven Progressive Matrices test against measuring it by a proxy criterion such as writing long sentences on Twitter, the former could be considered much more valid (Salganik, 2018). While these issues always
arise with survey data, found data are not created for research, so how to conceptu-
alize the data becomes an even greater issue.
The problem of how to create sociologically meaningful concepts from found
data, which is not designed for research, is not unique to digital trace data. Tradi-
tional methods of content analysis using newspapers, books, and magazines address
the same problem (Krippendorff, 2004). Traditionally, in these fields, researchers
interpret texts (and sometimes photos and videos) and assign codes that represent
sociological concepts. Although the problem of construct validity remains with such
methods, it is highly likely that a certain degree of validity can be ensured by coding
based on flexible human interpretation. However, this method is extremely expen-
sive and has limited scalability. Therefore, the idea of replacing some or all manual
coding with automatic coding by machines has emerged based on today’s large-scale
digital trace data and found data. The question then arises as to how machine-
automated coding can satisfy construct validity.
One method of machine coding is partial automatic coding by supervised
machine learning. The main framework is the same as in amplified asking. We
have unlabeled found data X. This can be a newspaper article (Nelson et al., 2021) or
an image posted on a social networking site (Zhang & Pan, 2019). For a subset of
this X, a sociological construct y is first manually labeled by the researcher. The
model is trained and evaluated using the data set (X, y) created in this way. Once a
model with sufficient performance is trained, it can be used to automatically assign
codes to the remaining data sets.
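A minimal sketch of this partial automatic coding workflow, with an entirely hypothetical toy corpus and scikit-learn; the validity check against hand coding uses the F1 score discussed below:

```python
# Hypothetical sketch of partial automatic coding: hand-label a subset, train,
# validate against held-out hand codes, then code the remaining documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

labeled_docs = ["wage gap widens", "ceo pay soars", "local team wins title",
                "income divide grows", "festival draws crowds",
                "rich get richer", "new park opens", "poverty rate climbs"]
hand_codes = [1, 1, 0, 1, 0, 1, 0, 1]   # 1 = article concerns inequality
unlabeled_docs = ["top earners gain again", "museum reopens downtown"]

X_tr, X_te, y_tr, y_te = train_test_split(labeled_docs, hand_codes,
                                          test_size=0.25, stratify=hand_codes,
                                          random_state=0)
vec = TfidfVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X_tr), y_tr)

f1 = f1_score(y_te, clf.predict(vec.transform(X_te)))
print("F1 against held-out hand codes:", round(f1, 2))
if f1 >= 0.70:   # deploy only if validity against hand coding is adequate
    machine_codes = clf.predict(vec.transform(unlabeled_docs))
```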
Constructs in sociology can be very complex and multidimensional, for example,
populism, social capital, and inequality. Nelson et al. (2021) conducted experiments
to test the validity of measuring complex concepts in sociology in an automatic
coding framework. Specifically, they manually coded inequality and its subconcepts
and related concepts in news articles that may contain the concept of inequality and
then used them as yardsticks to examine the extent to which partially automatic
coding by supervised machine learning matches manual coding. The results show
that supervised machine learning is capable of coding with a good degree of validity,
with the F1 score, which is the harmonic mean of precision and recall, exceeding
the guideline of 0.70. More interesting, however, is the possibility that examining
the idiosyncrasies and biases of machine coding may also lead to a rethinking of the
human manual coding framework. Nelson et al. note that it is important to consider
the extent to which theoretically interesting categories can be categorized in terms of
precision and recall and to choose a coding framework based on this. To extend this
point further, it may be necessary to review the coding rules themselves so that
machines can construct theoretically interesting categories that are easier to classify.
As noted earlier, machine learning models are also formalized models of human
cognition and judgment, so the fact that they have a reflexive relationship to human
coding is particularly important when applied to complex, “socially constructed”
concepts handled by sociology.
Nelson et al. (2021) add to this by examining the possibility of fully automating
coding through unsupervised machine learning. For example, unsupervised machine
learning, such as topic models, can automatically identify potential topics that a
given data X addresses. By examining the extent to which such automatic assign-
ment of topics matches the categories assigned by humans, we can examine the
possibility of automatic coding via unsupervised machine learning. Their conclusion is that it
is difficult for unsupervised machine learning to reproduce a classification that
corresponds exactly to a human predefined concept. Therefore, it would be a mistake
to assume that automatic coding by unsupervised machine learning can completely
replace manual coding by humans. Rather, the potential of unsupervised machine
learning coding lies in its ability to discover new classifications. While machines
cannot perform coding as flexibly as humans can, they are free from the biases and
narrowness of vision inherent in humans and may discover latent patterns and
propose new classifications that humans were previously unaware of. Of course,
the usefulness of such classifications must be determined by humans from the
standpoint of sociological theory. Adding machine “interpretations” to human
interpretations in this way may enable concept formation that is also beneficial in
advancing sociological theory.
As we have seen in the previous section, there are various possibilities for the
application of machine learning to sociology. In this section, we continue to examine
how applications of machine learning can contribute to the development of socio-
logical theory, and in particular, we introduce two methods in relation to the goal of
sociology, which is to elucidate causal mechanisms.
Like other social sciences such as economics and political science, sociology has
paid great attention to the causal factors that cause social phenomena. Moreover, the
main theoretical goal of sociology is to elucidate the causal mechanisms of social
phenomena. When we speak of mechanisms, we mean not only the mere connec-
tions between causes and effects but also the elucidation of mechanisms at a deeper
level that link causes and effects (Hedström & Ylikoski, 2010; Machamer et al.,
2000; Morgan & Winship, 2014). Well known in sociology is the micro–macro
mechanism elucidation research program formulated by Coleman (1990). This
research program aims to elucidate the mechanisms for macro collective social
phenomena from a more micro, action level.
How can machine learning be used to elucidate such causal mechanisms? The
identification of causal effects itself requires the use of a statistical causal inference
framework, which is largely outside the current scope of machine learning. How-
ever, identifying the heterogeneity of causal effects is an important step toward a
better understanding of causal mechanisms (Salganik, 2018). Heterogeneity of
causal effects refers to the fact that causal effects vary by situation, context, and
attributes of the intervention target. Machine learning is extremely powerful in the
search for such heterogeneity.
When searching for effect heterogeneity using the traditional hypothesis testing
approach, one is confronted with a variety of problems. First, prior theoretical
preconceptions and conventions dictate at what level effect heterogeneity is likely
to exist. For example, gender, age, and socioeconomic status are preferred variables
in sociology. This in itself is not necessarily a bad thing, but there is a risk that the
scope of inquiry of sociological theory is narrowed beforehand (effect heterogeneity is by no means limited to gender, age, and socioeconomic status). In addition, there may be a widespread
practice of feeding various interaction terms into regression models and reporting the
results of models that incorporate only those interaction terms that actually become
significant, which amounts to clear p-hacking (Brand et al., 2021). Finally, hetero-
geneity is not always adequately captured by first-order interactions in regression
models. It is quite possible that it is caused by second-order or higher interactions or
even more complex nonlinear mechanisms (cf. Molina & Garip, 2019). However, it
is difficult to consider such possibilities with existing quantitative methods (Brand
et al., 2021).
Athey and Imbens (2016) developed a combination of machine learning and
causal inference called causal trees, which is a way to address these issues and is of
great use to sociology. Causal trees apply the method of decision trees in machine
learning to model the heterogeneity of causal effects in an exploratory manner and
without overfitting. Decision trees are a method of constructing a tree for data with
covariates and target variables by partitioning the covariates with the goal of
predicting a certain target variable (Hastie et al., 2009). The tree consists of multiple
nodes in a hierarchical structure. At the first node (sometimes called the root), data
are split into two subnodes based on a threshold value for a given covariate. Splitting
is performed in such a way that the values of the target variables are most similar
within each node. Then, at each node, a further division based on the covariate
threshold is made in a similar manner. This process is repeated to construct the final
tree. The decision tree construction is highly transparent because the algorithm is
relatively simple. It is also easy to interpret visually.
In a causal tree, the goal is to predict the treatment effect τ instead of the target
variable. As in the usual decision tree, the tree is constructed in such a way that the
heterogeneity within the nodes of τ is reduced. However, unlike the usual target
variable, the treatment effect τ is defined in terms of potential outcomes and is not directly observable, so special care is needed in both partitioning and estimation. To address this, Athey and Imbens (2016) developed a procedure called honest estima-
tion. In “honest” estimation, the sample is split into data for partitioning the covariate
space and data for estimating treatment effects within nodes. The partitioning of the
nodes is set up in such a way that the heterogeneity of the treatment effects is
captured as much as possible, while the uncertainty in the treatment effect estimation
is minimized. In general, the finer the split, the greater the heterogeneity between
nodes, while the estimation of the treatment effect within a node becomes more
uncertain. In other words, there is a tradeoff between the two goals, and the goal is to
make the partition in such a way that they are just balanced.
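The logic of honest estimation can be sketched as follows. This is a simplified illustration in the spirit of causal trees, not Athey and Imbens's exact algorithm; it assumes a randomized binary treatment with known assignment probability 0.5 and uses a transformed outcome whose conditional expectation equals the treatment effect:

```python
# Simplified sketch of "honest" estimation: one half of the sample builds the
# partition, the other half estimates treatment effects within the leaves.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
n = 4000
X = rng.normal(size=(n, 5))
T = rng.integers(0, 2, n)                    # randomized treatment, p = 0.5
tau = 1.0 + 2.0 * (X[:, 0] > 0)              # true heterogeneous effect
y = X[:, 1] + tau * T + rng.normal(size=n)

# Transformed outcome: with p = 0.5, E[y_star | x] equals the effect tau(x).
y_star = 2 * y * (2 * T - 1)

half = n // 2
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=200)
tree.fit(X[:half], y_star[:half])            # partition from the first half

leaves = tree.apply(X[half:])                # honest estimates: second half only
for leaf in np.unique(leaves):
    idx = leaves == leaf
    t, yy = T[half:][idx], y[half:][idx]
    effect = yy[t == 1].mean() - yy[t == 0].mean()
    print(f"leaf {leaf}: estimated effect {effect:.2f} (n = {idx.sum()})")
```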
An example of an application of causal trees in sociology is the work of Brand
and colleagues (Brand et al., 2021). They use NLSY panel data to examine the extent
to which a college degree is effective in reducing the time spent in low-wage jobs.
Analysis with causal trees allows them to find not only the average causal effect of a
college degree in reducing time in a low-wage job but also heterogeneous causal
effects and the extent to which these effects vary across people with particular
attributes. Moreover, using the tree enables them to examine not only linear effects
but also complex interactions of various factors. The results of their analysis indicate
unexpected heterogeneity due to such complex interactions. The effect of a college
degree on the reduction of low-wage work was particularly large for those whose
mothers were less educated, who grew up in large families, and who experienced less social control.
Such unexpected findings provide an opportunity to further explore the causal
mechanisms and lead to further development of sociological theory.
Mediation analysis is another method for mechanism exploration (VanderWeele, 2015). Currently, causal mediation analysis is formulated from the perspective of causal inference. The quantities of interest (estimands) include the controlled direct effect, obtained by intervening to set the mediator variable M to m as well as the treatment A to a, and the natural direct and indirect effects, obtained by setting the mediator variable M to the value Ma it would naturally take when the treatment A is set to a. Conditions for the identification of these various
direct and indirect effects have been examined, and several methods of estimation
have been developed. Machine learning methods are useful in causal mediation
analysis, just as they are useful in causal inference. In addition, to understand the
effects of treatment variables that change over time, it is necessary to think carefully
about the estimand, identification, and estimation. Machine learning methods can
also be used for estimation in this context (Lendle et al., 2017; van der Laan and
Rose, 2018).
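Under the strong simplifying assumptions of linear models, no exposure–mediator interaction, and no unmeasured confounding, the natural direct and indirect effects reduce to simple combinations of regression coefficients, as the following illustrative sketch shows; real applications require the identification conditions noted above:

```python
# Illustrative sketch of causal mediation under linear models without
# exposure-mediator interaction (simulated data with known true effects).
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
A = rng.integers(0, 2, n)                    # treatment
M = 0.5 * A + rng.normal(size=n)             # mediator model: gamma1 = 0.5
Y = 1.0 * A + 2.0 * M + rng.normal(size=n)   # outcome model: beta1 = 1, beta2 = 2

gamma1 = np.polyfit(A, M, 1)[0]              # effect of A on M
design = np.column_stack([np.ones(n), A, M])
beta = np.linalg.lstsq(design, Y, rcond=None)[0]

nde = beta[1]            # natural direct effect: A -> Y, not through M
nie = beta[2] * gamma1   # natural indirect effect: A -> M -> Y
print(f"NDE = {nde:.2f}, NIE = {nie:.2f}, total = {nde + nie:.2f}")
```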
The greatest problem with machine learning’s emphasis on prediction is that the
theoretical interpretability of the results is limited. In other words, the internal
mechanisms through which machine learning models produce good predictions are
unknown. Nevertheless, the interpretability of a conventional simple model does not mean that such a model should be chosen at the expense of predictive performance (Yarkoni & Westfall, 2017). Rather, the fact that a model predicts well, even if it is a complex model, can in principle be taken to mean that there is something in it to theorize about and thus the possibility of making the model interpretable.
Therefore, a methodology is needed to build an interpretable social science theory
while preserving the predictive performance of machine learning models to the
fullest extent possible. This can be positioned as a methodology for integrated
modeling that aims to integrate predictive and explanatory capabilities (Hofman
et al., 2021).
For example, it is often said that the coefficients of a linear regression model are easy to interpret because each summarizes an effect in a single quantity, in contrast to machine learning models, which often lack such a single interpretable quantity and can be difficult to understand. However, the average partial effect, which is a measure of the
“effect” of a particular variable on the outcome of interest, can be obtained using
machine learning predictions and interpreted in a similar way to coefficients in
traditional regression analysis. If we want to know how much a partial change in
one variable x will change y on average, holding other variables constant, we can
compute them directly from the predictions. For example, to find the average partial
effect, we can take the difference between the predicted values of y at two different values of x (x and x + Δ), divide that difference by Δ, and average over the sample
(Lundberg et al., 2021). If we clearly define the target quantity we wish to obtain, it
can be calculated from the predictions. If we prioritize ease of interpretation, we can
choose a quantity that is easily understood. Defining a clear and easily interpretable
target quantity can help to address many of the challenges associated with
interpreting the results of machine learning. When a target quantity is well defined
and meaningful from a theoretical perspective, it can be easier to understand and
draw meaningful conclusions from the prediction by machine learning.
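A minimal sketch of this computation, with synthetic data and an illustrative choice of learner; here the true average partial effect of the first variable is 2 by construction:

```python
# Minimal sketch of an average partial effect (APE) computed from the
# predictions of a black-box model (cf. Lundberg et al., 2021).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
n = 2000
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(size=n)

model = GradientBoostingRegressor().fit(X, y)

delta = 0.1
X_shift = X.copy()
X_shift[:, 0] += delta          # shift only the variable of interest
# APE: average change in predicted y per unit change in x0, others held fixed.
ape = np.mean((model.predict(X_shift) - model.predict(X)) / delta)
print(f"estimated APE of x0: {ape:.2f}")
```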
In addition, the “scientific regret minimization” proposed by Agrawal et al. is
considered a promising methodology (Agrawal et al., 2020; Hofman et al., 2021).
This method seeks to improve social science models by preparing large-scale data
and using machine learning methods to focus only on variances that can be explained
in principle. Variances that can be explained in principle are those that could have
been explained by a better model. In contrast, the inability to explain inherent noise
is not a problem; rather, changing the model to accommodate the noise will lead to
overfitting and loss of generalization performance.
Specifically, the following steps should be taken:
1. Train a theoretically unconstrained machine learning model (black-box model) on
a large data set to identify explainable variances in the data set.
2. Compare the predictions of an interpretable, theory-based model with those of the black-box model.
3. Focus model revision on the cases where the black-box model predicts well but the theory-based model fails, i.e., where the scientific "regret" is largest.
3.7 Conclusion
References
Abbott, A. (1988). Transcending general linear reality. Sociological Theory, 6(2), 169–186. https://
doi.org/10.2307/202114
Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining causal findings without bias: Detecting
and assessing direct effects. American Political Science Review, 110(03), 512–529. https://doi.
org/10.1017/S0003055416000216
Achen, C. H. (2005). Let’s put garbage-can regressions and garbage-can probits where they belong.
Conflict Management and Peace Science, 22(4), 327–339. https://doi.org/10.1080/
07388940500339167
Agrawal, M., Peterson, J. C., & Griffiths, T. L. (2020). Scaling up psychology via scientific regret
minimization. Proceedings of the National Academy of Sciences, 117(16), 8825–8835.
Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete.
Wired Magazine, 16(7).
Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial
intelligence: An agenda (pp. 507–547). University of Chicago Press.
Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Pro-
ceedings of the National Academy of Sciences, 113, 7353–7360.
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J. F., & Rahwan,
I. (2018). The moral machine experiment. Nature, 563(7729), 59–64.
Berk, R. A. (2004). Regression analysis: A constructive critique. Sage.
Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learning. Springer.
Blumenstock, J. E., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile
phone metadata. Science, 350(6264), 1073–1076. https://doi.org/10.1126/science.aac4420
Brand, J. E., Xu, J., Koch, B., & Geraldo, P. (2021). Uncovering sociological effect heterogeneity
using tree-based machine learning. Sociological Methodology, 51(2), 189–223.
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the
author). Statistical Science, 16(3), 199–231.
Cinelli, C., Forney, A., & Pearl, J. (2021). A crash course in good and bad controls. SSRN
Electronic Journal. https://doi.org/10.2139/ssrn.3689437
Coleman, J. S. (1990). Foundations of social theory. Belknap Press of Harvard University Press.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the
ACM, 55(10), 78–87.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics,
26(4), 745–766.
Elwert, F., & Winship, C. (2010). Effect heterogeneity and bias in main-effects-only regression
models. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality:
A tribute to Judea Pearl (pp. 327–336). Joseph Y. Halpern.
Foster, J. G. (2018). Culture and computation: Steps to a probably approximately correct theory of
culture. Poetics, 68, 144–154.
Garip, F. (2020). What failure to predict life outcomes can teach us. Proceedings of the National
Academy of Sciences, 117(15), 8234–8235.
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models.
Cambridge University Press.
Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University
Press.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative
research. Aldine De Gruyter.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Grimmer, J., Messing, S., & Westwood, S. J. (2017). Estimating heterogeneous treatment effects
and the effects of heterogeneous treatments with ensemble methods. Political Analysis, 25(4),
413–434. https://doi.org/10.1017/pan.2017.15
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An
agnostic approach. Annual Review of Political Science, 24, 395–419.
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine
learning and the social sciences. Princeton University Press.
Hastie, T., Tibshirani, R., Friedman, J. H., & Friedman, J. H. (2009). The elements of statistical
learning: Data mining, inference, and prediction. Springer.
Hedström, P., & Ylikoski, P. (2010). Causal mechanisms in the social sciences. Annual Review of
Sociology, 36(1), 49–67. https://doi.org/10.1146/annurev.soc.012809.102632
Hernán, M. A., Hsu, J., & Healy, B. (2019). A second chance to get causal inference right: A
classification of data science tasks. Chance, 32(1), 42–49. https://doi.org/10.1080/09332480.
2019.1579578
Hofman, J. M., Watts, D. J., Athey, S., Garip, F., Griffiths, T. L., Kleinberg, J., Margetts, H.,
Mullainathan, S., Salganik, M. J., Vazire, S., & Vespignani, A. (2021). Integrating explanation
and prediction in computational social science. Nature, 595(7866), 181–188.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning.
Springer.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects.
Science, 349(6245), 255–260.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47(2), 263–292.
Keele, L., Stevenson, R. T., & Elwert, F. (2020). The causal interpretation of estimated associations
in regression models. Political Science Research and Methods, 8(1), 1–13. https://doi.org/10.
1017/psrm.2019.31
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196–217.
Kino, S., Hsu, Y.-T., Shiba, K., Chien, Y.-S., Mita, C., Kawachi, I., & Daoud, A. (2021). A scoping
review on the use of machine learning in research on social determinants of health: Trends and
research prospects. SSM Population Health, 15, 100836. https://doi.org/10.1016/j.ssmph.2021.
100836
Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems.
American Economic Review, 105(5), 491–495.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage.
Lazer, D. (2015). Issues of construct validity and reliability in massive, passive data collections.
The City Papers: An Essay Collection from The Decent City Initiative.
Le Borgne, F., Chatton, A., Léger, M., Lenain, R., & Foucher, Y. (2021). G-computation and
machine learning for estimating the causal effects of binary exposure statuses on binary out-
comes. Scientific Reports, 11(1), 1435.
Lendle, S. D., Schwab, J., Petersen, M. L., & van der Laan, M. J. (2017). ltmle: An R package
implementing targeted minimum loss-based estimation for longitudinal data. Journal of Statis-
tical Software, 81(1), 1–21. https://doi.org/10.18637/jss.v081.i01
Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What is your estimand? Defining the target
quantity connects statistical evidence to theory. American Sociological Review, 86(3), 532–565.
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about Mechanisms. Philosophy of
Science, 67(1), 1–25. https://doi.org/10.1086/392759
McFarland, D. A., Lewis, K., & Goldberg, A. (2016). Sociology in the era of big data: The ascent of
forensic social science. The American Sociologist, 47(1), 12–35.
Mizuno, M., & Takikawa, H. (2022). Computational social science on the structure of communi-
cation between consumers (Yoshida foundation report).
Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45,
27–45.
Mooney, S. J., Keil, A. P., & Westreich, D. J. (2021). Thirteen questions about using machine
learning in causal research (you won’t believe the answer to number 10!). American Journal of
Epidemiology, 190(8), 1476–1482. https://doi.org/10.1093/aje/kwab047
Morgan, S. L., & Winship, C. (2014). Counterfactuals and causal inference: Methods and
principles for social research (2nd ed.). Cambridge University Press.
Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal
of Economic Perspectives, 31(2), 87–106.
Naimi, A. I., & Balzer, L. B. (2018). Stacked generalization: An introduction to super learning.
European Journal of Epidemiology, 33(5), 459–464. https://doi.org/10.1007/s10654-018-
0390-z
Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2021). The future of coding: A comparison of
hand-coding and three types of computer-assisted text analysis methods. Sociological Methods
& Research, 50(1), 202–237.
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-
scale experiments and machine learning to discover theories of human decision-making.
Science, 372(6547), 1209–1214.
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A.,
Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., & Datta, D. (2020). Measuring
the predictability of life outcomes with a scientific mass collaboration. Proceedings of the
National Academy of Sciences, 117(15), 8398–8403.
Schuler, M. S., & Rose, S. (2017). Targeted maximum likelihood estimation for causal inference in
observational studies. American Journal of Epidemiology, 185(1), 65–73. https://doi.org/10.
1093/aje/kww165
Schutz, A., & Luckmann, T. (1973). The structures of the life world. Northwestern University Press.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psycholog-
ical Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632
Tavory, I., & Timmermans, S. (2014). Abductive analysis: Theorizing qualitative research. Uni-
versity of Chicago Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal
Statistical Society: Series B (Methodological), 58, 267–288.
Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of
the Royal Statistical Society: Series B (Statistical Methodology), 73, 273–282. https://doi.org/
10.1111/j.1467-9868.2011.00771.x
Van der Laan, M. J., & Rose, S. (2011). Targeted learning. Springer.
van der Laan, M. J., & Rose, S. (2018). Targeted learning in data science: Causal inference for
complex longitudinal studies. Springer International Publishing. https://doi.org/10.1007/978-3-
319-65304-4
VanderWeele, T. J. (2015). Explanation in causal inference: Methods for mediation and interac-
tion. Oxford University Press.
VanderWeele, T. J. (2019). Principles of confounder selection. European Journal of Epidemiology,
34(3), 211–219. https://doi.org/10.1007/s10654-019-00494-6
Watts, D. J. (2014). Common sense and sociological explanations. American Journal of Sociology,
120(2), 313–351.
Westreich, D., & Greenland, S. (2013). The table 2 fallacy: Presenting and interpreting confounder
and modifier coefficients. American Journal of Epidemiology, 177(4), 292–298. https://doi.org/
10.1093/aje/kws412
Westreich, D., Lessler, J., & Funk, M. J. (2010). Propensity score estimation: Neural networks,
support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic
regression. Journal of Clinical Epidemiology, 63(8), 826–833.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology. Perspec-
tives on Psychological Science, 12(6), 1100–1122.
Zhang, H., & Pan, J. (2019). CASM: A deep-learning approach for identifying collective action
events with text and image data from social media. Sociological Methodology, 49(1), 1–57.
Chapter 4
Computational Social Science: A Complex
Contagion
Michael W. Macy
M. W. Macy (✉)
Department of Sociology and Department of Information Science, Cornell University, Ithaca,
NY, USA
e-mail: [email protected]
scientists who may lack the theoretical grounding necessary to know where to look,
what questions to ask, or what the results may imply” (Golder & Macy, 2014,
p. 144).
Although Scott and I referred to “the first wave,” the study of online interaction in
global networks should more appropriately be termed “the second wave” of com-
putational social science. The first wave began decades earlier and involved the use
of computational models to explore the logical implications of a set of theoretical
propositions. Most notably, the seminal work by Schelling (1971) and Axelrod
(1984) examined the dynamics of residential segregation and the evolutionary
origins of social order, questions that are central not only to sociology but to social psychology and political science as well. In this chapter, I recount my personal
involvement in computational social science over five decades, focusing on two
themes: the technical obstacles that had to be overcome, and the foundational
research questions that have motivated the field.
My initial foray into what came to be known as computational social science
dates back to my junior year in college and exemplifies the early fascination with
abstract computational models. My mentor, Karl Deutsch, was one of the first social
scientists to apply simulation, information theory, and system dynamics models to
the study of war and peace. One afternoon Prof. Deutsch walked into our weekly tutorial with a rumpled copy of Prisoner’s Dilemma: A Study in Conflict and Cooperation by Rapoport and Chammah (1965). He handed me the book and told
me to come back when I had finished reading it. The book focused on the dynamics
of cooperation in iterated play and introduced computer simulation of stochastic
learning models and human subject experiments to test model predictions. In the
words of Martin Shubik (1970, p. 193), the book “adopts a completely different
approach and starts to do precisely what I believe is necessary—to enlarge the
framework of the analysis to include learning and sociopsychological factors,
notably reciprocity.” Rapoport went on to win Robert Axelrod’s famous prisoner’s dilemma tournament by submitting the simplest of all strategies entered: “tit for tat.”
Rapoport and Chammah’s experiments inspired me to see if I could use computer
simulation to dig deeper into their learning-theoretic approach. Suppose the players
are unaware of the game’s mathematical structure and instead apply a simple
stochastic propensity to repeat behaviors associated with a satisfactory outcome
and otherwise explore an alternative. Would the players learn to cooperate?
To find out, I asked Prof. Deutsch if I could simulate an iterated two-person PD
game, with each player using an identical stochastic strategy based on reinforcement
learning. The stochastic learning model was exceedingly simple. The players have a
single goal: to use their own behavior to induce the other player to cooperate. Each
player begins by flipping a coin to choose whether to cooperate or defect and then
updates its cooperative propensity based on satisfaction with the associated behav-
ior. Players are satisfied when the other player cooperates and dissatisfied when they
defect.
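From today’s vantage point, a model of this kind can be reproduced in a few lines of code. The sketch below is our minimal reconstruction, not the original program: the linear (Bush-Mosteller-style) propensity update and all parameter values are illustrative assumptions, with the learning rate standing in for the players’ reactivity.

```python
import random

def simulate(learning_rate, rounds=10000, seed=1):
    """Two identical stochastic learners play an iterated PD.

    Each player cooperates with probability p and is 'satisfied'
    whenever the other player cooperates; satisfaction reinforces the
    action just taken, dissatisfaction inhibits it.
    """
    rng = random.Random(seed)
    p = [0.5, 0.5]  # each player starts from a coin flip
    mutual = 0
    for _ in range(rounds):
        coop = [rng.random() < p[i] for i in range(2)]  # True = cooperate
        for i in range(2):
            satisfied = coop[1 - i]  # the other player cooperated
            # Reinforce cooperation when (satisfied and cooperated) or
            # (dissatisfied and defected); otherwise reinforce defection.
            target = 1.0 if satisfied == coop[i] else 0.0
            p[i] += learning_rate * (target - p[i])
        mutual += coop[0] and coop[1]
    return mutual / rounds

for lr in (0.5, 0.05):  # highly reactive vs. slow learners
    print(f"learning rate {lr}: mutual cooperation in "
          f"{simulate(lr):.0%} of rounds")
```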
The problem was that my school still relied on an IBM mainframe, programmed
using punch cards. I would have to punch the cards with lines of code, stand in line
waiting to submit the deck, wait hours to get back the paper output, learn that I had
made a mistake, punch a new card, and resubmit the deck. Worse yet, mainframe time was prohibitively expensive for curiosity-driven exploration funded by a small grant. After a few frustrating nights, I gave up. I complained about the problem to a friend, Jed Harris, who worked at Abt Associates, not far from campus. Jed told me that Abt had installed the latest DEC PDP-7, a computer that would allow me to observe, debug,
and quickly modify the game dynamics as they played out in real time. Jed let me use
the machine at night, after work. I could accomplish in one evening what would have
taken me weeks on the mainframe, and it cost nothing to explore the parameter
space.
The simulations revealed a self-reinforcing cooperative equilibrium that could
quickly recover from small perturbations introduced by occasional defections.
However, the stochastic stability of mutual cooperation depended on the learning
rate. Highly reactive players would quickly learn to cooperate and could easily
recover if one of the players were to test the other’s resolve. In contrast, slow
learners were doomed to endless cycles of recrimination, retaliation, and mutual
defection. I wrote up the results for Prof. Deutsch and put the paper with his
encouraging comments in a cardboard box, where it remained for 20 years.
Fast forward two decades to when Peter Hedstrom called me up from the University
of Chicago, where he was helping James Coleman launch a new journal, Rationality and Society. Peter reminded me about my old prisoner’s dilemma simulation, which I had mentioned to him back when we were in grad school together, and asked me to update the paper and submit it for the journal’s second issue. The paper (Macy,
1989) introduced what later came to be known as “agent-based modeling,” a new
approach to theoretical research that replaced “a model of a population” with “a
population of models,” where each model corresponds to an autonomous agent
interacting with its neighbors.
The point I want to underscore is that the roots of computational social science go back to pioneers like Karl Deutsch, Anatol Rapoport, and Albert Chammah. How-
ever, the field had to wait 20 years for a new generation of personal computers to
catch up with the advances in social theory that first inspired me as an undergraduate.
Recognizing the opportunities opened up by universal access to desktop com-
puting, Bill Bainbridge organized a national meeting supported by the National
Science Foundation on “Grand Computing Challenges for Sociology” (Bainbridge,
1994). John Skvoretz and I were both in attendance and were inspired by
Bainbridge’s call. Our model of the diffusion of trust and cooperation among
strangers (Macy & Skvoretz, 1998) was followed up by a PNAS paper with Yoshimichi Sato on trust and market formation in the USA and Japan (Macy & Sato, 2002).
I also collaborated with Andreas Flache to leverage the rapidly increasing power
of desktop machines to explore the theoretical implications of “backward-looking”
alternatives to the forward-looking rationality assumed in classical game theory.
This culminated in a paper in the Annual Review of Sociology, “Beyond Rationality
in Theories of Choice” (Macy & Flache, 1995). However, the impact of agent-based
modeling extended far beyond our learning-theoretic approach. Robb Willer and I
later co-authored a paper in the Annual Review of Sociology (Macy & Willer, 2002)
that highlighted a fundamental analytical shift made possible by agent-based model-
ing, from “factors” (interactions among variables) to “actors” (interactions among
agents embedded in social networks).
early adopters but less so for those who wait. The same holds for participation in
collective action. Studies of strikes (Klandermans, 1988), revolutions (Gould,
1996), and protests (Marwell & Oliver, 1993) emphasize the positive externalities
of each participant’s contribution. The costs and benefits for investing in public
goods often depend on the number of prior contributors—the “critical mass” that
makes additional efforts worthwhile.
2. Credibility: Innovations often lack credibility until adopted by neighbors. For
example, Coleman et al. (1966) found that doctors were reluctant to adopt medical innovations until they saw their colleagues using them. Markus (1987)
found the same pattern for adoption of media technology. Similarly, the spread
of urban legends (Heath et al., 2001) and folk knowledge (Granovetter, 1978)
generally depends upon multiple confirmations of the story before there is
sufficient credibility to report it to others. Hearing the same story from different
people makes it seem less likely that surprising information is nothing more than
the fanciful invention of the informant. The need for confirmation becomes even
more pronounced when the story is learned from a socially distant contact, with
whom a tie is likely to be relationally weak.
3. Legitimacy: Knowing that a movement exists or that a collective action will take
place is rarely sufficient to induce bystanders to join in. Having several close
friends participate in an event often greatly increases an individual’s likelihood of
also joining (Finkel et al., 1989; Opp & Gern, 1993), especially for high-risk
social movements (McAdam & Paulsen, 1993). Decisions about what clothing to
wear, what hair style to adopt, or what body part to modify are also highly
dependent on legitimation (Grindereng, 1967). Non-adopters are likely to challenge the legitimacy of the innovation, and innovators risk being shunned as deviants until there is a critical mass of early adopters (Crane, 1999; Watts,
2002).
4. Emotional Contagion: Most theoretical models of collective behavior—from
action theory (Smelser, 1963) to threshold models (Granovetter, 1978) to cyber-
netics (McPhail, 1991)—share the basic assumption that there are expressive and
symbolic impulses in human behavior that can be communicated and amplified in
spatially and socially concentrated gatherings (Collins, 1993). The dynamics of cumulative interaction in emotional contagions have been demonstrated in events
ranging from acts of cruelty (Collins, 1974) to the formation of philosophical
circles (Collins, 1993).
The theory of complex contagion can be understood as an extension of
Granovetter’s theory of the strength of weak ties (Granovetter, 1973). According
to Granovetter, ties are weak relationally when there is infrequent interaction and/or
low emotional salience, as in relations with an acquaintance. However, these ties can
nevertheless be structurally strong in that they provide access to information from
outside one’s immediate circle. In network terminology, relationally weak ties often
have long range, meaning that they connect nodes located in structurally distant
network neighborhoods. Damon and I showed that the structural strength of long-
range ties is limited to simple contagions that do not require social reinforcement. In
The Hopster, it turned out, had much more to teach us about the self-reinforcing
dynamics of homophily and social influence. Daniel DellaPosta, Yongren Shi, and I
used a descendant of the original model to address the curious tendency for liberals
and conservatives to differ not only on policy but also lifestyle preferences, as
documented using the General Social Survey (DellaPosta et al., 2015). However,
our underlying theoretical motivation ran deeper: to show how belief systems can
self-organize through the forces of attraction and repulsion. The emergent configu-
rations invite post-hoc explanations of cultural fault lines that can be as substantively
idiosyncratic as a liberal preference for caffeinated hot beverages, an idea originally
proposed by Miller McPherson (2004).
A few years later, Sebastian Deri, Alex Ruch, Natalie Tong, and I (2019) tested
the “latte liberal” hypothesis using a large online experiment, modeled after the
multiple worlds “music lab” devised by Salganik et al. (2006). The results of our
“party lab” experiment confirmed the predicted unpredictability of partisan divi-
sions. The emergent disagreements were as deep as those observed in contemporary
surveys, but with one important difference: You could be sure that the two parties
would strongly disagree, but it was a coin flip as to who would be on which side. In
one “world,” Democrats might join the bandwagon to embrace “great books,” while
Republicans rallied around more emphasis on children’s physical fitness, but in the
next world the sides would be switched. The problem is that social scientists, like the
participants in our study, can only observe the one world we all inhabit. That leaves
us susceptible to “just so” stories that plausibly explain the opposing beliefs of each
side, unaware that the sides could just as easily have been switched but for the luck
of the draw.
The arbitrariness of partisan division invites the reassuring hypothesis that polit-
ical polarization can be easily reversed simply by reminding everyone that the
emperor is naked. Unfortunately, it is not so easy, for two reasons—false enforce-
ment and hysteresis. In a study with Robb Willer and Damon Centola (2005), we
simulated Andersen’s classic fable to show how conformists might falsely enforce
unpopular norms to avoid suspicion that they might have complied because of social
pressure instead of genuine conviction. In a follow-up study with Ko Kuwabara,
Robb and I tested the “false enforcement” hypothesis in a wine-tasting experiment
using vinegar-tainted wine (Willer et al., 2009). As predicted by Andersen (as well
as Arthur Miller’s The Crucible), participants who praised the tainted wine were
more likely to criticize the lone confederate who refused to go along with the
charade—but only when the criticism was performed in public.
Polarization may be hard to reverse even in the absence of false enforcement. In
collaboration with a team of computer scientists at RPI, I recently used another
Hopster variant to investigate the tipping point beyond which a polarized legislature
becomes increasingly unable to unite against common threats, such as election
interference by a foreign adversary or a global pandemic (Macy et al., 2021). The
problem is hysteresis, in which polarization alters the network structure by elimi-
nating the inter-party ties by which pragmatists might “reach across the aisle.” The
structural change is difficult if not impossible to reverse, even if the political
temperature could somehow be lowered well below current levels. The disturbing
implications attracted widespread media attention, including the New York Times
and CNN.
If the “first wave” in computational social science was all theory and little data, the
“second wave” was the mirror opposite: big data with little theory. Social science has
accumulated a trove of theories waiting for the data that are needed to test them. The
transformative potential of online data was celebrated by one of the founders of
computational social science, Duncan Watts (2011, p. 266):
[J]ust as the invention of the telescope revolutionized the study of the heavens, so too by
rendering the unmeasurable measurable, the technological revolution in mobile, Web, and
Internet communications has the potential to revolutionize our understanding of ourselves
and how we interact. . . . [T]hree hundred years after Alexander Pope argued that the proper
study of mankind should lie not in the heavens but in ourselves, we have finally found our
telescope. Let the revolution begin.
Is the Web the space telescope of the social sciences? The metaphor is instructive.
The power of a telescope, whether in outer space or cyberspace, depends on our
ability to know where to point it. Moreover, with millions of observations in global
networks, the challenge is to find differences that are not statistically significant.
Theoretical significance then becomes paramount.
My first study using big data addressed Ron Burt’s theory of structural holes. Nathan Eagle, Rob Claxton, and I used call logs from one of the UK’s largest telecoms to test
Ron’s theory at population scale (Eagle et al., 2010). As predicted, we found that
economically advantaged communities tended to have more people with ties that
link otherwise distantly connected neighbors, although the causal direction remained
to be sorted out.
More recently, Patrick Park, Josh Blumenstock, and I (2018) used these same
telecom data, along with global network data from Twitter, to search the social
heavens for “network wormholes,” our term for long-range ties that span vast
distances in a global communications network. These were not the “wide bridges”
in complex contagion; rather, we were searching for Granovetter’s “weak ties,”
the “long bridges” that connect otherwise unreachable clusters. Not surprisingly, we
found these ties to be extremely rare. For any random edge in a global network, the
“degree of separation” along the second-shortest path is almost always close to two
hops. Nevertheless, a handful of long-distance “shortcuts” can also be found in the
global communication networks made visible by social media. The question then
arises, are these shortcuts strong enough to matter? The default assumption in
network science, going back to Granovetter, is that they are relationally weak. The
“strength of weak ties,” in Granovetter’s theory, is the access they provide to socially
distant sources of information, not their affective or social intensity. Long-range ties,
or so the theory goes, connect acquaintances with whom interaction is infrequent,
influence is low, and bandwidth is narrow. But that is not what we found. Contrary to
extant theory, network wormholes have nearly the same bandwidth and affective
content as the densely clustered ties that connect a small circle of friends.
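The range measure behind this search is simple to state: an edge’s range is the length of the second-shortest path between its endpoints once the direct tie is removed. A toy sketch of that measure (our own construction, using networkx; the graph is invented):

```python
import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),
              ("c", "d"), ("d", "e"), ("a", "e")])

def edge_range(G, u, v):
    """Length of the second-shortest path between u and v:
    remove the direct tie and measure the remaining distance."""
    H = G.copy()
    H.remove_edge(u, v)
    try:
        return nx.shortest_path_length(H, u, v)
    except nx.NetworkXNoPath:
        return float("inf")  # a true bridge: no alternative route

print(edge_range(G, "a", "b"))  # 2: an embedded tie, closed by a triangle
print(edge_range(G, "c", "d"))  # 3: a longer-range tie, a "wormhole" candidate
```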
Another big empirical study was motivated by years of computational modeling
with the Hopster. Yongren Shi, Feng Shi, Fedor Dokshin, James Evans, and I used
millions of Amazon book co-purchases to see if partisan cultural fault lines extended
even to the consumption of science, a realm that is presumably above the political
fray (2017). Could a shared interest in science bridge political differences and
encourage reliance on science to inform political debate? Or has science become a
new battlefield in the culture wars? We found that the political left and right share an
interest in science in general, but not science in particular. Liberals are drawn more
to basic science (e.g., physics, astronomy, and zoology), while conservatives prefer
applied science (e.g., criminology, medicine, and geophysics). Liberals read science
books that are more often purchased by people who do not buy political books, while
conservatives prefer science books that are mainly purchased by fellow
conservatives.
The most impactful paper of my career was a 2011 study with Scott Golder in
which we used millions of messages obtained from Twitter to measure diurnal
emotional rhythms (Golder & Macy, 2011). We sorted the tweets into 168 buckets,
one for each hour of the week, and then used Pennebaker’s Linguistic Inquiry and
Word Count lexicon to measure the level of positive and negative emotion in each
bucket. Positive emotion is indicated by the use of words like “excited” or “fun,” in
contrast to words like “disappointed” or “frustrated.” We found that people were
happiest right around breakfast, but for the rest of the workday it was all downhill.
We immediately suspected the effects of an exhausting job ruled by an unpleasant
supervisor, but then we noticed the same pattern on weekends as well, except that the
starting point was delayed by about 2 h, reflecting perhaps the opportunity to sleep
in. Using changes in diurnal rhythms relative to sunrise and sunset, we concluded
that the pattern is driven largely by sleep cycles, not work cycles. The paper was
published in Science and attracted more mainstream media attention than all my
other papers combined.
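The aggregation behind the 168 buckets is easy to sketch. The following is a toy illustration, not the study’s pipeline; the timestamps and affect scores are fabricated stand-ins for LIWC-scored tweets.

```python
import pandas as pd

# Fabricated stand-in data; in the study, the score for each tweet came
# from the LIWC positive-emotion lexicon.
msgs = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2011-03-07 08:15", "2011-03-07 17:40", "2011-03-12 10:05"]),
    "pos_affect": [0.071, 0.048, 0.066],
})

# 168 hour-of-week buckets: 0 = Monday 00:00, ..., 167 = Sunday 23:00
msgs["bucket"] = msgs["timestamp"].dt.dayofweek * 24 + msgs["timestamp"].dt.hour
print(msgs.groupby("bucket")["pos_affect"].mean())
```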
A decade later, Minsu Park and I (with the help of company staffers) used global
Spotify music logs to track the flip side of the Twitter study—the emotions people
are exposed to in the music they choose to stream instead of the emotions they
express (Park et al., 2019). We filtered out Spotify playlists to focus on user-selected
music the world over. We discovered that the diurnal pattern in affective preference
closely resembles the diurnal cycles that Scott and I detected in expressed emotion.
This suggests the possibility that our affective preferences reinforce rather than compensate for our emotional state. For example, when we are sad, we do not reach for upbeat music to lift our spirits; we listen to something melancholy. Unfortunately, Minsu and I were unable to link users’ Spotify and Twitter accounts, so our affective reinforcement theory remains to be tested at the individual level.
4.7 Conclusion
This brief overview of my personal involvement in the first and second waves of
computational social science is intended to call attention to the foundational ques-
tions that the studies addressed, from the origins of social order to the strength of
weak ties in global online networks. There is no shortage of theory in the social
sciences, and from its inception, computational social science has ranked among the
more theory-driven fields. In contrast, there has been a shortage of data with which to
test many of those theories, due largely to the historic difficulty in observing social
interaction except in small groups. That is now changing with the global popularity
of online activities that leave digital traces, from shopping to blogging. Nevertheless,
the social sciences have not taken full advantage of the vast new research opportu-
nities opened up by advanced computational methods. The question is why?
I do not believe this hesitancy should be attributed to the failure to ask interesting
and important questions. On the contrary, the signature contribution of computa-
tional social science is the opportunity to tackle vital questions that would otherwise
be inaccessible. I have not run the numbers but my casual impression is that these
studies are far more likely to appear in general science journals with double-digit
impact scores than in highly specialized journals devoted to topics that interest only a
narrow audience.
The problem is not the disciplinary relevance of the research; I suspect it is
instead the price of admission. Rapid advances in computation have been accompa-
nied by equally rapid turnover in the requisite technical skills, from object-oriented programming that supercharged agent-based modeling to deep learning and word
embedding that have opened up new frontiers in text analysis. These methods
require substantial retooling, even for quantitative specialists with advanced statis-
tical training.
The updating of graduate training that Scott Golder and I called for in our 2014
Annual Review paper remains to be implemented at scale in any of the social
sciences. Until that happens, computational social science is likely to remain con-
fined largely to those who have the necessary skills. The torch will then continue to
be carried mostly by computer scientists and socio-physicists who may be more
interested in discovering unexpected patterns in the data than discovering what they
might mean. Meanwhile, our best option is the increasing reliance on interdisciplin-
ary research teams that bring together specialists who not only know how to operate
the telescope but also where to point it.
References
Golder, S., & Macy, M. (2014). Digital footprints: Opportunities and challenges for online social
research. Annual Review of Sociology, 40, 129–152. https://doi.org/10.1146/annurev-soc-
071913-043145
Gould, R. (1996). Patron-client ties, state centralization, and the whiskey rebellion. American
Journal of Sociology, 102, 400–429. https://doi.org/10.1086/230951
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology,
83, 1420–1443.
Grindereng, M. (1967). Fashion diffusion. Journal of Home Economics, 59, 171–174.
Heath, C., Bell, C., & Sternberg, E. (2001). Emotional selection in memes: The case of urban
legends. Journal of Personality and Social Psychology, 81, 1028–1041. https://doi.org/10.1037/
0022-3514.81.6.1028
Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational
abilities. Proceedings of the National Academy of Sciences of the United States of America, 79, 2554–2558. https://
doi.org/10.1073/pnas.79.8.2554
Klandermans, B. (1988). Union action and the free-rider dilemma. Research in Social Movements,
Conflict and Change, 10, 77–92.
Macy, M. (1989). Walking out of social traps: A stochastic learning model for the Prisoner’s
dilemma. Rationality and Society, 1, 197–219.
Macy, M., & Flache, A. (1995). Beyond rationality in theories of choice. Annual Review of
Sociology, 21, 73–91.
Macy, M., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan.
Proceedings of the National Academy of Sciences, 99, 7214–7220.
Macy, M., & Skvoretz, J. (1998). The evolution of trust and cooperation between strangers: A
computational model. American Sociological Review, 63, 638–660.
Macy, M., & Willer, R. (2002). From factors to actors: Computational sociology and agent-based
modeling. Annual Review of Sociology, 28, 143–166. https://doi.org/10.1146/annurev.soc.28.
110601.141117
Macy, M., Deri, S., Ruch, A., & Tong, N. (2019). Opinion cascades and the unpredictability of
partisan polarization. Science Advances, 5, eaax0754. https://doi.org/10.1126/sciadv.aax0754
Macy, M., Ma, M., Tabin, D., Gao, J., & Szymanski, B. (2021). Polarization and tipping points.
Proceedings of the National Academy of Sciences, 118, e2102144118. https://doi.org/10.1073/
pnas.2102144118
Markus, M. (1987). Toward a ‘critical mass’ theory of interactive media: Universal access,
interdependence and diffusion. Communication Research, 14, 491–511. https://doi.org/10.
1177/009365087014005003
Marwell, G., & Oliver, P. (1993). The critical mass in collective action. Cambridge University
Press. https://doi.org/10.1017/CBO9780511663765
McAdam, D., & Paulsen, R. (1993). Specifying the relationship between social ties and activism.
American Journal of Sociology, 99, 640–667. https://doi.org/10.1086/230319
McPhail, C. (1991). The myth of the madding crowd. Aldine.
McPherson, M. (2004). A Blau space primer: Prolegomenon to an ecology of affiliation. Industrial
and Corporate Change, 13, 263–280.
Opp, K., & Gern, C. (1993). Dissident groups, personal networks, and spontaneous cooperation:
The East German Revolution of 1989. American Sociological Review, 58, 659–680. https://doi.
org/10.2307/2096280
Park, P., Blumenstock, J., & Macy, M. (2018). The strength of long-range ties in population-scale
social networks. Science, 362, 1410–1413.
Park, M., Thom, J., Mennicken, S., Cramer, H., & Macy, M. (2019). Global music streaming data
reveal diurnal and seasonal patterns of affective preference. Nature Human Behaviour, 3,
230–236. https://doi.org/10.1038/s41562-018-0508-z
Rapoport, A., & Chammah, A. (1965). Prisoner’s dilemma: A study in conflict and cooperation.
The University of Michigan Press.
Salganik, M., Dodds, P., & Watts, D. (2006). Experimental study of inequality and unpredictability
in an artificial cultural market. Science, 311, 854–856.
Schelling, T. (1971). Dynamic models of segregation. The Journal of Mathematical Sociology, 1,
143–186. https://doi.org/10.1080/0022250x
Shi, F., Shi, Y., Dokshin, F., Evans, J., & Macy, M. (2017). Millions of online book co-purchases
reveal partisan differences in the consumption of science. Nature Human Behaviour, 1, 79.
https://doi.org/10.1038/s41562-017-0079
Shubik, M. (1970). Game theory, behavior, and the paradox of the prisoner’s dilemma: Three
solutions. Journal of Conflict Resolution, 14, 181–193.
Smelser, N. (1963). Theory of collective behavior. Free Press.
Watts, D. (2002). A simple model of global cascades on random networks. Proceedings of the
National Academy of Sciences, 99, 5766–5771.
Watts, D. (2011). Everything is obvious: Once you know the answer. Crown Business.
Watts, D., & Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393,
440–442.
Willer, R., Kuwabara, K., & Macy, M. (2009). The false enforcement of unpopular norms.
American Journal of Sociology, 115, 451–490. https://doi.org/10.1086/599250
Chapter 5
Model of Meaning

H. Takikawa (✉)
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
e-mail: [email protected]
A. Ueshima
Graduate School of Arts and Letters, Tohoku University, Sendai, Japan
5.1 Introduction
Meaning is a fundamental component of the social world (Luhmann, 1995; Schutz &
Luckmann, 1973; Weber, 1946). People inhabiting the social world interpret the
meaning of natural objects in their environment and social objects, including others,
and act based on these interpretations. If we call the mechanism by which people
interpret the meanings of objects and other people’s actions and link them to their
own actions the meaning-making mechanism (Lamont, 2000), then social science,
which aims to explain the behavior of people and groups, must, as its fundamental task, elucidate this meaning-making mechanism.
In sociology, ever since Weber (1946) placed subjective meaning at the center of his definition of the discipline, considerations related to meaning-making mechanisms, that is, to the relationship between meaning and human action, have accumulated. In terms of methods for elucidating meaning-making mechanisms, meaning and culture have traditionally been considered qualitative in nature, as in the German Geisteswissenschaften (human sciences) tradition (Dilthey, 1910), which is closely related to the establishment of Weber’s sociology. Therefore, although there are exceptions, approaches to meaning have primarily been attempted through qualitative social theory and qualitative research.
In contrast, rational choice theory, the most influential formal theory of action in sociology, initially viewed meaning-making mechanisms in an
In addition, Weber proposed the ideal type as a device for such explanations. Ideal types should be constructed to be, in Weber’s words, “meaning-adequate,” and should be conducive to causal explanation. This is in line with the argument of this chapter that we should attempt to explain actions and their collective consequences by formalizing actors’ meaning-making mechanisms.
Starting from a critique of Weberian interpretive sociology, Schutz developed several important insights into the meaning structure of the social world, which greatly influenced today’s schools of social constructionism (Berger & Luckmann, 1967) and ethnomethodology (Garfinkel, 1967). Schutz’s (Schutz & Luckmann, 1973) analysis focused on the structure of the everyday life-world, which people take for granted in the natural attitude. His theory of typification is particularly important. According to Schutz, actors are both constrained by the world and act upon and intervene in it, and pragmatic motives underlie their meaning-making. People usually carry out their actions without delay on the basis of typified knowledge about things and people. In this sense, the knowledge of the everyday world is self-evident. However, when actions based on existing knowledge are confronted with problematic situations, the knowledge and typifications previously considered self-evident are questioned and reinterpreted. Through this dynamic, the meaning of the social world is constituted.
Schutz’s typification theory is important in that it explicitly argues that our social world is semantically constituted through knowledge and typifications, and that these are not simply given but are themselves socially constituted, in that they are constantly being questioned and revised. Schutz’s argument also excels in the coherence of its logical organization. Although he himself did not propose a formal model, he recognized the importance of using models in the social sciences, and his arguments have a logical structure that is relatively amenable to formalization. As such, his typification theory shows a remarkable commonality in logical structure with the formally specified Bayesian theory of categorization (Anderson, 1991), which we discuss later.
psychological structure of the actors who perceive and categorize the social world,
which is produced in the course of struggling to adopt their social positions
(Bourdieu, 1989).
The greatest legacy of Bourdieu’s argument is twofold: First, Bourdieu not only
proposed a conceptual apparatus for treating cognition sociologically but also
articulated a methodology for analysis using multiple correspondence analysis,
which allows for the spatial arrangement of different variables and the examination
of their positional relationships. Second, he argues that this method enables us to
spatially represent the social world as a social space, as well as the world of symbols
and meanings as a semantic space, and to discuss the correspondence between
the two.
Despite its limitations, Bourdieu’s method has had a direct impact on today’s
quantitative cultural sociology in that it paved the way for a quantitative approach to
meaning and, more specifically, suggested the possibility of meaning being
expressed through spatial structures.
Thus far, we have introduced Weber, Schutz, and Bourdieu’s classical theories of
meaning. While these can serve as important inspirations for developing models of
meaning, with the exception of Bourdieu, these sociologists themselves do not
directly propose formal models. In contrast, White and DiMaggio, whose work is
discussed next, developed ideas that directly lead to formal models of meaning and
quantification of meaning. Their arguments can be credited with providing a basis for
today’s computational sociology of culture.
Early in his career, H. White (1963) paved the way for a quantitative treatment of culture by extending the formal model of kinship structure proposed by Lévi-Strauss and Weil. His next step was to formalize the central sociological concept of roles by means of an algebraic theory of networks, in which White invented the concept of the “catnet” and drew attention to the fact that the semantic category “cat,” which refers to people or groups, is inseparably linked to the way the “net” connects these people (White, 2008b; White et al., 1976). Later, White attempted to theorize people’s meaning-making and network formation mechanisms on the basis of the Bayesian idea that identities seek control in an uncertain world, but he never established an explicit formal model (White, 1992, 1995, 2008a). Nevertheless, he has had a profound impact on many of the central researchers in cultural sociology today, such as Mohr, Bearman, and DiMaggio, who will be discussed shortly.
Another important figure in the quantitative approach in cultural sociology today is DiMaggio, who published a programmatic review article in 1997 on the integration of cultural sociology and cognitive science (DiMaggio, 1997). He criticized traditional sociological theories for failing to deal explicitly with cognition while implicitly making assumptions about it, and called for the active introduction of theories from cognitive science and cognitive psychology to explore the
In the classic discussions in sociology by Weber, Schutz, and others, the approach to
the problem of meaning was almost entirely qualitative. However, thanks to the
efforts of White, DiMaggio, and others, today’s cultural sociology is beginning to
quantitatively approach meaning. Of particular importance are:
1. The expansion of data sources, including large-scale textual data, has led to the
development of computational social scientific methods.
2. In conjunction with (1), ideas from cognitive psychology and cognitive science
have permeated sociology.
These two developments have led to the establishment of a healthy practice in cultural sociology of modifying theories in light of empirical data while using formal theoretical models as a foundation (Mohr, 1998; Mohr et al., 2020).
In this chapter, we will continue to follow this trend and examine the possibilities of a sociological model of meaning-making. For this purpose, we will examine models that have been developed mainly in computational linguistics and discuss the possibility of connecting them to a sociological model of meaning. In the next section, we offer some preliminary considerations in preparation for this examination.
more clearly formulate not only the function of meaning but also the forms of
meaning (relations) and the learning process of meaning. In the following, we will
explain (a) the function of meaning in terms of prediction, and then discuss (b) the forms of meaning (relations) and (c) the learning process of meaning (semantic learning).
Our desire to know the meaning of things is rooted in pragmatic motivation (Schutz & Luckmann, 1973). In other words, we must know the meaning of things and events
in the world if we are to operate smoothly in this world. Knowing the meaning of a
thing or event is connected to predicting the things and events related to that thing or
event, which ultimately leads to actions based on predictions. For example, suppose we do not understand the meaning of a one-way traffic sign and enter a street from the opposite direction. If an oncoming car is traveling without slowing down, we might cause a serious accident. Understanding the
meaning of a sign implies being able to predict the occurrence of events associated
with that sign. When there is a one-way sign, we can predict that cars may drive in
the direction indicated by the arrow. With such predictions, we can avoid serious
accidents, drive smoothly, and arrive at our destination. Understanding the meaning
of things makes the world predictable and enables us to live smoothly.
What Schutz (Schutz & Luckmann, 1973) calls typification is the central activity of meaning-making, and it, too, aims at prediction. That is, by understanding objects and events in the world as instances of specific categories, we can predict various attributes of these objects.
Typification is also called categorization in cognitive psychology (Anderson, 1991). According to Anderson (1991), categories are bundles of attributes. Thus, for a thing with attribute i, if we estimate its category membership K from that attribute, we can then predict another attribute, j, that members of the category share. In Schutz’s example, inferring from the appearance of a mushroom (attribute i) that it has the attribute “edible” (attribute j) illustrates the connection between categorization and prediction (Schutz & Luckmann, 1973).
Meaning tied to prediction also implies that meaning relates a thing (observed
data) to another thing (other observed data) through prediction. From a computa-
tional perspective, understanding meaning involves inferring its potential meaning
from certain observed data, and the function of inferring meaning involves contrib-
uting to prediction. For example, if we observe a “rain cloud” and understand it to
mean “it will rain,” we can predict the event “the laundry will get wet.” From another
angle, the observed data of “rain clouds” can be seen as being associated with the
observed data of “laundry getting wet.” The meaning of “it rains” then “manifests”
as a complex of relationships among “rain clouds appear,” “laundry gets wet,”
“humidity rises,” and so on. Thus, meaning appears in the relationships between
multiple things. Using Anderson’s notation, the potential meaning K appears as a complex of relationships among data i, j, . . . . Linking this to learning, the topic of the next subsection: we learn the meaning K of observed data i by relating it to other data j, l, . . . . Let us discuss this next.
We stated that we estimate the potential meaning of things and objects to predict
what will happen next. We also mentioned that through such predictions, we can
connect the relationships among the attributes i, j, . . . of things and objects. This
prediction can be either correct or incorrect. If we see the color r on a mushroom and
think it is an edible mushroom K and eat it, the mushroom may not be edible j, but
may be poisonous l and give us a stomachache (Schutz & Luckmann, 1973). In this
case, we would conclude that the prediction of j based on the (mis)inferred category
K was incorrect, and we would reestimate the category membership inferred from
the mushroom’s color r as poisonous mushroom M rather than edible mushroom K.
Additionally, r would thereafter be remembered as being associated with poisonous
l, not edible j. Thus, prior estimates are modified and revised a posteriori when the
predictions are not accurate.
Here, we simply assumed that if the prediction matched the estimation result, the
estimation result would be retained, and if it missed, it would be replaced by another
estimation. In other words, the category “edible mushroom K” was estimated from
the color r of the mushrooms, and if the prediction derived from it (“mushrooms are
edible j”) was correct, the estimate K was retained, and if it was wrong (“mushrooms
are poisonous l”), K was discarded and M was assumed. Here, the learning process is
the choice between retaining or discarding the estimated result. In reality, however,
this learning process is stochastic because learning is performed to extract certain
information that is meaningful out of an uncertain and noisy observed environment.
In other words, the more accurate the prediction, the higher the probability that the prior guess is correct. This idea can be formulated using Bayes’ principle:

Pr(K | j) = Pr( j | K) Pr(K) / Pr( j).
Bayesian learning, which will be introduced again in the section on models of computational linguistics, proceeds as follows: let Pr(K) be the prior belief about K (e.g., “a mushroom belongs to category K”), and let Pr( j | K) be the probability of an event j occurring when K occurs (“a mushroom belonging to K has attribute j”). If the event j that occurs is exactly what K predicts (if Pr( j | K) is large), the posterior belief Pr(K | j) after observing j is strengthened; if it differs from the prediction (if Pr( j | K) is small), the posterior belief is weakened. Thus, the meaning-making process is characterized by Bayesian learning through trial and error.
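A toy numerical sketch of this trial-and-error updating, with all priors and likelihoods invented for illustration (and assuming, for simplicity, that “edible” and “poisonous” are the only two outcomes):

```python
prior = {"edible K": 0.7, "poisonous M": 0.3}       # Pr(category): assumed
lik_edible = {"edible K": 0.9, "poisonous M": 0.1}  # Pr(j = edible | category)

def posterior(prior, likelihood):
    """Bayes' rule: Pr(K | j) = Pr(j | K) Pr(K) / Pr(j)."""
    evidence = sum(prior[k] * likelihood[k] for k in prior)  # Pr(j)
    return {k: prior[k] * likelihood[k] / evidence for k in prior}

# The predicted attribute is observed: belief in "edible K" is strengthened
print(posterior(prior, lik_edible))
# The prediction fails (the mushroom turns out to be poisonous, attribute l):
lik_poison = {k: 1 - v for k, v in lik_edible.items()}  # Pr(l | category)
print(posterior(prior, lik_poison))  # belief shifts toward "poisonous M"
```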
By formulating meaning-making as a computational problem, we have discussed
the following: (a) the function of meaning is linked to prediction, (b) meaning
appears as a relational form, and (c) meaning is Bayesian, learned by trial and
error. With these considerations in mind, we will now examine how various com-
putational linguistics models can be used in theories of meaning.
hypothesis was originally derived from the social sciences. According to this
hypothesis, the meaning of a word can be inferred from the words around it (“You
shall know a word by the company it keeps!”). This hypothesis can be embodied by
the idea of meaning as a latent structure.
Now, let w be a set of words separated by a certain range, and let g be the latent
structure that generates the individual words w1, w2. . .that belong to w (cf. Griffiths
et al., 2007). Typically, w is a single sentence, and the individual words that make up
that sentence can be thought of as w1, w2. . . . In this case, as the distributional
hypothesis states, the meaning of a word is determined by its surrounding words. For
example, to know the meaning of word w1, we must estimate the latent structure g.
To estimate g, we must take w2, w3. . ., which are located around w1, as clues. Thus,
from the observed words alone, the meaning of word w1 can be determined by words
w2, w3. . . .
As such, the generative model can also be used for learning meaning. The
meaning of a word is learned from the meanings of the surrounding words. The
occurrence of the next word is predicted from the distribution of surrounding words
(via the estimation of the latent structure), and the accuracy of the prediction is
increased by modifying the prior guesses a posteriori according to the success or
failure of the prediction.
As previously mentioned, such a learning process can be formulated within the
framework of Bayesian learning. What we want to know is the probability Pr(g | w1) that a word w1 was produced by latent structure g, given that we observe w1. Applying Bayes’ rule, we obtain:
Pr(g | w1) = Pr(w1 | g) Pr(g) / Pr(w1).
The right-hand side consists of three quantities, each of which has substantial
significance. These three quantities must be available to compute the left-hand
side. First, we need a subjective belief (prior belief), Pr(g), before observing w1.
Additionally, we must know the probability Pr(w1 | g) that g generates w1. This is
called likelihood in Bayesian learning. Finally, we need Pr(w1). This is the proba-
bility that w1 will occur, which is called the evidence.
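To make the three quantities concrete, here is a toy computation; the candidate latent structures and all probabilities are invented for illustration:

```python
prior = {"weather": 0.5, "finance": 0.5}          # Pr(g): assumed priors
likelihood = {"weather": 0.08, "finance": 0.001}  # Pr(w1 = "rain" | g): assumed

evidence = sum(prior[g] * likelihood[g] for g in prior)              # Pr(w1)
posterior = {g: prior[g] * likelihood[g] / evidence for g in prior}  # Pr(g | w1)
print(posterior)  # observing "rain" shifts belief strongly toward "weather"
```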
Fig. 5.1 The structure of a topic model. Note: Topics z1, z2, . . ., zm, each a probability distribution over words, generate the concrete words w1, w2, . . . A topic is assigned to each word according to a multinomial distribution (the topic proportion) unique to each document
Nelson, 2020; Takikawa, 2019). Topic models are analytical tools that allow
researchers to discover latent topics in coherent texts and examine the words that
represent these latent topics. However, for the results of a topic model to have
external validity, it must be possible to say that real-world actors are actually
extracting similar topics from the text, at least implicitly, and producing meaning.
Therefore, in this chapter, we focus on the persuasiveness of the topic model as a
semantic model from a computational perspective.
The topic model itself has many variations, but the most basic is latent Dirichlet allocation (Blei et al., 2003). Almost all topic models, including latent Dirichlet allocation, are hierarchical Bayesian models. Hierarchical Bayesian models represent the generation of meaning structurally. This is one of the most prominent features of topic models.
In the hierarchical Bayesian model, text is assumed to be generated by two
different probability distributions: a multinomial distribution that assigns a topic to
the slot in which the word is generated, and a multinomial distribution that proba-
bilistically generates a specific word in that slot, conditional on an assigned topic.
The former is called topic proportion, and the latter is called topic. We assume that
the observed word wi is generated by the latent structure g, expressed by these two
distributions. Let us examine this in more detail (Fig. 5.1).
First, we assume that each word w1, w2, . . ., wn that appears in document
d belongs to some topics z1, z2, . . ., zm. We assume that the word we observe is
generated probabilistically according to the topic to which it belongs. For example,
suppose we assume that a word like “gene” belongs to the topic “genetics” and a
word like “brain” belongs to the topic “neuroscience.” In the topic model, we model
the topic “genetics” as a probability distribution (multinomial distribution) over
words that generates words like “gene” and “DNA” with high probability (Blei,
2012). That is, each word is determined by the conditional probability distribution Pr(w | zj).
How are the various topics assigned to the individual word slots in the first place?
This topic assignment follows a probability distribution called topic proportion,
which is unique to document d. To summarize, a probability distribution called
topic proportion is first assigned to each document. Next, a topic is assigned to a slot
methods. For technical details, refer to Steyvers and Griffiths (2007) and Blei et al. (2003).
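As a concrete illustration, the sketch below fits a two-topic LDA to a toy corpus using scikit-learn’s variational implementation; the corpus, topic number, and all settings are our own illustrative choices, not those of the studies cited here.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "gene dna genome sequencing gene expression",
    "brain neuron cortex synapse brain imaging",
    "gene brain dna neuron expression imaging",
]  # toy corpus; real applications use thousands of documents

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # document-word count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # topic proportions per document

vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):  # word weights per topic
    top = weights.argsort()[::-1][:3]
    print(f"topic {k}:", [vocab[i] for i in top])
print("topic proportions of document 0:", doc_topics[0].round(2))
```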
To what extent does the topic model capture human understanding and meaning-
making practice? Of course, it is unlikely that human language use undergoes a
generative process assumed by the topic model. However, the inverse process
framed in terms of Bayesian inference can be regarded as modeling, to some extent,
our understanding of the meaning of texts and the process of meaning-making. We
infer the meaning of a word in light of its surroundings and thus predict the next
word that will appear in a conversation or in a written text. This can be modeled as a
process of inverse (Bayesian) estimation of the topic that produces the word. Also, in
the process of comprehension, when we read a complex text consisting of multiple
topics, such as a scientific paper or a novel, we can infer backward what topic the text
covers and predict the subsequent development of the text accordingly. This also
corresponds to the fact that the topic model models the process of considering the
topic proportion of an entire document when estimating the meaning of a word.
The topic model is also compelling from a cultural sociological perspective.
DiMaggio et al. (2013) provided a coherent discussion on this. They pointed out
the closeness of the topic model to cultural sociology in three or four ways.
The first is the relationality of meaning (DiMaggio et al., 2013). Central to
cultural sociology is the idea that “meanings do not inhere in symbols (words,
icons, gestures) but that symbols derive their meaning from the other symbols
with which they appear and interact” (DiMaggio et al., 2013, pp. 586–587; Mohr, 1994; Mohr & Duquenne, 1997; Saussure, 1983). As mentioned earlier, the distri-
butional hypothesis of meaning in computational linguistics has its origins in this
cultural sociological idea and can therefore be regarded as the basis for the modeling
of this idea. Specifically, the topic model embodies the idea of relationality of
meaning in the way the topic of a word is determined. In this model, the topic of a
word is assigned by its co-occurrence with other words, which describes the process
by which meaning is relationally determined.
Cultural sociology also emphasizes that meaning is determined by “context.”
Thus, the same word may have different meanings in different contexts. This can be
called contextual polysemy, derived from the relationality of meaning (DiMaggio
et al., 2013). The topic model models this polysemy such that topic assignment
changes depending on differences in collocations, even for the same word. As
already mentioned, this modeling of polysemy is a strength of topic models that
take a structural representation.
DiMaggio et al. (2013) applied the topic model to a corpus of newspaper articles
on arts funding by the U.S. government. They tested whether the topic model
adequately captured the relationality and polysemy of meaning. In terms of
polysemy, they examined whether the word “museum,” for example, actually
expresses different meanings depending on the different topics assigned by the
model. The results show that the word “museum,” assigned to different topics, can
indeed be interpreted as meaning different things.
The second concerns heteroglossia (DiMaggio et al., 2013). This was originally Bakhtin’s idea that a text consists of multiple voices. By voices, Bakhtin
Along with topic models, another model commonly used in sociological research is
the word-embedding model (Arseniev-Koehler & Foster, 2022; Jones et al., 2019;
Kozlowski et al., 2019). A word-embedding model provides an efficient represen-
tation of the meaning of a word using vectors, which is also referred to as the
distributed representation of words. The most important feature of this model is that
meanings can be arranged spatially by representing the meanings of words by
vectors. In the semantic space, the closer the vectors representing two meanings are to each other, the greater the “similarity” between them. In addition, in the semantic space, it is possible to perform so-called analogy calculations. For example, the following calculation is possible:

king − man + woman ≈ queen
Underlying this calculation is the principle that a direction vector such as woman − man corresponds to a certain semantic dimension (in this case, the gender dimension), and that by moving king in that direction in the space, meaning can be shifted
along that semantic dimension (see Fig. 5.2).
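With pretrained embeddings, the analogy can be computed directly. A sketch using gensim (the model identifier is one of the standard gensim-data downloads; the vectors are fetched on first use):

```python
import gensim.downloader

vecs = gensim.downloader.load("glove-wiki-gigaword-100")  # pretrained vectors

# king - man + woman -> ?  (most_similar adds the 'positive' vectors,
# subtracts the 'negative' ones, and ranks words by cosine similarity)
print(vecs.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' typically ranks first, tracing the woman - man (gender) direction
```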
The structure of the semantic space in which this analogical calculation is
possible has great sociological applicability. For example, this feature can be used
to construct axes on the semantic space that represent specific social dimensions of
meaning, such as the gender dimension or the social class dimension. These
representations of context words are learned in terms of the representations that best
predict the target. Thus, the meanings stored in the word-embedding vector are
organized under the function of predicting words.
Furthermore, the word-embedding model is based on the distributional hypoth-
esis of meanings, evident from the structure of the CBOW model, in which context
words are linked to target words through prediction. The idea that meaning is
acquired through the task of predicting co-occurring target words is consistent
with the idea of the distributional hypothesis that meaning is determined by sur-
rounding words. Thus, this model also captures the relationality of meanings.
Finally, in terms of learning, the word2vec CBOW model does not use a Bayesian inference framework. However, it models the learning process of acquiring the semantic content of a word by trial and error, through the success or failure of predictions.
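A minimal training sketch with gensim’s word2vec implementation, where sg=0 selects the CBOW architecture described here; the toy corpus and hyperparameters are illustrative only:

```python
from gensim.models import Word2Vec

sentences = [
    ["context", "words", "predict", "the", "target", "word"],
    ["the", "target", "word", "shares", "meaning", "with", "context", "words"],
]  # toy corpus; real training requires millions of tokens

# sg=0 selects CBOW: context vectors are averaged to predict the target,
# and the vectors are adjusted whenever the prediction fails.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
print(model.wv["target"][:5])  # first entries of one learned word vector
```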
Such a generative model provides a basis for calculating the probability of occur-
rence of a word using the CBOW model. As we saw earlier, the CBOW model
calculates the probability of occurrence of a word c by its closeness to the average of
the k context words w1. . .wk. Such averaging naturally follows from the Bayesian
inference process of estimating the current c from the words produced by the random
walk of c over k periods. Interpreted as a model of semantic cognition, it describes an agent estimating the potential meaning (discourse vector) of a word from the distribution of surrounding words, on the assumption that words appearing in the surroundings are likely to share a potential meaning with the target word.
There are other advantages of Arora’s model over word2vec’s CBOW model: the observed word can be viewed as a compound construct consisting of multiple discourse vectors, or latent meanings, analogous to the topic model. This
allows the model to address the problem of word polysemy, which is not possible
with the conventional embedding model. It also opens up sociological applicability.
Arseniev-Koehler et al. (2022) proposed DATM based on this model and applied it
to sociological text analysis.
Generative modeling of embedding models can be seen as the integration of topic
and word-embedding models (cf. Arseniev-Koehler et al., 2022), which is a prom-
ising approach. Generative models provide a model of semantic cognition, that is,
how humans perceive meaning and generate words. On the contrary, semantic space
models, although more limited than structured representations, have advantages over
topic models, such as their ability to extract semantic dimensions. Furthermore, the
weaknesses of the conventional embedding model, such as the handling of word
polysemy, can be overcome to some extent by generative modeling of the
embedding model. However, whether the semantic space model has reached the
same level of expressiveness as the structured representation model is an open
question.
Arora’s generative model assumes a very simple process of inferring the meaning
of a target word from surrounding words based on the assumption of a random walk
of the potential meanings of words. Further integration of the topic model and
embedding model should be pursued by building a generative model that closely
approximates the human meaning-making mechanism.
The models presented thus far have focused primarily on how people perceive the
meaning of objects and other people (texts about them). However, as Weber (1946)
points out, the interpretation of meaning in sociology is ultimately performed to
explain human action.
While some of our cultural and semantic constructions and their interpretations are directly related to actions and explain them, others are not. People’s retrospective, ex-post “explanations” of their actions are at least not directly linked to those actions but rather justify them after the fact (Swidler, 1986; Vaisey, 2009). In contrast, values and
mental images motivate people to act. From the point of view of analyzing meaning-
making mechanisms to explain social phenomena as an accumulation of actions, it is
the latter kind of motivational meaning-making that we want to extract.
Therefore, the question of whether the semantic structure and semantic space
obtained by the topic model and embedding model are actually related to people’s
actions, and if so, how they are related, must be addressed at the final stage of elucidating the
meaning-making mechanism. In other words, it is necessary to examine the perfor-
mance of the meaning-making model by focusing on the extent to which the
meaning of an object identified by the meaning-making model can explain subse-
quent human behavior. If the meaning of the object identified by the model can
explain human behavior to some extent, it can be said that the model’s estimation of
“meaning” has some validity.
Here, we focused on a specific class of meaning-making models, the word2vec
model. We examined to what extent the meanings that the word2vec model specifies
can explain subsequent behaviors. In other words, we examined whether and to
what extent it is possible to predict people’s behavior using the word vectors
obtained from word2vec as explanatory variables.
Here, we refer to the method in which word vectors are used as explanatory
variables in regression analysis to predict human judgments (Bhatia, 2019; Hollis
et al., 2017; Richie et al., 2019). Consider that we would like to use words as
explanatory variables and predict people’s responses to the words (a specific exam-
ple follows later). In this case, we can use word2vec to represent word meanings
using 300-dimensional vectors. This means that each explanatory variable (i.e., a word) has 300 semantic dimensions, as if we had asked 300 survey questions regarding the word and quantified the word’s meaning with 300 numeric values.
Because each explanatory variable is now represented in 300 numeric values, a
regression model in this method has 300 corresponding coefficients that determine
the weights for each of the 300 semantic dimensions. Owing to the many explana-
tory variables, regularization methods such as ridge regression or model comparison
based on AIC are often used to prevent overfitting. This method shows that judg-
ments such as risk perception, gender roles, and health behaviors can be predicted
with high accuracy (Richie et al., 2019).
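A minimal sketch of this setup, using scikit-learn with random stand-ins for the real word vectors and judgment ratings, might look as follows:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# Hypothetical data: one 300-dimensional word vector per word (e.g., per
# occupation name) and one averaged human rating per word.
rng = np.random.default_rng(0)
X = rng.normal(size=(130, 300))  # stand-in for real word2vec vectors
y = rng.normal(size=130)         # stand-in for mean judgment ratings

# Ridge regression with the penalty strength chosen by cross-validation;
# regularization keeps the 300 coefficients from overfitting.
model = RidgeCV(alphas=np.logspace(-2, 4, 20))
print(cross_val_score(model, X, y, cv=10).mean())  # out-of-sample R^2
model.fit(X, y)  # model.coef_ holds one weight per semantic dimension
```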
In a more recent study, Ueshima and Takikawa (2021) used word vectors to
predict people’s vaccine allocation judgments regarding COVID-19. Participants
rated how much priority vaccination should be given to each of more than 130 occupa-
tions. The authors used a pre-trained Japanese word2vec resource (Manabe et al.,
2019) to obtain a 300-dimensional word vector for each occupation. They reported
that regression analysis using word vectors as explanatory variables exhibited a high
out-of-sample predictive accuracy for participants’ vaccination priority judgments.
To demonstrate the effectiveness of this approach, they compared the word-vector
regression model with a benchmark regression model. The benchmark regression
contained relevant explanatory variables such as the social importance of each
occupation to quantify the occupations. The results of the model comparison showed
that the word-vector model predicted vaccination priority judgments better than the
benchmark model did. It is notable that the explanatory variables of the benchmark
model—social importance, personal importance, and familiarity with each
occupation—were obtained from each participant, whereas the word vectors were not measured specifically for this study, demonstrating the usefulness of
the word-vector approach. Overall, the results of this study suggest that word vectors
can quantify the meanings that people have for each occupation.
In regression using word vectors, prediction is made by learning the regression
coefficients or weights for each dimension of the word vector from the data.
Intuitively, this can be interpreted as modeling how much each of the hundreds of
semantic dimensions is weighted when people judge specific domains such as
vaccination priority, gender role, or risk perception.
Using weights (regression coefficients) for each semantic dimension makes it
possible to interpret the criteria used to make judgments (Bhatia, 2019). In Ueshima
and Takikawa (2021), participants answered that they would prioritize vaccination
for occupations such as nurses. Accordingly, the 300-dimensional weights of a
regression model trained on the participants’ vaccination judgments had large dot
products with the word vector of “nurse.” This is because larger dot products
indicate a higher prioritization of vaccination in this regression model. Importantly,
it is possible to calculate the dot products between the regression weights and word
vectors other than occupations. By exploring words that have larger or smaller dot
products with the obtained 300-dimensional weights, it is possible to interpret the
criteria associated with increasing or decreasing people’s judgments of vaccination
priority. In the case of the vaccination priority judgments, words associated with
medical institutions such as a “hospital” and with public service such as the “local
government” produced larger dot products with the weights compared to other
common words, suggesting that meanings of these words were related to criteria for
rating vaccination priority higher. Such exploratory analyses of judgment criteria
help to understand the psychological mechanisms underlying judgments.
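A hypothetical sketch of this interpretive step, reusing a fitted ridge model and a gensim KeyedVectors object like those in the earlier sketches (both are assumptions here, not the study's code), could look like this:

```python
# Words whose vectors have large dot products with the 300-dimensional
# coefficient vector are associated with higher predicted priority.
# `model` and `vectors` are assumed to come from the earlier sketches.
def criterion_words(model, vectors, candidate_words, topn=10):
    scores = {w: float(model.coef_ @ vectors[w])
              for w in candidate_words if w in vectors}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:topn]

# e.g., criterion_words(model, vectors, ["hospital", "government", "sports"])
```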
Moreover, it was possible to predict vaccination priority ratings for occupations
that were not included in the study. For example, based on these exploratory
analyses, we can infer that an occupation such as a city hall worker would be rated
highly. Thus, interpreting the obtained weights can lead to rigorous confirmatory
research on new occupations. In summary, using word vectors as predictors or
explanatory variables in multiple regression analysis is a promising method for
predicting and interpreting human behavior.
The fact that the word vectors obtained from word2vec are helpful in predicting
human behavior indicates that they capture not only the semantic relationships
between words in the linguistic corpus but also, to some extent, human knowledge
about the world (Caliskan et al., 2017; Günther et al., 2019). To further develop
models of meaning for predicting human behavior, future research should consider
that the meanings of words are constructed not only by linguistic information but
also by the perceptual and motor systems of humans (Bruni et al., 2014; Glenberg &
Robertson, 2000; Lake & Murphy, 2023). Using multimodal data is a promising
direction for developing better models of meaning to predict human behavior.
Another important direction for future research is to model the heterogeneity of
meanings and behaviors among people. At present, corpora used to obtain word
vectors often consist of linguistic resources generated by many people rather than by a single individual. Therefore, the learned vector representations capture the average meaning or knowledge representation of words for the people who generated the corpora. However, the meanings of words should differ among
individuals depending on the nature of the words (Wang & Bi, 2021). For example,
small children may associate occupations such as doctors with fear, while older
people do not. Such heterogeneity of word meanings affects individuals’ behavior
differently. Thus, obtaining word vectors that capture the heterogeneity of meanings
is a necessary step toward modeling individual behaviors with higher accuracy.
5.6 Conclusion
In this chapter, we discussed the possibility of applying and extending models developed in computational linguistics to construct a sociological model of meaning-making. The starting point for constructing a sociological theory of meaning-making is to view it as a computational problem, that is, to extract valid information from an uncertain environment, predict upcoming events, and thus achieve one’s goals. From this formulation, we drew three features of meaning production: the predictive function of meaning, the relationality of meaning, and Bayesian learning.
Both topic models and word-embedding models can be interpreted as theoretical
models of meaning-making, but there are differences between them. The topic model
is a hierarchical generative model of language that is highly compatible with cultural
sociology and particularly suited for capturing word polysemy. In contrast, word-
embedding models allow for spatial representation and are suitable for capturing the
cultural dimensions of events and practices. An integrated model of the topic and
word-embedding models is required. The last element that completes the theory of
sociological meaning-making is the link between interpretation and action. To examine this link, we introduced a regularized regression model that relates semantic representations to action. Future directions include incorporating not only linguistic information but also nonlinguistic information, such as information about the physical environment and the body, as well as the heterogeneity of meaning interpretation according to the attributes of actors and their socialization, enabling a more precise prediction of subsequent actions.
References
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from
language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230
Chen, D., Peterson, J. C., & Griffiths, T. L. (2017). Evaluating vector-space models of analogy.
arXiv preprint arXiv:1705.04416.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford Univer-
sity Press.
Coleman, J. S. (1990). Foundations of social theory. Harvard University Press.
Dilthey, W. (1910). Der Aufbau der geschichtlichen Welt in den Geisteswissenschaften. Verlag der
Königlichen Akademie der Wissenschaften, in Commission bei Georg Reimer.
DiMaggio, P. (1997). Culture and cognition. Annual Review of Sociology, 23, 263–287.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of US government arts
funding. Poetics, 41(6), 570–606.
Durkheim, E. (1915). The elementary forms of the religious life: A study in religious sociology.
Macmillan.
Evans, J. A., & Aceves, P. (2016). Machine translation: Mining text for social theory. Annual
Review of Sociology, 42, 21–50.
Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. In Studies in linguistic analysis
(pp. 1–32). Basil Blackwell.
Fligstein, N., Stuart Brundage, J., & Schultz, M. (2017). Seeing like the Fed: Culture, cognition, and
framing in the failure to anticipate the financial crisis of 2008. American Sociological Review,
82(5), 879–909.
Foster, J. G. (2018). Culture and computation: Steps to a probably approximately correct theory of
culture. Poetics, 68, 144–154.
Garfinkel, H. (1967). Studies in ethnomethodology. Polity Press.
Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of
high-dimensional and embodied theories of meaning. Journal of Memory and Language, 43(3),
379–401. https://doi.org/10.1006/jmla.2000.2714
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation.
Psychological Review, 114(2), 211.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content
analysis methods for political texts. Political Analysis, 21(3), 267–297.
Günther, F., Rinaldi, L., & Marelli, M. (2019). Vector-space models of semantic representation
from a cognitive perspective: A discussion of common misconceptions. Perspectives on Psy-
chological Science, 14(6), 1006–1033. https://doi.org/10.1177/1745691619861372
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.
Hohwy, J. (2013). The predictive mind. Oxford University Press.
Hollis, G., Westbury, C., & Lefsrud, L. (2017). Extrapolating human judgments from skip-gram
vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70(8),
1603–1619. https://doi.org/10.1080/17470218.2016.1195417
Jones, J. J., Amin, M. R., Kim, J., & Skiena, S. (2019). Stereotypical gender associations in
language have decreased over time. Sociological Science, 7, 1–35.
Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the
meanings of class through word embeddings. American Sociological Review, 84(5), 905–949.
Lake, B. M., & Murphy, G. L. (2023). Word meaning in minds and machines. Psychological
Review, 130(2), 401–431. https://doi.org/10.1037/rev0000297
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to
Western thought. Basic Books.
Lamont, M. (2000). Meaning-making in cultural sociology: Broadening our agenda. Contemporary
Sociology, 29(4), 602–607.
Lizardo, O. (2004). The cognitive origins of Bourdieu’s habitus. Journal for the Theory of Social
Behaviour, 34(4), 375–401.
Lizardo, O. (2019). Pierre Bourdieu as cognitive sociologist. In W. Brekhus & G. Ignatow (Eds.),
The Oxford Handbook of Cognitive Sociology. Oxford University Press.
Luhmann, N. (1995). Social systems. Stanford University Press.
Manabe, H., Oka, T., Umikawa, Y., Takaoka, K., Uchida, Y., & Asahara, M. (2019). Japanese word
embedding based on multi-granular tokenization results (in Japanese). In Proceedings of the
twenty-fifth annual meeting of the Association for Natural Language Processing.
Marr, D. (1982). Vision: A computational investigation into the human representation and
processing of visual information. MIT Press.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations
of words and phrases and their compositionality. Advances in Neural Information Processing
Systems, 26, 3111–3119.
Mohr, J. (1994). Soldiers, mothers, tramps and others: Discourse roles in the 1907 New York
Charity Directory. Poetics, 22, 327–358.
Mohr, J. W. (1998). Measuring meaning structures. Annual Review of Sociology, 24(1), 345–370.
Mohr, J. W., & Duquenne, V. (1997). The duality of culture and structure: Poverty relief in
New York City, 1888–1917. Theory and Society, 26, 305–356.
Mohr, J. W., Bail, C. A., Frye, M., Lena, J. C., Lizardo, O., McDonnell, T. E., Mische, A., Tavory,
I., & Wherry, F. F. (2020). Measuring culture. Columbia University Press.
Nelson, L. K. (2020). Computational grounded theory: A methodological framework. Sociological
Methods & Research, 49(1), 3–42.
Opp, K. D. (1999). Contending conceptions of the theory of rational action. Journal of Theoretical
Politics, 11(2), 171–202.
Richie, R., Zou, W., & Bhatia, S. (2019). Predicting high-level human judgment across diverse
behavioral domains. Collabra: Psychology, 5(1), 50. https://doi.org/10.1525/collabra.282
Saussure, F. (1983). Course in general linguistics. Open Court Press.
Schutz, A., & Luckmann, T. (1973). The structures of the life-world (Vol. 1). Northwestern
University Press.
Simmel, G. (1922). Die Probleme der Geschichtsphilosophie: eine erkenntnistheoretische Studie.
Duncker & Humblot.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In Handbook of latent semantic
analysis (pp. 439–460). Psychology Press.
Swidler, A. (1986). Culture in action: Symbols and strategies. American Sociological Review, 51,
273–286.
Takikawa, H. (2019). Topic dynamics of post-war Japanese sociology: Topic analysis on Japanese
Sociological Review corpus by structural topic model (Japanese). Sociological Theory and
Methods, 34(2), 238–261.
Ueshima, A., & Takikawa, H. (2021, December). Analyzing vaccination priority judgments for
132 occupations using word vector models. In IEEE/WIC/ACM International Conference on
Web Intelligence and Intelligent Agent Technology (pp. 76–82).
Vaisey, S. (2009). Motivation and justification: A dual-process model of culture in action. American Journal of Sociology, 114(6), 1675–1715.
Wang, X., & Bi, Y. (2021). Idiosyncratic tower of babel: Individual differences in word-meaning
representation increase as word abstractness increases. Psychological Science, 32(10),
1617–1635. https://doi.org/10.1177/09567976211003877
Watts, D. J. (2014). Common sense and sociological explanations. American Journal of Sociology,
120(2), 313–351.
Weber, M. (1946). From Max Weber: Essays in sociology. Facsimile Publisher.
White, H. C. (1963). An anatomy of kinship: Mathematical models for structures of cumulated
roles. Prentice-Hall.
White, H. C. (1992). Identity and control: A structural theory of social action. Princeton University
Press.
White, H. C. (1995). Network switchings and Bayesian forks: Reconstructing the social and
behavioral sciences. Social Research, 64, 1035–1063.
90 H. Takikawa and A. Ueshima
White, H. C. (2008a). Identity and control: How social formations emerge (2nd ed.). Princeton
University Press.
White, H. C. (2008b). Notes on the constituents of social structure. Soc. Rel. 10-Spring’65.
Sociologica, 2(1).
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple
networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4),
730–780.
Yin, J., & Wang, J. (2014). A Dirichlet multinomial mixture model-based approach for short text
clustering. In Proceedings of the 20th ACM SIGKDD international conference on knowledge
discovery and data mining (pp. 233–242).
Chapter 6
Sociological Meaning of Contagion
Yoshimichi Sato
¹Contagion and diffusion are used interchangeably in this chapter.
Y. Sato (✉)
Faculty of Humanities, Kyoto University of Advanced Science, Kyoto, Japan
e-mail: [email protected]
Diffusions of new cultural items such as ideas, values, and norms are different from those of viruses. Centola and Macy (2007) call diffusions of new cultural items “complex contagions” and argue that an individual needs to contact more than one source of activation for a complex contagion to occur. A simple contagion, such as the contagion of a virus, occurs by contrast if an individual contacts a single person carrying the new item. This type of contagion is based on a biological mechanism, and the individual does not need to interpret the meaning of the virus to be infected by it. The diffusion of a cultural item, in contrast, is based on a sociological mechanism, and the individual interprets its meaning before he or she decides whether to accept it or not.
Take a social movement, for example. Participation in a social movement, say in a demonstration, incurs risks such as being arrested by the police or being beaten by
opponents of the movement. Thus, exposure to a single source of activation is not
enough for an individual to participate in the risky movement. Rather, he or she
needs more than one source of activation to participate in it, as Centola and Macy
(2007) argue. Why does he or she need more than one source of activation? Centola
and Macy (2007) propose four reasons: strategic complementarity, credibility,
legitimacy, and emotional contagion. I will examine them in detail in the next
section. What is important about them is that, in a sense, Centola and Macy (2007)
go back to Rogers’ (2003) study of diffusion because Rogers also recognizes the
importance of sociological and psychological factors for a diffusion to occur.
Centola and Macy (2007) conduct computer simulations to support their argu-
ment. They start with the small world model proposed by Watts and Strogatz (1998).
Watts and Strogatz (1998) assume a ring lattice in which an individual is connected
to four nearest neighbors. Then, they randomly cut ties and connect them to
randomly chosen individuals. This rewiring connects distant individuals and creates
a small world. The originality of their study is that they demonstrate that a simple
procedure of cutting and rewiring ties creates a small world, which became famous
because of Milgram’s small-world experiment (Milgram, 1967). Milgram conducted innovative social-psychological experiments: he and his collaborators randomly
chose a group of people in Wichita, Kansas, and in Omaha, Nebraska, and asked
people who agreed to participate in the experiments, called starting persons, to send a folder via their acquaintances to a person living or working near Harvard University, called the target person. A result of the experiments
shows that the median number of intermediate acquaintances for the folder to be
delivered from a starting person to the target person is five. The result empirically
and scientifically endorses the cliché used in daily life, “What a small world.” A
major contribution of Watts and Strogatz (1998) is that they succeeded in creating a
small world by a simple procedure of randomly cutting and rewiring ties.
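The procedure is easy to reproduce; a minimal sketch with the networkx library (parameter values are illustrative) shows how a small amount of rewiring collapses the average path length while clustering stays comparatively high:

```python
import networkx as nx

# Ring lattice (p=0) versus small world (p=0.1): each node starts with
# k=4 nearest-neighbor ties, and ties are rewired with probability p.
for name, p in [("ring lattice", 0.0), ("small world", 0.1)]:
    g = nx.connected_watts_strogatz_graph(n=1000, k=4, p=p, seed=1)
    print(name,
          "avg path length:", round(nx.average_shortest_path_length(g), 1),
          "clustering:", round(nx.average_clustering(g), 3))
```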
Centola and Macy (2007) modify the model of Watts and Strogatz (1998) by
changing only one assumption. They make activation thresholds for contagion
higher than those assumed in previous studies. This seemingly minor change in
the assumption leads to the importance of wide bridges for complex contagion to
propagate. Simply put, the width of a bridge between two persons is the number of ties closely connecting them. [See Centola and Macy (2007, pp. 713–714) for a mathematically strict definition of the width of a bridge.]
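A toy simulation, offered as a simplified stand-in for Centola and Macy's actual model rather than a reimplementation of it, illustrates how a higher activation threshold favors the clustered lattice over the rewired network:

```python
import networkx as nx

def complex_contagion(g, seeds, threshold=2, steps=100):
    """Nodes adopt once at least `threshold` neighbors have adopted:
    the raised activation threshold that defines complex contagion."""
    active = set(seeds)
    for _ in range(steps):
        newly = {n for n in g if n not in active
                 and sum(nb in active for nb in g[n]) >= threshold}
        if not newly:
            break
        active |= newly
    return len(active) / g.number_of_nodes()

lattice = nx.connected_watts_strogatz_graph(500, 6, 0.0, seed=2)
rewired = nx.connected_watts_strogatz_graph(500, 6, 0.2, seed=2)
seeds = [0, 1, 2]  # a small, clustered seed neighborhood
print("lattice:", complex_contagion(lattice, seeds))
print("rewired:", complex_contagion(rewired, seeds))
# Rewiring tends to shrink the cascade: long ties are narrow bridges.
```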
Centola (2018) advanced the study by Centola and Macy (2007) by conducting an
online experiment. As he points out, it is almost impossible to collect the whole network data of a society, even a small one, in order to check the empirical validity of the theory of complex contagions. To solve this problem, Centola (2018)
created a society online, which has two different social networks: a clustered
network and a random network. The society he created was an online health
community called the Healthy Lifestyle Network. When a participant in the experiment arrived at a web page of the network, he or she was given overview information on it. Then, if he or she agreed to join the network and signed up for it,
he or she was randomly assigned to one of the two networks.
In either network, a participant knew that he or she could interact with only a set of neighbors, who were called “health buddies” in the experiment. In other words, a participant did not know the whole structure of the network he or she was allocated to. In the clustered network, a participant shared overlapping contacts with other health buddies, so he or she had wide bridges with them. In the random network, a participant did not share such wide bridges with other health buddies.
Contagion began with the random choice of a participant. The participant sent a
message to his or her neighbors to encourage them to join a health forum website.
The neighbors decided whether to join the forum or not. If a neighbor joined the
forum, invitation messages were automatically sent from him or her to his or her
health buddies to invite them to join the forum.
Joining the forum was not easy, however. A participant could not join the forum
only by clicking a “Join” button. Rather, he or she had to fill in answers to questions
on a registration form that was long enough for him or her to scroll down to
complete. Centola (2018) intentionally designed this form to make the contagion
in the experiment more difficult than a contagion of a virus. In other words, this task
of joining the forum is an appropriate condition to test the empirical validity of the
theory of complex contagions proposed by Centola and Macy (2007).
Results of the experiment clearly showed that diffusion in the clustered network
evolved faster than that in the random network and that the percentage of adopters
was higher in the clustered network than that in the random network. In addition,
carefully observing the process of contagion in the two networks, Centola (2018)
reported that invitation messages circulated in the same neighborhood in the clus-
tered network, which means that a participant was exposed to the invitation mes-
sages via more than one neighbor. In the random network, in contrast, the messages
quickly diffused in the network, but, because of the lack of redundancy, the diffusion
did not evolve as fast as in the clustered network, and the percentage of adopters was
lower than that in the clustered network. These results empirically support the theory
of complex contagions.
Although it has advanced the study of contagion, I would argue that the theory of
complex contagions does not fully explain a contagion process among individuals,
because it does not incorporate meaning and interpretation in the contagion process.
Centola and Macy (2007) and Centola (2018) proposed four mechanisms to explain
why complex contagions need multiple sources to occur: strategic complementarity,
credibility, legitimacy, and emotional contagion. Strategic complementarity means
that for an individual to participate in a risky, costly behavior such as participation in
collective action, he or she needs to know other people in his or her network have
already participated in it. Credibility means that for an individual to adopt a new
item, he or she needs more than one source to believe that the item is credible.
Legitimacy seems related to strategic complementarity. If some people who are
strongly connected to an individual have participated in a risky, costly behavior such
as a demonstration, bystanders become more likely to accept it as legitimate, which
in turn encourages the individual to participate in it. Emotional contagion means that
emotions are exchanged and amplified in a collective action, and such emotional
contagions encourage an individual whose close friends are participating in the action to participate as well.
These mechanisms are sociologically plausible, but they miss interpretation and
meaning in the diffusion process (Goldberg & Stein, 2018). Because the theory of
associative diffusion by Goldberg and Stein was explained in Chap. 2, I revisit the
Before examining how to incorporate interpretation and meaning in big data analysis
in detail, let me quickly review their history in sociology to show their importance.
Mead (1934) was one of the founding fathers who introduced interpretation and
meaning into sociology. To summarize his profound theory in a very simple way, the self consists of the “I” and the “Me.” The “Me” is the expectations of others that the “I” accepts, and the “I” reacts to those expectations. Multiple selves interact smoothly with each other if the reactions do not contradict the expectations. I do not think, however, that the “Me” is simply the expectations of others as they are. Rather, the “I” interprets the meaning of the expectations and reacts to the interpreted expectations. Here we see the interaction between interpretations and reactions.
Berger and Luckmann (1966) took Mead’s theory a step further in sociology.
They proposed a theory that explains how reality is constructed by interactions of
actors. Reality, or social reality to be exact, does not exist without actors. Actors
interact with other actors, add meanings to their actions, and interpret them. If
actions and interpretations fit well together, reality emerges. In other words, reality
is socially constructed by actors involved. Then, actors interpret the reality as
objective social order and behave in accordance with it. Here again, we observe
the interaction between interpretations and actions in the creation of social reality, or
social order.
The theory of social construction of reality by Berger and Luckmann (1966)
influenced studies of social movements, mobilization processes in particular, which
are closely related to the theory of complex contagion by Centola and Macy (2007).
Resource mobilization theory (e.g., McCarthy & Zald, 1977) used to be a main
paradigm in the study of social movements. The theory argues that resources such as
social movement organizations and human and financial resources are necessary for
social movements to emerge and succeed. However, it was criticized because it did
not exactly explain how people were mobilized. To overcome this shortcoming of
the theory, D. A. Snow, the main figure in the development of a new theory called
frame theory, and his colleagues focused on how a social movement organization
aligns its frame with that of people the organization wants to mobilize (Snow et al.,
1986). It is often the case that the frame of a social movement organization is
different from that of its target people in the beginning of a social movement. In
this case, even if the organization has plentiful resources for mobilization, the target
people do not participate in the movement. This is because the people do not
understand the meaning of and, therefore, the significance of the movement due to
the difference between their frame and that of the organization. Thus, the organiza-
tion tries to adjust its frame so that the target people would interpret the movement as
significant and related to their own interests. Here again, it becomes obvious that the
people’s interpretation of the adjusted frame and the movement is the key for the
organization to succeed in mobilizing its target people.
So far, I have argued for the importance of interpretation and meaning in sociology in general and in the study of social movements in particular. This is also the case when we study the process of diffusion, as we observed in the failure of the boiling-water campaign in a Peruvian village. How, then, can we incorporate interpretation and meaning into big data analysis of diffusion? This is a difficult task because most big data concern people’s behavior, and, therefore, big data analysis finds it difficult to deal with interpretation and meaning. Of course, people express their opinions on Twitter and Facebook, but big data analysis alone does not tell us how people interpret those opinions and add meaning to them.
How can we solve this problem? One possible solution would be applying topic
models to text data such as Twitter and Facebook (Bail, 2016). Chapter 2 cites
DiMaggio et al. (2013) who applied a topic model to newspaper articles to explain
the decline in the government support for artists between mid-1980s and mid-1990s.
Here I examine Bail’s (2016) work in detail because he and I share the same interest
in interpretation and meaning in big data analysis.
In general, people do not receive discourses expressed by other people as they are.
They interpret the discourses and add meaning to them using their cognitive schema
or frame. If the discourses and their cognitive frame are close to each other, people
tend to accept them. If not, they tend to refuse them.
Based on this theoretical idea, Bail (2016) studied how organ donation advocacy
organizations align their frames with those of their target population to induce people
in the target population to engage with them. He proposed a theory of cultural
carrying capacity. Organ donation advocacy organizations face a dilemma when
they use social media to attract more people. On the one hand, they need to cover
various topics in their messages so that more people would resonate with their frame.
To cite Bail’s example, an organization that produces messages discussing only
sports could attract only basketball fans, but one that spreads messages about sports,
religion, or science could induce not only sports fans but also religious people or
scientists to endorse its activities.
On the other hand, if an organization produces messages that cover too many
diverse topics, such diversification might limit its capacity to mobilize people. This
is because diversification creates disconnected audiences without collective identity,
which is necessary for mobilization.
Thus, theoretically, diversification of topics in messages has an inverted
U-shaped effect on the number of the mobilized people. If the level of diversification
is very low as in the case of messages whose topic is only about sports, the number of
the mobilized people should be small. Meanwhile, if the level of diversification is
very high, the target population becomes fragmented, and, therefore, the number of
the mobilized people should be small, too. Thus, the number of the mobilized people
should become the largest if the level of diversification is in the middle.
To check the empirical validity of this theoretical reasoning, Bail created an
innovative method. He created a Facebook application. If an organ donation advo-
cacy organization participates in the study and installs the application, it is provided
with useful information on and recommendations about its online outreach activities.
In return, the application collects all publicly available text from the organization’s
Facebook fan page and Insights data available only to the owner of the fan page, that
is, the organization. The application then administered a survey to the representative of the organization to collect information on the organization and its outreach tactics.
Forty-two organizations participated in the study and produced 7252 messages
between October 1, 2011 and October 6, 2012. These messages are analyzed by
structural topic modeling. Topic modeling extracts latent topics from the messages
and calculates scores that show how strongly a message is associated with topics.
The scores are called membership scores. Structural topic modeling incorporates metadata, allowing it to capture temporal change in the meaning of a word or a group of words vis-à-vis a topic.
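Bail used structural topic modeling (commonly done with the R stm package); as a simplified stand-in, a plain LDA model in gensim illustrates what membership scores are:

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy tokenized posts (in practice: thousands of Facebook messages).
texts = [["organ", "donation", "register", "today"],
         ["team", "charity", "basketball", "game"],
         ["faith", "community", "gift", "life"]]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

# Membership scores: how strongly each post loads on each latent topic.
for bow in corpus:
    print(lda.get_document_topics(bow, minimum_probability=0.0))
```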
Eventually, 39 topics were identified. Then Bail (2016) created an index that
shows the diversity of topics using the matrix of membership scores for each post in
the topics. He calls the index the coefficient of discursive variation. Mathematically,
the coefficient for each organization i at time t, $C_{it}$, is defined as follows:

$$C_{it} = \frac{\sigma_{t-7}}{\mu_{t-7}}$$
“where σ is the standard deviation of the membership matrix of organization i’s posts
in the 39 topic categories during the previous week and μ is the mean score of the
posts across all topic categories during the same time period” (Bail, 2016, p. 284).
The coefficient of discursive variation is analogous to the coefficient of variation. If organization i uses diverse topics in its posts, $C_{it}$ becomes large. Thus, the
coefficient of discursive variation can be used to check the empirical validity of
the abovementioned theoretical argument. If the coefficient is very small or very
large, the number of mobilized people (the number of engaged Facebook users by
day in this case) should be small. If the coefficient is in the middle, the number of
mobilized people should be large.
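Given a post-by-topic membership matrix for one organization-week, the coefficient is straightforward to compute; a minimal numpy sketch with random toy data follows:

```python
import numpy as np

def discursive_variation(membership):
    """Coefficient of discursive variation (Bail, 2016): the standard
    deviation of the post-by-topic membership matrix divided by its
    mean. `membership` has shape (n_posts, n_topics)."""
    return membership.std() / membership.mean()

# Toy example: 20 posts over 39 topics, each row summing to one.
rng = np.random.default_rng(0)
m = rng.dirichlet(np.ones(39), size=20)
print(discursive_variation(m))
```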
To show that his theoretical argument is empirically valid, Bail (2016) conducted
a sophisticated type of regression modeling with the number of engaged Facebook
users by day as the dependent variable. The key independent variable is the coeffi-
cient of discursive variation. He also considered other theories proposing factors that
might affect the number of engaged Facebook users by day and included variables
derived from the theories in the models as control variables. Thus, if the coefficient
of discursive variation has an inverted U-shaped effect on the number of engaged
Facebook users by day after controlling for the variables derived from other theories,
his theoretical argument would be empirically supported.
A simple graph in Fig. 6.1 shows that the theoretical argument seems empirically
valid. To confirm the robustness of the graph, Bail conducted regression analysis,
and its results support the theoretical argument. The coefficient for the coefficient of discursive variation is positive, and that for its squared term is negative after controlling for other variables. This means that the coefficient of discursive variation has an inverted U-shaped effect on the number of engaged Facebook users by day, as in Fig. 6.1.
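A toy sketch with statsmodels (synthetic data and no control variables, so only a simplified stand-in for Bail's full models) shows how the signs of the linear and squared terms reveal an inverted U:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic inverted-U data: engagement peaks at an interior level of
# discursive variation, plus noise.
rng = np.random.default_rng(0)
c = rng.uniform(0, 2, size=300)
engagement = 4 * c - 2 * c**2 + rng.normal(0, 0.5, size=300)

# Regress engagement on C and C^2; a positive linear and negative
# squared coefficient indicate an interior maximum.
X = sm.add_constant(np.column_stack([c, c**2]))
fit = sm.OLS(engagement, X).fit()
print(fit.params)                             # roughly [0, 4, -2]
print(-fit.params[1] / (2 * fit.params[2]))   # peak location, near 1.0
```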
The significance of Bail’s (2016) study is that it highlights the importance of
meaning and interpretation in mobilization. If an organ donation advocacy organi-
zation posts messages focusing on too few topics, its frame does not match the frames of the audience, so it cannot attract attention from a wide audience. If it posts messages about too many diverse topics, the audience interprets its frame as fragmented and contradictory, so it cannot appeal to a wide audience either. Only if it posts messages covering an adequate range of topics can the audience clearly interpret its frame as important to them and become mobilized.
In addition, his study explains why more than one source is necessary for
complex contagion to propagate. As mentioned above, Centola and Macy (2007) and Centola (2018) proposed four mechanisms for complex contagion to occur: strategic complementarity, credibility, legitimacy, and emotional contagion. From
the viewpoint of frame analysis in mobilization, a mobilizing organization, an organ
donation advocacy organization in Bail’s study, aligns its frame with frames of its
target population through the four mechanisms. If the four mechanisms work well,
the organization finds it easier to make its frame resonate with the frames of the target population.

Fig. 6.1 Relationship between the coefficient of discursive variation and the number of engaged Facebook users by day. Gray zone represents the 95% confidence interval. (Source: Bail 2016, p. 286, Fig. 1)

Based on Bail’s argument, if a target person receives more than one message with different topics from different sources, his or her frame is more likely to
resonate with that of the organization. Here we can clearly understand why complex
contagion needs more than one source to propagate.
The theory of cultural carrying capacity also helps us clearly understand why the
campaign for boiling water in a Peruvian village failed, which was discussed in Sect.
6.3. Nelida, who was in charge of the campaign, failed to persuade the villagers to
boil water. This is because she did not make the frame of the Peruvian public health agency resonate with that of the villagers. If she had used topics in the persuasion
process that overlapped with topics the villagers were interested in, she could have
succeeded in persuading them to boil water.
6.5 Conclusion
The study of complex contagion by Centola and Macy (2007) and Centola (2018) is
a seminal work showing that a contagion of a cultural item is substantively different
from that of a virus, but it does not completely explore how meaning and interpre-
tation function in the process of complex contagion. Conversely, Bail (2016) does
not talk about complex contagion, but he proposes a thought-provoking theory, the
theory of cultural carrying capacity, emphasizing the importance of meaning, inter-
pretation, and frame resonance when a movement organization tries to mobilize its target population. He checked the theory’s empirical validity by collecting and analyzing big data in the form of Facebook posts.
Combining studies by Centola and Macy (2007), Centola (2018), and Bail (2016)
gives us a deeper comprehension of the mechanism of complex contagion. However,
this is just an example showing that focusing on meaning and interpretation enriches
studies using big data and makes them more significant in sociology. Furthermore,
this research strategy would help solve research questions that sociologists have attacked without big data and failed to answer.
References
Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and
social media engagement. Social Science & Medicine, 165, 280–288.
Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology
of knowledge. Anchor Books.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton Univer-
sity Press.
Centola, D., & Macy, M. W. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113(3), 702–734.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of U.S. government arts
funding. Poetics, 41(6), 570–606.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
McCarthy, J. D., & Zald, M. N. (1977). Resource mobilization and social movements: A partial
theory. American Journal of Sociology, 82(6), 1212–1241.
Mead, G. H. (1934). Mind, self, and society: From the standpoint of a social behaviorist. University
of Chicago Press.
Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 61–67.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Snow, D. A., Rochford, E. B., Jr., Worden, S. K., & Benford, R. D. (1986). Frame alignment
processes, micromobilization, and movement participation. American Sociological Review,
51(4), 464–481.
Tarde, G. (1890). Les lois de l’imitation: Étude sociologique. Félix Alcan.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393,
440–442.
Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic
and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling
study. Lancet, 395, 689–697.
Chapter 7
Polarization of Opinion
7.1 Background
In recent years, there has been growing concern that opinion polarization and fragmentation are becoming increasingly pronounced across various issues. Specifically, previous research centered on the United States has shown that opinions on several crucial issues have become increasingly divided and polarized since the 1960s (Hetherington, 2001; McCarty et al., 2006). Significant social and political changes, including the civil rights movement, anti-war protests, and the feminist movement, have prompted individuals and organizations to take more pronounced positions on a range of topics, leading to a general trend toward opinion polarization. Furthermore, there is broad scholarly consensus that the degree of opinion polarization has continuously increased since then and that the ramifications of polarization may hold greater significance in today’s social context.
The rise of opinion polarization has aroused extensive interest because its negative consequences can pose a disruptive threat to democratic societies. When individuals or groups are strongly polarized, they typically become less willing to compromise with others and more prone to conflict and even violence. This tendency represents a severe societal risk, undermining the capacity to respond to pressing challenges.
Z. Lyu (✉)
Graduate School of Arts and Letters, Tohoku University, Sendai, Japan
e-mail: [email protected]
K. Nagayoshi
Institute of Social Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
e-mail: [email protected]
H. Takikawa
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo,
Japan
e-mail: [email protected]
Social identity theory suggests that group membership can trigger more positive emotional reactions
toward the in-group than the out-group and a greater willingness to cooperate with
members of the in-group (Iyengar et al., 2012; Tajfel, 1982). For instance, with
regard to politics, as a social identity, partisanship contributes to bipolarity and the
favoring of people with similar political views while strongly biased against those
with opposing ones. Thus, increasing affective polarization can be characterized by
negative feelings toward opposing political parties or their supporters and positive
feelings toward one’s preferred political party.
Theoretical thinking also generates hypotheses that should be empirically examined. In the discussion about opinion polarization, broadly two mechanisms have been proposed to explain its emergence and intensification.
On the one hand, opinion polarization is assumed to be associated with the homophilic character of interactions and connections among individuals. More specifically, in order to reduce cognitive dissonance, individuals tend to seek out information that confirms their existing beliefs while ignoring or dismissing information that contradicts them (Stanley et al., 2020). As a result, individuals increasingly interact with those who share their views and are exposed to homophilic information, while distancing themselves from interactions with those who hold divergent opinions and avoiding exposure to opposing viewpoints. Because homophilic interactions are deemed to reinforce existing opinions, this process often leads individuals to take more extreme positions (Sunstein, 2002).
On the other hand, group-justifying biases can also contribute to opinion polar-
ization. An individual’s group identity constitutes the psychological attachment to a particular group, which is considered one of the most crucial factors affecting opinions. Once individuals identify with a group, self-categorization can induce them to adopt various forms of motivated reasoning, including in-group favoritism and out-group derogation (Tajfel, 1982), as a way to maintain group distinctiveness and enhance their standing as good group members. Such an in-group/out-group distinction can exacerbate the extent of polarization.
shifting society composed of new technologies and issues that enable new identities, forms, and practices, an updated understanding of opinion polarization is needed. Until recently, most quantitative social science research has been limited to the conventional statistical analysis of survey data. However, survey-based research relies on the sampling of observations, which often encounters limitations in terms of representativeness, scale, and granularity. The limitations of the traditional method become rather apparent in the investigation of opinion polarization, as its mechanism and process involve a seemingly endless number of information flows, expressed opinions, and interrelated behaviors among different actors across long time spans, which is difficult to address with surveys of small groups.
Fortunately, developments in toolsets and computational capacities offer significant potential for better investigating and understanding human behaviors and
opinions. In recent years, an unprecedented amount of digital data alongside a
variety of computational methods have illuminated new pathways to overcome the
limitations of previous social science research, thereby birthing a novel field known
as computational social science (Edelmann et al., 2020; Hofman, 2021; Lazer et al.,
2009). Broadly speaking, the primary features of the computational social science
research paradigm are concerned with collecting, manipulating, and managing big
data, utilizing advanced computational methods to improve the understanding of
human behaviors and social processes. These features bring up strong expectations
with regard to the promises they hold for the study of opinion polarization.
First, big data typically encompasses comprehensive and multivariate informa-
tion about the behaviors of large populations. The primary strength of big data lies in
its remarkable scalability and detailed granularity. This allows researchers to mon-
itor human behaviors with high frequency and on a large scale. Such capabilities are
invaluable for creating innovative measures of public opinion and for closely
examining interrelated human behaviors and social phenomena. From this perspec-
tive, the characteristics of big data provide researchers with the tools to explore a
broad spectrum of opinions and behavioral dynamics on a substantial scale, which
could be instrumental in revealing new aspects and fundamental mechanisms driving
the phenomenon of opinion polarization.
Also, computational social science has the potential to revolutionize the research
on opinion polarization through the introduction of advanced computational
methods and new techniques. A wide range of computational methods, such as
network analysis, natural language processing (NLP), and machine learning have
been applied to investigate the status and mechanism of opinion polarization. These
approaches provide detailed insights into the formation, dissemination, and evolu-
tion of opinions in various contexts, reaching a level of scale and depth unattainable
with conventional methods. Beyond that, the experimental approach, traditionally
used to explore the causality of opinion polarization, has undergone significant
evolution due to recent technological advancements. Specifically, through
by conducting experiments with thousands of participants in online discourse, digital experiments allow the observation of sizeable numbers of heterogeneous individuals in natural settings, which affords great promise for overcoming these limitations.
Generally speaking, network analysis provides useful concepts, notions, and applied tools for describing and understanding how actors are connected. In a network, nodes typically represent actors or institutions, whereas edges represent connections between such entities. Over the recent decade, increasing attention has been devoted to applications of network analysis methods in social science. Much of this interest stems from the flexibility of social networks in defining and modeling various relationships among social entities. Social networks consisting of actors and social relations are ubiquitous in society: for instance, people connected by a common interest, citizens connected by support for the same party, or social media users connected by interactions on a platform. Importantly, the structure and dynam-
ics of these connections have the potential to yield meaningful insights into a variety
of social science problems. Network analysis has thus become an established field within the social sciences and an indispensable resource for understanding human behav-
iors and opinions (Borgatti et al., 2009; Watts, 2004).
Previously, since collecting relational data through direct contact is time-
consuming and difficult, social network analysis was typically restricted to small
bounded groups. Thanks to the development of information technology, recent years
have witnessed an explosion in the availability of networked data. Especially, the
rapid increase in the use of social media has generated time-stamped digital records
of social interactions. These digital records have reinvigorated social network
analysis by enabling analyses of relations among social entities with unprecedented
scale in real-time, which has also opened up new opportunities to investigate opinion
polarization on social media.
First, network analysis can provide insight into the nature and dynamics of
opinion polarization by serving as a method to detect individuals’ opinions. As
indicated above, opinion polarization involves the degree to which people hold
competing attitudes on specific issues. Here, one of the crucial questions is how to quantify individuals’ opinions. From the perspective of network analysis, the basic idea is that individuals are embedded in social relations and interactions whose patterns can be measured and represented. Specifically, one of the most
important characteristics of the social network is the homophily principle, which
implies that people’s social networks are homogeneous with regard to
sociodemographic characteristics, behaviors, and opinions (McPherson et al.,
2001). Accordingly, it is reasonable to assume that people with similar opinion leanings are more likely to share a homophilous social network; people’s social network structure can therefore be assumed to be associated with their opinions. Based on
this theoretical assumption, the availability of network data among social media
users has elicited many efforts to estimate opinions with network-based measure-
ment. Specifically, interactions on social media can be naturally described as a social network in which individuals with shared interests tend to form groups, and individuals within the same community likely share similar opinions. For example, many social media platforms, including Twitter and Facebook, allow users to freely choose whom to follow. These interactions have the potential to serve as a source of opinion detection. Barberá (2015) introduced a systematic framework to
estimate individuals’ ideology based on following relationships with politicians.
According to the homophily principle, it is reasonable to assume that users following
the same politicians tend to share similar political opinions as follow network reflects
their political preference. Computationally, the following relationship can be
aggerated as an adjacent matrix that reflects the following relationship among
politicians and ordinary users. Then, the dimension reduction algorithms, such as
singular value decomposition, can be applied to map these following relationships
into low-dimension ideological space. In this term, each node in the following
network can be attributed with an estimated ideology based on their network
structure. Beyond the following relationship, other interaction behaviors and con-
nections on social media, such as “like,” “retweet,” and “reply” have also been
proven to be associated with individuals’ latent ideology (Bond & Messing, 2015;
7 Polarization of Opinion 107
Wong et al., 2016). Furthermore, recent studies attempt to employ more sophisti-
cated methods that can integrate different types of information attached to nodes and
edges in the network to predict ideology. Graph neural networks (GNNs) method has
a powerful capability to handle abundant information with edges among multi-type
nodes and attributes associated with each node (Zhang et al., 2019), making it
suitable to capture nuanced patterns and relationship. For example, utilizing
GNNs, multiple relations on Twitter, including follow, retweet, like, and mention
can be aggerated as input for the deep learning model of ideology detection (Xiao
et al., 2020).
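To make the dimension-reduction step concrete, the following is a minimal Python sketch of mapping a follow matrix into a one-dimensional ideological space with singular value decomposition. The toy follow matrix is an illustrative assumption, and Barberá’s (2015) actual method is Bayesian ideal point estimation rather than a plain SVD, so this should be read as a schematic illustration of the idea only.

```python
import numpy as np

# Rows are users, columns are politicians; 1 = "user follows politician."
# Users 0-1 mostly follow politicians 0-1; users 2-3 mostly follow 2-3.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1]], dtype=float)

# Center the matrix and take its SVD; the leading singular vectors place
# users and politicians on a shared low-dimensional scale.
A_centered = A - A.mean(axis=0)
U, S, Vt = np.linalg.svd(A_centered, full_matrices=False)

user_positions = U[:, 0] * S[0]   # users' estimated positions, dimension 1
politician_positions = Vt[0]      # politicians' positions on the same scale
print(user_positions.round(2))    # users 0-1 and 2-3 land on opposite sides
print(politician_positions.round(2))
```

With realistic data, the first dimension recovered this way tends to align with the left–right political axis, which is what makes the estimated positions usable as ideology scores.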
Second, another stream of research has employed network analysis to investigate opinion polarization from the perspective of homophilous interaction. The concepts of selective exposure and the echo chamber are widely used to describe situations in which people are exposed only to information and ideas that reinforce their existing opinions, thereby creating a self-reinforcing cycle (Garrett, 2009; Prior, 2013; Stroud, 2010; Wojcieszak, 2010). Notably, these dynamics are assumed to diminish mutual understanding, narrow the diversity of viewpoints people encounter, and ultimately lead to a situation where people have less common ground and feel animosity toward those who hold opposing views, that is, opinion polarization. From this perspective, how individuals interact with others and how they are exposed to information flows can provide important insights into the underlying mechanisms of opinion polarization. A growing body of research has documented a homophily tendency in connections and interactions on social media. Beyond that, the availability of social network data enables researchers to investigate connections and interactions from more diverse and nuanced perspectives. For example, Conover et al. (2011) employed clustering algorithms to investigate the political communication network on Twitter, demonstrating that the retweet network exhibits two highly segregated communities of users, while the mention network is much more politically heterogeneous. Barberá et al. (2015) suggest that in political discussions, information is exchanged primarily among individuals with similar ideological preferences, yet this homophily tendency is much weaker in discussions of other issues. Bakshy et al. (2015) examined how millions of Facebook users interact with socially shared news; their analysis suggests that the homophily tendency is more pronounced for shared links to hard content such as national news, politics, or world affairs. These studies indicate that the “echo chamber” narrative might be overstated, as the tendency appears to be pronounced only in specific contexts.
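As a schematic illustration of the clustering step in studies such as Conover et al. (2011), the sketch below applies a standard modularity-based community detection algorithm to a toy retweet network. The edge list is an illustrative assumption, and networkx’s greedy modularity method is our choice of algorithm for the sketch, not necessarily the one used in the original study.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
# Two dense retweet clusters joined by a single bridging edge.
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),   # cluster 1
                  ("x", "y"), ("y", "z"), ("x", "z"),   # cluster 2
                  ("c", "x")])                          # weak bridge

communities = greedy_modularity_communities(G)
print([sorted(c) for c in communities])  # the two triangles are separated
```

In empirical work, the degree of segregation between the detected communities (e.g., how few edges cross the partition) is then read as evidence for or against echo-chamber structure.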
Beyond that, network analysis can also support the investigation of affective polarization, that is, the gap in feelings between in-group and out-group members. Many social media platforms allow users to express their attitudes toward posts or other users through functions such as “like” and emotional reactions. The sentiments and opinions expressed in these interactions inspire network-based investigations of affective polarization. For instance, Rathje et al. (2021) showed empirically that content related to out-groups is more likely to elicit negative emotional reactions such as “angry,” while content related to in-groups is more likely to elicit positive emotional reactions such as “love.” Brady et al. (2017) find that messages expressing negative emotions toward rival political candidates are more likely to spread within liberal or conservative in-group social networks. Marchal (2022) suggests that negative sentiment is significantly more salient in interactions between crosscutting users than between like-minded users.
To summarize, the combination of network analysis methods and big data allows us to formalize diverse patterns of social networks and investigate their characteristics from more comprehensive and nuanced perspectives. In particular, it enhances our understanding of how interaction patterns, the issues at stake, and the actors involved contribute to the degree of opinion polarization. These implications are crucial for deepening our understanding of the states, mechanisms, and consequences of opinion polarization.
Texts and words are integral parts of society. People typically use texts and words to express themselves, make proposals, and communicate with each other. These texts and words can serve as a crucial source that reflects individuals’ beliefs, attitudes, and opinions. In particular, digitalization is generating an unprecedented volume of textual data that can be used to investigate opinions. Many efforts have also been devoted to making textual data easier to acquire. For instance, many governments have established digital archives of policy documents, congressional records, and reports, and social media platforms such as Twitter provide application programming interfaces (APIs) that allow researchers to access and make use of users’ textual data.
Despite its great potential, text analysis has always been a difficult task because human language is complex and nuanced. In particular, the increasing availability of these new data sources has created demand for advanced techniques that can handle the scale and complexity of text. Investigating opinion with textual data requires a method that describes how opinions can be measured and quantified. Traditionally, content analysis of the opinions revealed in texts has relied on hand-coding, which involves a series of processes such as developing a coding schema and training coders. These processes are time-consuming, and as the scale of textual data grows, it becomes harder to process large amounts of text by hand-coding alone. Fortunately, the advent of computational methods and ever-increasing computing power has substantially opened the way for further research in this direction. Compared to the traditional approaches, the development of NLP techniques now provides a broad spectrum of advanced tools for analyzing large-scale textual data more efficiently (Wilkerson & Casas, 2017).
Motivated by the advent of computational techniques and the increasing availability of textual data, there has been growing interest in capturing the states and dynamics of opinion polarization through the automatic detection of opinions in large bodies of text.
Automatic text analysis requires a model that describes the patterns and structure of texts in a computational way. Typically, the model transforms unstructured textual data into structured data (i.e., numerical representations) that can be further analyzed by various computational methods, a step known as “feature extraction” in NLP.
The most common strategy of feature extraction is the bag-of-words (BoW) model, which describes the occurrence of words within a document while disregarding grammatical details and word order. For the detection of opinions, tokens are usually matched against a list of words that have previously been annotated as opinion-related terms. For example, Laver and Garry (2000) developed a dictionary defining how a series of words relate to specific content categories and political parties; the content of a text can then be automatically identified by matching words to their respective categories in the dictionary. Laver et al. (2003) developed the Wordscores method, which replaces the dictionary with reference texts of annotated political placement. More specifically, Wordscores assumes that the political opinion revealed in a new text can be derived from its word-frequency similarity to the reference texts. Word-frequency information from “reference” texts with annotated ideological positions can therefore be used to make predictions for new texts whose positions are unknown (a minimal sketch of this logic appears after this paragraph). The BoW model, while straightforward and manageable, has several inherent limitations. First, it views words as individual entities with distinct meanings, overlooking key aspects such as grammatical structure and word order. Second, encompassing all relevant words in a pre-encoded dictionary is challenging, which can exclude significant details from the analysis and introduce bias. Additionally, extensive human intervention is often required to choose relevant words or reference texts for assigning meanings to each text. In particular, a coding scheme is typically compatible only with specific texts, which limits the method’s generalizability. Consequently, the development and maintenance of dictionaries tend to be time-consuming and labor-intensive.
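The following is a minimal sketch of the Wordscores logic on a toy corpus. The two reference texts and the left–right scale are illustrative assumptions, and this simplified scoring omits the rescaling steps of the published method (Laver et al., 2003).

```python
from collections import Counter

# Reference texts with known ("annotated") positions on a left-right scale.
refs = {
    -1.0: "welfare spending public services welfare".split(),  # left
    +1.0: "tax cuts enterprise tax market".split(),            # right
}

# Word counts and total lengths for each reference text.
freqs = {score: Counter(words) for score, words in refs.items()}
totals = {score: sum(c.values()) for score, c in freqs.items()}

def word_score(w):
    """Score a word by the frequency-weighted average of reference scores."""
    rel = {s: freqs[s][w] / totals[s] for s in refs}
    denom = sum(rel.values())
    return sum(s * rel[s] for s in refs) / denom if denom else None

def text_score(text):
    """Score a new text as the mean score of its scorable words."""
    scores = [s for s in (word_score(w) for w in text.split()) if s is not None]
    return sum(scores) / len(scores)

print(text_score("tax cuts for public services"))  # 0.0: between the poles
```

The example also exposes the limitation noted above: words absent from the reference texts (here, “for”) contribute nothing to the estimate.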
More recently, social scientists have adopted more sophisticated methods to improve the efficacy and accuracy of text-based opinion scaling. Specifically, to overcome limitations inherent in the BoW model, word embedding models have been applied to estimate the opinions revealed in texts. Broadly speaking, word embedding encodes each word as a dense, low-dimensional vector, where proximity between two vectors indicates greater semantic similarity between the associated words. To achieve this, word embedding models assume that the semantic meaning of a word can be inferred from the words that frequently appear in a small context window around it. Various word embedding architectures, such as Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014), define context and semantic meaning in different ways, but they commonly capture the context, relations, and semantic similarity of words in texts. Representing text with word embeddings therefore preserves more of the information in the raw text for opinion detection. Moreover, word embeddings require less human intervention, since they can be trained efficiently on large, preexisting, unannotated corpora. With this great advantage, word embedding models hold immense promise for text-as-data research, and several studies have leveraged them as an alternative way to capture the opinions of individuals and organizations. Rheault and Cochrane (2020) employed parliamentary corpora augmented with input variables reflecting party affiliations to train “party embeddings.” Since the word embedding model is powerful at capturing the patterns and characteristics of text and thereby builds better feature representations, party embeddings can reflect similarities and differences in ideological positions and policy preferences among parties. Notably, word embeddings can be trained automatically and easily adapted to new tasks or domains: parliamentary corpora from different countries enable comparisons of opinion polarization across countries, and historical textual data enable the investigation of opinion polarization over time. Furthermore, word embedding models can be flexibly and efficiently applied to various types of corpora. For example, it is reasonable to assume that published posts and profile descriptions reflect the opinions of a social media user, so these textual data can be aggregated at the individual level to produce embedding representations of individuals’ opinions (Jiang et al., 2022; Preotiuc-Pietro et al., 2017).
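As an illustration of how such embeddings are trained and queried in practice, the sketch below uses the gensim implementation of Word2Vec, our choice of library for the example. The toy corpus and hyperparameters are illustrative assumptions; real applications train on large corpora.

```python
from gensim.models import Word2Vec

sentences = [
    ["tax", "cuts", "help", "the", "economy"],
    ["lower", "tax", "rates", "help", "business"],
    ["public", "spending", "supports", "welfare"],
    ["welfare", "spending", "funds", "public", "services"],
]

# Train a tiny model: each word becomes a 50-dimensional dense vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1,
                 seed=0, workers=1)

vector = model.wv["tax"]                       # dense vector for one word
print(model.wv.similarity("tax", "spending"))  # cosine similarity of words
print(model.wv.most_similar("tax", topn=3))    # nearest words in the space
```

Individual-level opinion measures of the kind cited above can then be built, for example, by averaging the vectors of the words a user posts.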
Experiments are one of the most important methodologies in social science for testing theoretical hypotheses and establishing causality. Their main advantage is the ability to tightly control experimental conditions and thus systematically estimate the effect of a stimulus or condition. However, conventional experiments have typically been conducted in offline laboratories or via surveys among relatively small populations. Given the highly controlled settings and limited diversity of participants, it is difficult to establish the reliability and generality of experimental insights for real-life situations outside the laboratory and beyond specific populations.
In recent years, with the availability of digital platforms and tools, many researchers have begun using the Internet as a novel intermediary to recruit participants and conduct experiments. Digital experiments offer several advantages: they not only help scholars conduct more scalable, flexible, and efficient experiments but also provide useful tools that inspire new experimental strategies.
First, compared to traditional offline experiments, digital experiments overcome the limitations of space and time, facilitating more sizable and heterogeneous recruitment of participants and thereby improving the external validity of experimental research (Peng et al., 2019).
Second, digital techniques enable the collection of real-time, fine-grained behavioral data on participants. These digital trace data not only provide additional information for measuring the intensity of treatment effects and temporal changes in behavior but also allow researchers to check compliance with treatments, enhancing the validity of experiments (Guess et al., 2021; Stier et al., 2020). Relatedly, digital experiments are more amenable to long-term designs, allowing scholars to assess changes in individual attitudes and behaviors over time.
Third, digital experiments can be conducted in natural settings to achieve greater ecological validity and to avoid demand effects (Mosleh et al., 2022). Since behavioral tracking tools can collect real-world data unobtrusively in naturalistic environments, causal effects can be examined by observing actual behavior.
The power to detect causation has inspired research that leverages digital experiments to investigate the underlying mechanisms of opinion polarization. In practice, it is often beneficial to combine digital experiments with other available data, such as survey data and behavioral data, to examine the causes of opinion polarization. Typically, participants are incentivized to change their information exposure in online discourse, for example by changing their news feeds or social media following patterns, so that the causal effects of information exposure can be examined in natural settings. Other key variables, such as political attitudes, policy preferences, and demographic information, can be accurately measured by survey. Moreover, digital trace data, such as participants’ social networks, generated content, and interactional behaviors in online discourse, can provide important insights into how individuals’ opinions and related behaviors change over time. The combination of digital experiments and other data sources can thus not only shed light on the causes of political polarization but also, through the investigation of fine-grained behavioral data, provide insight into the cumulative effects of interventions.
Bail et al. (2018) incentivized a large group of social media users to follow bots that retweeted messages by elected officials and opinion leaders with opposing political views. Evaluating the impact of this treatment on participants’ opinions via surveys, they found that exposure to opposing political views may not mitigate opinion divergence and can even generate backfire effects that intensify political polarization. Similarly, Casas et al. (2023) focus on the effect of exposure to dissimilar views on opinion polarization: in a longitudinal experiment, participants were incentivized to read political articles expressing extreme opposing views, and the authors relied on survey self-reports and behavioral browsing data to track over-time changes in online exposure and attitudes. Guess et al. (2021) incentivized participants to change their default browser settings and social media following patterns to increase the likelihood of encountering partisan news; this design incorporated a naturalistic nudge within an online panel survey with linked digital trace data, providing significant insights into the long-term consequences of heightened exposure to partisan information. Levy (2021) recruited participants using Facebook ads, asking them to subscribe to conservative or liberal news outlets on Facebook; together with behavioral tracking data, such as sharing posts and liking pages, this study demonstrates how news exposure and the algorithms of social media platforms affect users’ behaviors and attitudes.
Moreover, digital experiments have enriched the treatment strategies available for investigating the mechanisms of opinion polarization from diverse perspectives.
7.5 Discussion
References
Bail, C. A., et al. (2018). Exposure to opposing views on social media can increase political
polarization. Proceedings of the National Academy of Sciences, 115, 9216–9221.
Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and
opinion on Facebook. Science, 348(6239), 1130–1132. https://doi.org/10.1126/science.aaa1160
Barberá, P. (2015). Birds of the same feather tweet together: Bayesian ideal point estimation using
Twitter data. Political Analysis, 23(1), 76–91. https://doi.org/10.1093/pan/mpu011
Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting from left to right:
Is online political communication more than an echo chamber? Psychological Science, 26(10),
1531–1542. https://doi.org/10.1177/0956797615594620
Bond, R., & Messing, S. (2015). Quantifying social media’s political space: Estimating ideology
from publicly revealed preferences on Facebook. American Political Science Review, 109(1),
62–78. https://doi.org/10.1017/s0003055414000525
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social
sciences. Science, 323(5916), 892–895. https://doi.org/10.1126/science.1165821
Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Bavel, J. J. V. (2017). Emotion shapes the
diffusion of moralized content in social networks. Proceedings of the National Academy of
Sciences, 114(28), 7313–7318. https://doi.org/10.1073/pnas.1618923114
Casas, A., Menchen-Trevino, E., & Wojcieszak, M. (2023). Exposure to extremely partisan
news from the other political side shows scarce boomerang effects. Political Behavior, 45,
1491–1530.
Chen, W., Pacheco, D., Yang, K.-C., & Menczer, F. (2021). Neutral bots probe political bias on
social media. Nature Communications, 12, 5580.
Conover, M., Ratkiewicz, J., Francisco, M., Goncalves, B., Menczer, F., & Flammini, A. (2011).
Political polarization on Twitter. Proceedings of the International AAAI Conference on Web and
Social Media, 5, 89–96. https://doi.org/10.1609/icwsm.v5i1.14126
Edelmann, A., Wolff, T., Montagne, D., & Bail, C. A. (2020). Computational social science and
sociology. Annual Review of Sociology, 46(1), 61–81. https://doi.org/10.1146/annurev-soc-
121919-054621
Garrett, R. K. (2009). Echo chambers online? Politically motivated selective exposure among
Internet news users. Journal of Computer-Mediated Communication, 14(2), 265–285. https://
doi.org/10.1111/j.1083-6101.2009.01440.x
Garrett, R. K., Long, J. A., & Jeong, M. S. (2019). From partisan media to misperception: Affective
polarization as mediator. Journal of Communication, 69(5), 490–512. https://doi.org/10.1093/
joc/jqz028
González-Bailón, S., & Lelkes, Y. (2023). Do social media undermine social cohesion? A critical
review. Social Issues and Policy Review, 17, 155–180. https://doi.org/10.1111/sipr.12091
Guess, A. M., Barberá, P., Munzert, S., & Yang, J. (2021). The consequences of online partisan
media. Proceedings of the National Academy of Sciences, 118(14), e2013464118. https://doi.
org/10.1073/pnas.2013464118
Hetherington, M. J. (2001). Resurgent mass partisanship: The role of elite polarization. The
American Political Science Review, 95(3), 619.
Hofman, J. M. (2021). Integrating explanation and prediction in computational social science.
Nature, 595(7866), 181–188. https://doi.org/10.1038/s41586-021-03659-0
Iyengar, S., Sood, G., & Lelkes, Y. (2012). Affect, not ideology: A social identity perspective on
polarization. Public Opinion Quarterly, 76(3), 405–431. https://doi.org/10.1093/poq/nfs038
Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N., & Westwood, S. J. (2019). The origins and
consequences of affective polarization in the United States. Annual Review of Political Science,
22(1), 129–146. https://doi.org/10.1146/annurev-polisci-051117-073034
Jiang, J., Ren, X., & Ferrara, E. (2022). Retweet-BERT: Political leaning detection using language
features and information diffusion on social networks. https://doi.org/10.48550/arxiv.2207.
08349
Jungherr, A., Rivero, G., & Gayo-Avello, D. (2020). Retooling politics: How digital media are
shaping democracy. Cambridge University Press. https://doi.org/10.1017/9781108297820
Laver, M., & Garry, J. (2000). Estimating policy positions from political texts. American Journal of
Political Science, 44(3), 619. https://doi.org/10.2307/2669268
Laver, M., Benoit, K., & Garry, J. (2003). Extracting policy positions from political texts using
words as data. American Political Science Review, 97(2), 311–331. https://doi.org/10.1017/
s0003055403000698
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contrac-
tor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Alstyne, M. V.
(2009). Computational social science. Science, 323(5915), 721–723. https://doi.org/10.1126/
science.1167742
Levy, R. (2021). Social media, news consumption, and polarization: Evidence from a field
experiment. American Economic Review, 111(3), 831–870. https://doi.org/10.1257/aer.
20191777
Marchal, N. (2022). Be nice or leave me alone: An intergroup perspective on affective polarization in online political discussions. Communication Research, 49, 376–398.
McCarty, N., Poole, K. T., & Rosenthal, H. (2006). Polarized America: The dance of ideology and
unequal riches. MIT Press.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social
networks. Annual Review of Sociology, 27(1), 415–444. https://doi.org/10.1146/annurev.soc.27.
1.415
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations
of words and phrases and their compositionality. Advances in Neural Information Processing
Systems, 26, 1081–1088.
Mosleh, M., Martel, C., Eckles, D., & Rand, D. G. (2021). Shared partisanship dramatically
increases social tie formation in a Twitter field experiment. Proceedings of the National
Academy of Sciences of the United States of America, 118(7), 9–11. https://doi.org/10.1073/
pnas.2022761118
Mosleh, M., Pennycook, G., & Rand, D. G. (2022). Field experiments on social media. Current
Directions in Psychological Science, 31(1), 69–75. https://doi.org/10.1177/
09637214211054761
Peng, T.-Q., Liang, H., & Zhu, J. J. H. (2019). Introducing computational social science for Asia-
Pacific communication research. Asian Journal of Communication, 29(3), 205–216. https://doi.
org/10.1080/01292986.2019.1602911
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation.
In Proceedings of the 2014 conference on empirical methods in natural language processing
(EMNLP) (pp. 1532–1543). https://doi.org/10.3115/v1/d14-1162
Preotiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political
ideology prediction of Twitter users. In Proceedings of the 55th annual meeting of the
Association for Computational Linguistics (Volume 1: Long papers) (pp. 729–740). https://
doi.org/10.18653/v1/p17-1068
Prior, M. (2013). Media and political polarization. Annual Review of Political Science, 16(1),
101–127. https://doi.org/10.1146/annurev-polisci-100711-135242
Rathje, S., Bavel, J. J. V., & van der Linden, S. (2021). Out-group animosity drives engagement on
social media. Proceedings of the National Academy of Sciences, 118(26), e2024292118. https://
doi.org/10.1073/pnas.2024292118
Rheault, L., & Cochrane, C. (2020). Word embeddings for the analysis of ideological placement in
parliamentary corpora. Political Analysis, 28(1), 112–133. https://doi.org/10.1017/pan.2019.26
Rogowski, J. C., & Sutherland, J. L. (2016). How ideology fuels affective polarization. Political
Behavior, 38(2), 485–508. https://doi.org/10.1007/s11109-015-9323-7
Stanley, M. L., Henne, P., Yang, B. W., & Brigard, F. D. (2020). Resistance to position change,
motivated reasoning, and polarization. Political Behavior, 42(3), 891–913. https://doi.org/10.
1007/s11109-019-09526-z
Stier, S., Breuer, J., Siegers, P., & Thorson, K. (2020). Integrating survey data and digital trace data:
Key issues in developing an emerging field. Social Science Computer Review, 38(5), 503–516.
https://doi.org/10.1177/0894439319843669
Stroud, N. J. (2010). Polarization and partisan selective exposure. Journal of Communication,
60(3), 556–576. https://doi.org/10.1111/j.1460-2466.2010.01497.x
Sunstein, C. R. (2002). The law of group polarization. Journal of Political Philosophy, 10(2),
175–195. https://doi.org/10.1111/1467-9760.00148
Tajfel, H. (1982). Social psychology of intergroup relations. Annual Review of Psychology, 33(1),
1–39. https://doi.org/10.1146/annurev.ps.33.020182.000245
Watts, D. J. (2004). The “new” science of networks. Annual Review of Sociology, 30(1), 243–270.
https://doi.org/10.1146/annurev.soc.30.020404.104342
Watts, D. J. (2017). Should social science be more solution-oriented? Nature Human Behaviour,
1(1), 0015. https://doi.org/10.1038/s41562-016-0015
Wilkerson, J., & Casas, A. (2017). Large-scale computerized text analysis in political science:
Opportunities and challenges. Annual Review of Political Science, 20(1), 529–544. https://doi.
org/10.1146/annurev-polisci-052615-025542
Wojcieszak, M. (2010). ‘Don’t talk to me’: Effects of ideologically homogeneous online groups and
politically dissimilar offline ties on extremism. New Media & Society, 12(4), 637–655. https://
doi.org/10.1177/1461444809342775
Wong, F. M. F., Tan, C. W., Sen, S., & Chiang, M. (2016). Quantifying political leaning from
tweets, retweets, and retweeters. IEEE Transactions on Knowledge and Data Engineering,
28(8), 2158–2172. https://doi.org/10.1109/tkde.2016.2553667
Xiao, Z., Song, W., Xu, H., Ren, Z., & Sun, Y. (2020). TIMME: Twitter ideology-detection via
multi-task multi-relational embedding. In Proceedings of the 26th ACM SIGKDD international
conference on knowledge discovery & data mining (pp. 2258–2268). https://doi.org/10.1145/
3394486.3403275
Zhang, C., Song, D., Huang, C., Swami, A., & Chawla, N. V. (2019). Heterogeneous graph neural
network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge
discovery & data mining (pp. 793–803). https://doi.org/10.1145/3292500.3330961
Chapter 8
Coda
Yoshimichi Sato and Hiroki Takikawa
Chapter 3 argues that computational social science is free from the deductive approach. This is an important point in scientific methodology. In conventional scientific activities, including sociology, scientists start with a theory, derive hypotheses from it, and set up an experiment or conduct a social survey to check the hypotheses’ empirical validity. If their empirical validity is confirmed, the theory is thought to have survived the empirical test. If not, scientists revise the theory or invent a new one to make it more empirically valid. Thus, scientific activities gradually advance scientific knowledge (see Popper, 1959). This is a typical image of the deductive approach.
Some computational social science can also follow the deductive approach. Big data analysis, for example, is useful for checking the empirical validity of hypotheses derived from a theory. However, computational social science can go beyond this and contribute to creating new theory in sociology.
It is true that computational social science has provided social scientists with social telescopes (Golder & Macy, 2014) of much higher resolution than conventional sociological research. However, from the viewpoint of the deductive approach, we need a theory before observing the social universe. Without a theory we could not know which area of the social universe we should observe. For example, it does not make sense merely to observe mobility patterns collected via smartphones after the lockdowns caused by COVID-19, because it is unclear why we need to observe those patterns. However, if we have a theory of the differential effects of lockdowns on people of different classes and ethnicities, we can combine data on such social characteristics with mobility patterns and analyze how the lockdowns increased or decreased social inequality (see Chang et al., 2021). In this sense, a sociological study using computational social science should start with a theory if it follows the deductive approach.
However, computational social science can contribute to the advancement of sociology without the deductive approach, as Chap. 3 argues. Big data enables social telescopes to cover much larger areas than conventional sociological data. Thus, computational social science has the potential to discover new theories by searching a wide range of the social universe.
Machine learning, the focus of Chap. 3, has this potential. For example, unsupervised machine learning can extract the latent structure of observed data, structure that researchers analyzing the data might not be able to extract themselves. This means that unsupervised machine learning has the potential to find new patterns that might lead to new theory. Being free from the deductive approach, machine learning, and unsupervised machine learning in particular, opens a door to a new way of exploring new theories in sociology.
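As a schematic example of what “extracting latent structure” means in practice, the sketch below recovers two unlabeled groups from synthetic data with k-means clustering. The data and the choice of k-means are illustrative assumptions; any unsupervised method could stand in here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two latent groups that the analyst has not labeled in advance.
data = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                  rng.normal(5.0, 1.0, size=(100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
print(np.bincount(labels))  # both latent groups recovered: [100 100]
```

The sociologically interesting step comes after the algorithm: interpreting what the recovered groups mean and whether they point toward a new theory.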
Machine learning has another advantage. Because it is more flexible about modeling than conventional statistical models, machine learning yields better predictions. When we use conventional statistical models such as regression models, we derive hypotheses from a theory, build models based on them, and estimate the models’ coefficients using data. In other words, the theory used for the modeling limits the scope of the social universe the social telescope observes, so the modeling might miss a new theory that lies outside that scope. In contrast, as mentioned above, machine learning is free from the deductive approach, so it can search the social universe for models with more predictive power.
One caveat should be mentioned, however. Machine learning does not tell us how the selected model should be interpreted. This differs from conventional statistical models. For example, we understand how a regression model estimates its coefficients from data, and we can easily understand what the model means by interpreting those coefficients. This is generally not the case with machine learning models: the selected model is often so complex that it is extremely difficult to grasp its gist. In other words, the selected model is a black box.
To address this problem, a method called scientific regret minimization has been proposed. This method compares a black-box model obtained by machine learning with an interpretable model such as a regression model. If the gap between them is large, the interpretable model is improved to make the gap smaller, until we find an interpretable model whose gap with the black-box model is smallest. At the end of this process, we obtain an interpretable model with strong predictive power.
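The following sketch illustrates one round of this comparison loop on synthetic data. The specific models (gradient boosting as the black box, linear regression as the interpretable model) and the manual feature enrichment are illustrative assumptions, not the canonical implementation of scientific regret minimization.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)  # nonlinear truth

# Black-box model with strong predictive power but opaque structure.
black_box = GradientBoostingRegressor().fit(X, y)
bb_pred = black_box.predict(X)

# Round 1: a plain linear model leaves a large gap to the black box.
simple = LinearRegression().fit(X, y)
gap_simple = mean_squared_error(bb_pred, simple.predict(X))

# Round 2: inspecting where the two disagree suggests adding a squared term;
# the enriched interpretable model shrinks the gap while staying readable.
X_rich = np.column_stack([X, X[:, 1] ** 2])
rich = LinearRegression().fit(X_rich, y)
gap_rich = mean_squared_error(bb_pred, rich.predict(X_rich))

print(f"gap, linear only:    {gap_simple:.3f}")  # large
print(f"gap, with x2 squared: {gap_rich:.3f}")   # much smaller
```

In the full procedure, such rounds are repeated, weighting the disagreements by how much they matter for prediction, until the interpretable model approximates the black box closely enough to stand in for it.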
attacking us and predict that he/she will stab us. Based on such predictions, we can take proper actions, such as not smoking in the room or running away from the stranger.
Relationship is the foundation of prediction. Meaning relates a thing or an event to other things and events, which is a basic characteristic of prediction. We relate a no-smoking sign to the predictions that nobody smokes in the room and that we would be punished if we smoked. We relate a stranger approaching us with a knife to the prediction that he/she will stab us. In other words, a thing or an event does not have meaning if it is detached from other things and events. The meaning of a thing or an event exists in its network with other things and events.
Semantic learning is the process by which we correct our estimates of these relationships. To cite an example from Chap. 5, suppose that we see the color of a mushroom, think that the mushroom is edible, and eat it. If the mushroom is poisonous and gives us a stomachache, we learn that our estimate of the relationship between the mushroom’s color and its edibility was wrong and update that relationship with the fact that the mushroom was poisonous.
Do topic models and word-embedding models capture these points, so that they can analyze meaning-making from a viewpoint different from that of conventional sociology? Chapter 5 gives a positive answer to this question.
Topic models extract latent topics from observed sentences and probabilistically relate (1) sentences and topics and (2) the words in those sentences and topics. These characteristics allow topic models to uncover the meaning of a word. As pointed out above, a word does not have a meaning by itself; it has a meaning only in relation to other words. To cite an example from Chap. 5, a “tie” means a draw if it is strongly related to the topic “sports” and appears with the word “game,” which is also related to that topic. In contrast, a “tie” means a necktie if it is related to the topic “fashion” and appears with the word “suit,” which is also related to that topic. Thus, topic models unveil both the relationality and the polysemy of word meaning: in this example, “tie” has two meanings depending on the words used with it. Topic models are therefore well suited to capturing polysemy.
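The toy example below illustrates this disambiguation with scikit-learn’s LDA implementation on a four-document corpus. The corpus is far too small for reliable estimates and is an illustrative assumption only.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the game ended in a tie after extra time",    # "tie" as a draw
    "a thrilling game with no tie breaker",
    "he wore a tie with his new suit",             # "tie" as a necktie
    "the suit and tie matched the fashion trend",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each document gets a probability distribution over the two latent topics;
# "tie" itself loads on both, and its sense is disambiguated by the
# co-occurring words ("game" vs. "suit").
for doc, dist in zip(docs, lda.transform(X)):
    print(dist.round(2), doc)
```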
Topic models use Bayesian updating methods to estimate their parameters, which is similar to the semantic learning described above. Bayesian updating methods update parameters, called posterior parameters, using prior parameters and new information. In the mushroom example, we had a prior belief (parameter) about the relationship between the mushroom’s color and its edibility. We then got new information: the mushroom was poisonous and gave us a stomachache. Using this information, we revise the prior belief into a posterior belief (parameter). Of course, we do not rigorously apply Bayesian updating in everyday life, but we do conduct a similar process of updating parameters.
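The mushroom example can be written as a one-step Bayesian update. The sketch below uses a Beta-Binomial model with prior values that are our illustrative assumption, not taken from Chap. 5.

```python
# Prior belief: mushrooms of this color are edible with probability ~0.8,
# encoded as a Beta(8, 2) distribution over the edibility probability.
alpha, beta = 8.0, 2.0
print(alpha / (alpha + beta))   # prior mean: 0.8

# New information: we ate one such mushroom and it was poisonous.
beta += 1.0                     # conjugate update for one "failure"
print(alpha / (alpha + beta))   # posterior mean: ~0.73, belief revised down
```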
The logic of topic models and people’s meaning-making obviously differ, but we think we can obtain useful tips for understanding people’s meaning-making by deeply understanding the logic of topic models. This way of using topic models is completely different from their conventional use, but it is a way to apply them to meaning and interpretation, major research topics in sociology.
The chapters in this book, including Chaps. 3 and 5, have shown that computational social science has the potential to advance sociological research. We have also emphasized the importance of meaning and interpretation if computational social science is to contribute substantively to the advancement of sociology, and we have shown examples of such practices.
Sociology should also change itself to promote collaboration with computational social science. Sociology has evolved through the combination of sociological theories and empirical studies checking their empirical validity. Conventionally, most of these empirical studies are case studies and statistical analyses of survey data, both of which are strong tools for the development of new sociological theories. Take the study of inequality of educational opportunity, for example. Reducing this inequality has been an important topic in society, because modern societies take equality of educational opportunity as an ideal: anybody should have equal access to education, higher education in particular, no matter what family he/she comes from. Therefore, empirically clarifying the degree of this inequality has been a central topic in the study of social inequality.
A study of inequality of educational opportunity in Ireland by Raftery and Hout (1993) is in line with this research tradition. They conducted statistical analyses of data on the transitions from elementary to secondary education and from secondary to higher education. The Irish government reformed secondary education in 1967: for example, tuition became free, and free school transportation was provided. These reforms increased the overall participation rate in secondary education. However, it is another question whether class differentials in that rate were reduced. Intuitively, it is plausible that the reforms increased the opportunity for children from lower classes to enter secondary education and therefore that the class differentials decreased. However, this intuitive story should be empirically examined.
For this examination, Raftery and Hout (1993) analyzed the effects of the reforms on inequality of educational opportunity by birth cohort and social class. To summarize their findings, the reforms did not reduce the inequality. For the opportunity to enter secondary education, the class differentials shrank for the younger cohorts, an effect of the reform. However, for the opportunity to complete secondary education and to enter higher education, the differentials did not change. Raftery and Hout (1993) generalized these findings into the hypothesis of maximally maintained inequality, which has been cited innumerable times. The hypothesis is as follows (Hout, 2006; Raftery & Hout, 1993):
1. Even if educational expansion occurs through educational reform, class differentials in educational opportunity do not change. Transition rates of students from lower classes do increase, but the rates of students from all classes increase in parallel.
2. If the demand for better education among students from the upper classes becomes saturated, more educational opportunities open up to students from the lower and middle classes. This reduces class differentials.
3. Conversely, if educational opportunities shrink, inequality of educational opportunity increases.
After proposing the hypothesis, Raftery and Hout (1993) proposed a rational choice model to explain it. Put simply, students and their families decide whether to move up to higher education by calculating the costs and benefits associated with that education. Upper-class students and their families estimate that the benefits exceed the costs, so they decide to continue education. Their lower-class counterparts estimate the opposite, so they tend not to continue. Only when educational opportunity expands and the transition rate of upper-class students saturates do lower-class students and their families begin to estimate that the benefits exceed the costs and decide to continue education. This results in a decrease in class differentials.
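The cost-benefit logic can be stated as a tiny decision rule, sketched below with purely illustrative parameter values that are our assumptions, not Raftery and Hout’s (1993) estimates.

```python
def continues_education(benefit: float, cost: float) -> bool:
    """A family chooses higher education only if benefit exceeds cost."""
    return benefit > cost

# Before expansion: perceived costs are prohibitive for the lower class.
print(continues_education(benefit=5.0, cost=3.0))  # upper class: True
print(continues_education(benefit=5.0, cost=6.0))  # lower class: False

# Once upper-class demand is saturated and reforms lower effective costs
# (e.g., free tuition), the lower-class calculation flips and the class
# differential narrows.
print(continues_education(benefit=5.0, cost=4.0))  # lower class now: True
```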
References
Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and
social media engagement. Social Science & Medicine, 165, 280–288.
Boero, R., Castellani, M., & Squazzoni, F. (2004a). Cognitive identity and social reflexivity of the
industrial district firms: Going beyond the ‘complexity effect’ with agent-based simulations. In
G. Lindemann, D. Moldt, & M. Paolucci (Eds.), Regulated agent-based social systems
(pp. 48–69). Springer.
Boero, R., Castellani, M., & Squazzoni, F. (2004b). Micro behavioural attitudes and macro
technological adaptation in industrial districts: An agent-based prototype. Journal of Artificial
Societies and Social Simulation, 7(2), 1.
Boero, R., Castellani, M., & Squazzoni, F. (2008). Individual behavior and macro social properties:
An agent-based model. Computational and Mathematical Organization Theory, 14, 156–174.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton Univer-
sity Press.
Centola, D., & Macy, M. W. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113(3), 702–734.
Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021).
Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589,
82–87.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of U.S. government arts
funding. Poetics, 41, 570–606.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Hout, M. (2006). Maximally maintained inequality and essentially maintained inequality:
Crossnational comparisons. Sociological Theory and Methods, 21(2), 237–252.
Popper, K. R. (1959). The logic of scientific discovery. Hutchinson.
Raftery, A. E., & Hout, M. (1993). Maximally maintained inequality: Expansion, reform, and
opportunity in Irish education, 1921–75. Sociology of Education, 66(1), 41–62.
Sato, Y. (2017). Does agent-based modeling flourish in sociology? Mind the gap between social
theory and agent-based models. In K. Endo, S. Kurihara, T. Kamihigashi, & F. Toriumi (Eds.),
Reconstruction of the public sphere in the socially mediated age (pp. 37–46). Springer Nature
Singapore Pte.
Ueshima, A., & Takikawa, H. (2021). Analyzing vaccination priority judgment for 132 occupations
using word vector models. In WI-IAT ‘21: IEEE/WIC/ACM International conference on web
intelligence and intelligent agent technology (pp. 76–82).