
Translational Systems Sciences 40

Yoshimichi Sato
Hiroki Takikawa
Editors

Sociological Foundations of Computational Social Science
Translational Systems Sciences

Volume 40

Editors-in-Chief
Kyoichi Kijima, School of Business Management, Bandung Institute of Technology, Tokyo, Japan
Hiroshi Deguchi, Faculty of Commerce and Economics, Chiba University of Commerce, Tokyo, Japan
Editorial Board
• Shingo Takahashi (Waseda University)
• Hajime Kita (Kyoto University)
• Toshiyuki Kaneda (Nagoya Institute of Technology)
• Akira Tokuyasu (Hosei University)
• Koichiro Hioki (Shujitsu University)
• Yuji Aruka (Chuo University)
• Kenneth Bausch (Institute for 21st Century Agoras)
• Jim Spohrer (IBM Almaden Research Center)
• Wolfgang Hofkirchner (Vienna University of Technology)
• John Pourdehnad (University of Pennsylvania)
• Mike C. Jackson (University of Hull)
• Gary S. Metcalf (InterConnections, LLC)
• Marja Toivonen (VTT Technical Research Centre of Finland)
• Sachihiko Harashina (Chiba University of Commerce)
• Keiko Yamaki (Shujitsu University)
Yoshimichi Sato • Hiroki Takikawa
Editors

Sociological Foundations
of Computational Social
Science
Editors

Yoshimichi Sato
Faculty of Humanities
Kyoto University of Advanced Science
Kyoto, Japan

Hiroki Takikawa
Graduate School of Humanities and Sociology
The University of Tokyo
Bunkyo-ku, Tokyo, Japan

ISSN 2197-8832 ISSN 2197-8840 (electronic)


Translational Systems Sciences
ISBN 978-981-99-9431-1 ISBN 978-981-99-9432-8 (eBook)
https://doi.org/10.1007/978-981-99-9432-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore
Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore

Paper in this product is recyclable.


Preface

Computational social science has not fully shown its power in sociology. This is the
motivation for us to publish this book. It is true that intriguing articles using
computational social science have been published in top journals of sociology, but
it is another story whether computational social science has a dominant influence in
sociology. To the best of our knowledge, it has not occupied a central position in
sociology yet. Why not? We tried to answer this question in this book.
Our answer is that computational social science has not attacked central issues in
sociology, meaning and interpretation in particular. If it gave answers to research
questions that most sociologists deem important but that conventional sociological methods, such as statistical analysis of social survey data, have been unable to answer,
computational social science would be more influential in sociology. However, this
has not happened yet.
This issue stems from two causes. First, computational social scientists are not necessarily familiar with important sociological concepts and theories, so they tend to begin with available digital data, such as mobile data, without seriously considering how analyzing the data contributes to the advancement of those concepts and theories. Second, sociologists have not formalized important concepts and theories clearly enough for computational social scientists to connect them with their analyses and contribute substantively to elaborating them.
In a sense, computational social science and sociology are in an unhappy relationship. If they collaborate efficiently with each other, sociology will reach a higher level. This book proposes ways to promote that collaboration. In particular, we focus on meaning and interpretation in several chapters and show how to incorporate them in analysis using computational social science. This is because, as mentioned above, they have been central issues in sociology. Thus, if they are properly incorporated in computational social scientific analysis, the results of the analysis will produce a quantum leap in sociology, and sociologists will realize the true power of computational social science. As a result, computational social science and sociology will have a happy marriage.


We hope that readers of this book will find the ways it proposes useful for realizing the collaboration between computational social science and sociology, so that sociology can jump to a higher stage with the help of computational social science.

Kyoto, Japan
Yoshimichi Sato

Bunkyo-ku, Tokyo, Japan
Hiroki Takikawa
Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 21K18448. We appreciate the generous grant from the Japan Society for the Promotion of Science.
We are also deeply grateful to Hiroshi Deguchi for his encouragement to publish this book.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Yoshimichi Sato and Hiroki Takikawa
2 Sociological Foundations of Computational Social Science . . . . . . . . 11
Yoshimichi Sato
3 Methodological Contributions of Computational Social Science to
Sociology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Hiroki Takikawa and Sho Fujihara
4 Computational Social Science: A Complex Contagion . . . . . . . . . . . . 53
Michael W. Macy
5 Model of Meaning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Hiroki Takikawa and Atsushi Ueshima
6 Sociological Meaning of Contagion . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Yoshimichi Sato
7 Polarization of Opinion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Zeyu Lyu, Kikuko Nagayoshi, and Hiroki Takikawa
8 Coda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Yoshimichi Sato and Hiroki Takikawa

Chapter 1
Introduction

Yoshimichi Sato and Hiroki Takikawa

1.1 Backdrop and Purpose of the Book

Computational social science opened a new door to advancing social scientific studies in two ways. First, social scientists became able to conduct rigorous thought
experiments using the technique of agent-based modeling, one of the two main
pillars of computational social science. Agent-based modeling creates many actors,
who are called agents in the technique, and they are assumed to decide, act, interact
with other agents, and learn from their own experience and vicariously (Macy &
Willer, 2002). What is most important in agent-based modeling is that experi-
menters, that is, model builders, do not order agents to behave in a certain way.
Rather, agents voluntarily decide, act, interact, and learn. As a result of such
voluntary behaviors, a social phenomenon emerges. What experimenters do is to
set up an external environment in which agents behave and to observe which
external environment creates which social phenomenon via interactions of agents.
Because external environments and social phenomena exist at the macro-level and
agents exist at the micro-level, agent-based modeling is an excellent tool by which
social scientists precisely study the micro-macro linkage (Coleman, 1990).
Macy and Sato (2002), for example, built an agent-based model to study why and
how trust and cooperation propagate in society and the global market emerges. They
assumed that the level of mobility of agents among local societies such as villages
has an inverted U-shaped effect on the emergence of trust, cooperation, and the
global market. To check the validity of their assumption, they set up a model in
which some of the agents are assumed to move among local societies, become
newcomers, and interact with local people. Then, local people learn to trust and
cooperate with the newcomers and decide to leave their local societies, entering the
global market. If the level of mobility is very low, local people do not have an
opportunity to interact with newcomers and, therefore, cannot learn to trust and
cooperate with them. If the level of mobility is very high, local societies become unstable because most of the agents leave their local societies and become newcomers; local people then lose the stable local societies in which they learn to trust and cooperate with newcomers. Only if the level of mobility is moderate do local people learn to trust and cooperate with newcomers and enter the global market, which leads
to the propagation of trust and cooperation in society and the emergence of the global
market.
In their model, the external environment is the level of mobility of agents, and the
social phenomena to be explained are the propagation of trust and cooperation and
the emergence of the global market at the societal level. Agents decide whether to
trust and cooperate with other agents and to enter the global market at the micro-
level. Then, their trusting behavior, cooperating behavior, and entering the global
market accumulate to the macro-level. As the result of the accumulation, trust and
cooperation propagate at the macro-level, and the global market emerges. In this
sense, Macy and Sato’s (2002) study is a textbook example of the micro–macro
linkage.
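To make this micro–macro loop concrete, the following Python sketch implements a toy version of such a mobility–trust model. It is not Macy and Sato’s (2002) published model: the learning rule, the payoffs, and all numeric parameters are illustrative assumptions.

import random

def run(mobility_rate, n_agents=100, n_villages=10, rounds=200, lr=0.05):
    # Micro-level state: each agent holds a propensity to trust.
    trust = [random.random() for _ in range(n_agents)]
    village = [random.randrange(n_villages) for _ in range(n_agents)]
    for _ in range(rounds):
        # Macro-level parameter: each agent relocates with probability mobility_rate.
        for i in range(n_agents):
            if random.random() < mobility_rate:
                village[i] = random.randrange(n_villages)
        # Micro-level interaction: each agent may place trust in a partner
        # from its current village and learns from the outcome.
        for i in range(n_agents):
            partners = [j for j in range(n_agents)
                        if j != i and village[j] == village[i]]
            if not partners:
                continue
            j = random.choice(partners)
            if random.random() < trust[i]:  # i decides to trust j
                # Trust pays off only if the partner honors it.
                payoff = 1.0 if random.random() < trust[j] else -1.0
                trust[i] = min(1.0, max(0.0, trust[i] + lr * payoff))
    # Micro-to-macro accumulation: the population's mean propensity to trust.
    return sum(trust) / n_agents

# Sweep the macro-level parameter to look for the inverted U-shape.
for m in (0.0, 0.05, 0.2, 0.8):
    print(f"mobility={m}: mean trust={run(m):.3f}")

In the published model, agents also decide whether to enter the global market and revise their propensities by social learning; the sketch keeps only the mobility and trust core.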
Big data (or digital data) analysis is the other main pillar of computational social
science.1 This technique has radically changed the way social scientific studies are
conducted. In conventional social scientific studies using empirical data, researchers
conduct social surveys to collect data they need for their studies. Social surveys have
various types: interview surveys, mail surveys, and web surveys. What is common to any
type is that researchers design questionnaires to collect necessary information on
randomly selected respondents. For example, if a researcher wants to know the
relationship between social class and life satisfaction, he or she includes questions on
respondents’ occupation, education, income, and life satisfaction in the question-
naire and checks whether a positive relationship exists between class variables (occu-
pation, education, and income) and life satisfaction using correlation analysis and
cross-tabulation analysis. In a sense, social scientists actively collect necessary
information on respondents in social surveys.
In contrast, social scientists using big data must, in most cases, work with whatever information happens to be available.2 For example, if we use mobility data of smartphone users before and after a lockdown caused by the COVID-19 pandemic, we know how the lockdown changed the mobility patterns of the users. However, because we

1 Big data and digital trace data are used interchangeably in this chapter.
2 Online experiments can collect necessary information on participants. Thus, the discussion here is not applicable to online experiments.

cannot collect the users’ opinions about the lockdown from the mobility data, we do
not know whether their attitudes toward the lockdown affect their mobility patterns.
Furthermore, because we cannot collect information on social classes of the users
from the mobility data, we do not know differential effects of the lockdown on the
users by social class.3
This incompleteness of available data (Salganik, 2018: Chapter 2) is one of the
serious problems of big data compared with social survey data. However, big data
has advantages compensating for this disadvantage. As Salganik (2018: Chapter 2)
points out, big data is literally big, always-on, and nonreactive. The bigness of big
data has several advantages. Compared with social survey data, the most important
advantage is that results of analysis are robust and stable even if the data is divided
into many categories because each category has enough samples for analysis. For
example, when we create multiway tables using a social survey dataset, it often
happens that many cells do not have enough samples for robust analysis. Big data
analysis is exempt from this problem because we can collect as many samples as
possible. Theoretically, of course, there is an upper limit of the size of big data, but
our usual use of big data does not face this problem.
Always-on is also a strong advantage of big data. When social scientists want to
collect information on temporal change of characteristics of respondents such as
income, marital status, and life satisfaction using social surveys, they conduct
longitudinal surveys. In such surveys, they usually collect data on a regular basis, for example, every month or every year. Thus, in a sense, social survey data is sporadically on, so we do not know temporal changes in the characteristics of respondents
between two survey waves. In contrast, big data such as mobile data and Twitter data
streams without break, so we can collect continuous time-series data on our target
population.
Nonreactivity of big data means that collecting big data does not affect behaviors
of the target population. For example, even if they know that their mobile data is
collected by their mobile phone company, they do not change their behaviors. They
commute to their firms, go to restaurants and bars, and exercise at fitness clubs as
usual. They tweet what they want to tweet. In contrast, respondents in social surveys
tend to give socially desirable answers to questions asked by interviewers in an
interview survey. For example, if a male respondent who supports sexual division of
labor is asked whether he supports it or not by a female interviewer, he probably
answers that he does not support it. This social desirability distorts distributions of
variables used in the social survey, but big data does not suffer from this problem.
So far, we have observed the advantageous characteristics of computational social science
focusing on agent-based modeling and big data analysis. Because of the character-
istics, computational social science has become influential and indispensable in
social science including sociology. However, it is another story whether it has

3 The study of the spread of the coronavirus by Chang et al. (2021) is innovative because they succeeded in combining mobility data with data on socioeconomic status and race to predict that infection rates are higher among disadvantaged racial and socioeconomic groups.

answered core research questions in sociology. We argue that it has not necessarily
answered them yet because most of the studies using it do not properly include
meaning and interpretation of actors in their analysis. Meaning and interpretation
have been seriously studied by major sociologists since the early days of sociology.
For example, Max Weber, a founding father of sociology, argued that social action is
different from behavior (Weber, 1921–22). According to his conceptualization, a
behavior becomes a social action if the former has a meaning. In other words, a
behavior has to be interpreted and given a subjective meaning by the actor and other
actors (and the observer, that is, a sociologist observing the behavior) in order to
become a social action.
Since Max Weber, many giants in sociology focused on meaning and interpreta-
tion as the main theme of their sociological studies. They are Alfred Schütz (1932),
Peter Berger and Thomas Luckmann (1966), Herbert Blumer (1969), Erving
Goffman (1959), and George Herbert Mead (1934), to name a few. Their sociolog-
ical studies have attracted countless sociologists around the world. Thus, if it properly deals with meaning and interpretation, computational social science will enter the core of sociology and truly become indispensable to it. Furthermore, including meaning and interpretation in computational social science will enhance its analytical power. Herein lies the purpose of this book. That is, we try to explore
how computational social science and sociology should collaborate to advance
sociological studies as well as computational social science. Each chapter in this
book implicitly or explicitly shares this purpose from different perspectives.

1.2 Organization of the Book

Chapter 2 “Sociological Foundations of Computational Social Science” by Yoshimichi Sato is, as its title suggests, the theoretical part of this book. He highly evaluates the strength of computational social science, consisting of big data (or digital trace) analysis and agent-based modeling. He argues that the three characteristics of big
data pointed out by Salganik (2018), that is, bigness, always-on, and nonreactivity,
allow sociologists to study social phenomena that could not be studied by conventional social surveys. Agent-based modeling also opened a door to studying micro–
macro linkages from new perspectives. It rigorously explains social phenomena as
the accumulation of behaviors of many agents. This was impossible with conventional
sociological approaches.
However, computational social science is not exempt from some problems when
it is applied to core research questions in sociology. Most of the studies of compu-
tational social science deal with behaviors of people and miss meaning and inter-
pretation, while sociology has emphasized the importance of the two concepts. Thus,
it is important to incorporate the two concepts in computational social science for it
to be more influential in sociology. Sato examines Goldberg and Stein (2018) who,
following Berger and Luckmann (1966), incorporate meaning and interpretation in
their agent-based model, and he regards their study as an
excellent example of collaboration between agent-based modeling and social theory.


He also examines other works in line with the ideas of Goldberg and Stein (2018) and concludes that sociologists should start their studies with sociological theories and concepts, derive hypotheses from them, and apply techniques of computational
social science to them to check their empirical validity.
In Chap. 3 “Methodological Contributions of Computational Social Science to
Sociology” by Hiroki Takikawa and Sho Fujihara, the authors discuss how machine
learning, one of the central methods of computational social science, can be used to
advance sociological theory. The authors describe machine learning as (1) learning
from data and experience, i.e., data-driven, (2) aiming to improve the performance of
task resolution, such as prediction and classification, and (3) aiming to develop
algorithms that automate task resolution. Then, the potential theoretical contribu-
tions of machine learning in sociology are discussed, including breaking away from
deductive models, establishing a normative cognitive paradigm, and using machine
learning as a model of human decision making and cognition. The chapter thereafter
identifies problems with conventional quantitative methods in sociology and dis-
cusses the predictive framework and automatic coding as examples of applications
of machine learning in sociology, and how sociological theory can be improved
through these applications. It is also pointed out that machine learning can shed new
light on the elucidation of causal mechanisms, which is a central issue in sociological
theory. Finally, it is argued that the Scientific Regret Minimization Method could be
an important breakthrough in addressing the “interpretability” problem, which is a
major challenge for machine learning. Thus, Chap. 3 shows that the common belief
that machine learning is atheoretical and therefore does not contribute to the theo-
retical development of sociology is false, and that with appropriate and creative
applications, machine learning, a central method in computational social science, has
great potential for improving sociological theory.
Michael Macy, the author of Chap. 4 “Computational Social Science: A Complex
Contagion,” looks back over his personal history as a computational social scientist and
overlaps it with the development of computational social science. According to him,
the first wave of computational social science explored implications derived from
theories in social science. Macy (1989) was a pioneering work to understand how
players learn to cooperate in the prisoner’s dilemma game. His computer simulation
opened the door to a new research area, which would later be called agent-based modeling. He then attacked the question of how trust and cooperation diffuse among strangers by
building agent-based models (Macy & Sato, 2002; Macy & Skvoretz, 1998).
Then, Macy moved on to the study of diffusion or contagion on social networks.
Inspired by the small world study by Watts and Strogatz (1998), Centola and Macy
(2007) proposed a theory or model of “complex contagion.” The model assumes
that, to adopt a new cultural item, an individual needs to be exposed to more than one
individual with the item. This is a sociologically plausible assumption because an
individual hesitates to adopt a new cultural item such as participating in a risky social
movement. Thus, for him/her to participate in the movement, he/she needs more
than one prior adopter or wide bridges with them.

Homophily is a critical factor for wide bridges to be built, but it may also create
polarization of beliefs, preferences, and opinions among people. People tend to be
clustered with those with similar beliefs, preferences, and opinions, which could lead
to polarization. To explore the mechanism creating polarization, Macy and his
collaborators conducted computer simulation using an agent-based model
(DellaPosta et al., 2015; Macy et al., 2021), laboratory experiments (Willer et al.,
2009), and online experiments (Macy et al., 2019) to study detailed mechanisms of
polarization.
Then, the second wave of computational social science came into the picture: Big
data analysis. Macy’s first study using big data was to examine the empirical validity
of Burt’s (1992) theory of structural holes at the population level. Burt himself tested the theory with data on entrepreneurs. In contrast, Macy and his collaborators used
telephone communication records in the UK to report that, as suggested by the
theory of structural holes, social network diversity calculated using the records is
strongly associated with socioeconomic advantages (Eagle et al., 2010). Another
study with big data analysis by Golder and Macy (2014) was to analyze tweets to
find what factors affect people’s emotion. Their study showed that the level of
happiness, measured by positive and negative words, is highest when people wake up and declines as the day goes on. In other words, the level is affected by sleep
cycles.
The most impressive aspect of Macy’s work in computational social science is that he starts his research with sociological theories and uses techniques of computational
social science—agent-based modeling and big data analysis—to test them. That is,
theory comes first, and the techniques come second. Therefore, his research has
substantively advanced sociological theories.
Chapter 5 “Model of Meaning” by Hiroki Takikawa and Atsushi Ueshima
discusses the potential contribution of computational social science methods to
models of meaning and the elucidation of meaning-making mechanisms, which
are central issues in sociological theory. The authors point out that in conventional
sociology, issues of meaning have been examined almost exclusively through
qualitative approaches, and that the theoretical development of sociology as a
whole has been hindered by the lack of a quantitative model of meaning. In contrast,
the methods of computational social science, coupled with the availability of large-
scale textual data, have the potential to help build quantitative models of meaning.
With these prospects in mind, Takikawa and Ueshima first formulate meaning-
making as a computational problem and characterize it with three key points: the
predictive function of meaning, relationality of meaning, and semantic learning.
With this formulation, they discuss the utility of two computational linguistic
models—the topic model and the word-embedding model—in terms of theories of
meaning. It is argued that the topic model and the word-embedding model have their
own advantages and disadvantages as models of meaning, and that integration of the
two is necessary. It is then pointed out that, in order to function as an effective model
of meaning for sociology, it is necessary to go beyond merely a computational
linguistics model for the representation of meaning and to clarify the mechanism that
links semantic representations to human action. They propose a model for this
purpose. Thus, in Chap. 5, it is argued that the computational language model of computational social science should not be interpreted merely as an analytical tool,
but as a model of human cognition and meaning-making, on the basis of which a
sociologically persuasive theory of meaning should be constructed.
Chapter 6 “Sociological Meaning of Contagion” by Yoshimichi Sato explores
ways to substantively combine sociological theories and big data analysis focusing
on contagion or diffusion, one of the main research themes in sociology. In the
beginning of the chapter, Sato cites Wu et al. (2020) to show that big data analysis is
powerful to nowcast and forecast of COVID-19. However, he also points out that a
contagion or diffusion of a virus is different from that of new ideas, values, and
norms. Thus, he is in line with Centola and Macy (2007) in that both stress the
difference. Then, he carefully reviews Centola and Macy (2007) and points out that
their theory—theory of complex contagions—does not fully explain a contagion
process. This is because the theory does not incorporate meaning and interpretation
in the process. Reviewing a failure case of diffusion in Rogers (2003), sociological
theories by Mead (1934) and Berger and Luckmann (1966), and the social mobili-
zation theory by Snow et al. (1986), Sato argues that placing meaning and interpre-
tation in the study of complex contagion would enhance its explanatory power.
Then, he examines Bail’s (2016) study of frame alignment strategies of organ
donation advocacy organizations by applying topic models to their Facebook mes-
sages and concludes that combining the theory of complex contagions and Bail’s
theory of cultural carrying capacity would give us a deeper comprehension of the
mechanism of complex contagion.
Chapter 7 “Polarization of Opinion” by Zeyu Lyu, Kikuko Nagayoshi, and Hiroki
Takikawa discusses how sociology based on computational social science methods
can contribute to today’s most pressing social problem of opinion polarization. The
authors conceptualize opinion polarization as increases in antagonistic and extreme
preferences over public policies, ideological orientations, partisan attachments, or
cultural norms. Then, they point out that the rise of opinion polarization has attracted extensive attention because its negative consequences may pose a disruptive threat to
democratic societies. According to the authors, the limitation of the traditional
quantitative social science research which has primarily relied on conventional
statistical analyses of survey data has become rather apparent in the investigation
of opinion polarization because of its complex mechanism and process. In the main
part, they review new methodologies for opinion polarization research by classifying them into three categories: network analysis, natural language processing, and
digital experiments. They conclude with three points about contributions to sociological theories of political polarization. First, new data and methods can help solve
numerous long-standing obstacles once considered insurmountable. Furthermore,
with new data and methods sociologists can pose new questions and formulate new
theories. Finally, new data and methods can transform the social science paradigm—
from the explanation of the current status to the prediction of the future.
Chapter 8 reviews the discussions in the previous chapters and proposes two ways for further collaboration between computational social science and sociology. The first way follows
the deductive approach. A sociologist should begin his/her research with a
sociological theory, derive hypotheses from it, and check their empirical validity
using techniques of computational social science. The difference between this way
and conventional sociological research is that, in the former, computational social
science techniques can use data that conventional sociological methods were unable
to access and, therefore, analyze. The second way is that a sociologist should find
new patterns with the help of computational social science, generalize them into hypotheses, and create a new theory to explain them. The key point of the second
way is that computational social science techniques find new patterns that could not
be found by conventional sociological methods, and such new patterns lead to new
theories. The chapter concludes that proper use of computational social science
opens a new door to upgrading sociology.

References

Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and
social media engagement. Social Science & Medicine, 165, 280–288.
Berger, P. L., & Luckmann, T. A. (1966). The social construction of reality: A treatise in the
sociology of knowledge. Doubleday.
Blumer, H. G. (1969). Symbolic interactionism: Perspective and method. University of California
Press.
Burt, R. (1992). Structural holes: The social structure of competition. Harvard University Press.
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113, 702–734.
Chang, S., et al. (2021). Mobility network models of COVID-19 explain inequities and inform
reopening. Nature, 589, 82–87.
Coleman, J. S. (1990). Foundations of social theory. Belknap Press of Harvard University Press.
DellaPosta, D., Shi, Y., & Macy, M. (2015). Why do liberals drink lattes? American Journal of
Sociology, 120, 1473–1511.
Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science,
328, 1029–1031.
Goffman, E. (1959). The presentation of self in everyday life. Doubleday.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Macy, M. W. (1989). Walking out of social traps: A stochastic learning model for the Prisoner’s
dilemma. Rationality and Society, 1, 197–219.
Macy, M. W., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan.
Proceedings of the National Academy of Sciences of the United States of America, 99(Suppl_3),
7214–7220.
Macy, M. W., & Skvoretz, J. (1998). The evolution of trust and cooperation between strangers: A
computational model. American Sociological Review, 63, 638–660.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-
based modeling. Annual Review of Sociology, 28, 143–166.
Macy, M. W., Deri, S., Ruch, A., & Tong, N. (2019). Opinion cascades and the unpredictability of
partisan polarization. Science Advances, 5, eaax0754. https://doi.org/10.1126/sciadv.aax0754
Macy, M. W., Ma, M., Tabin, D. R., Gao, J., & Szymanski, B. K. (2021). Polarization and tipping
points. Proceedings of the National Academy of Sciences, 118(50), e2102144118.
Mead, G. H. (1934). Mind, self, and society: From the standpoint of a social behaviorist. University
of Chicago Press.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.
Salganik, M. J. (2018). Bit by bit. Princeton University Press.
Schütz, A. (1932). Der sinnhafte Aufbau der sozialen Welt: Eine Einleitung in die Verstehende
Soziologie. Springer.
Snow, D. A., Rochford, E. B., Jr., Worden, S. K., & Benford, R. D. (1986). Frame alignment
processes, micromobilization, and movement participation. American Sociological Review,
51(4), 464–481.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393,
440–442.
Weber, M. (1921–22). Grundriß der Sozialökonomik, III. Abteilung, Wirtschaft und Gesellschaft,
Erster Teil, Kap. I. Verlag von J.C.B. Mohr.
Willer, R., Kuwabara, K., & Macy, M. W. (2009). The false enforcement of unpopular norms.
American Journal of Sociology, 115, 451–490.
Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic
and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling
study. Lancet, 395, 689–697.
Chapter 2
Sociological Foundations of Computational
Social Science

Yoshimichi Sato

2.1 Introduction: Computational Social Science and Sociology

Computational social science consisting of digital (big) data analysis and agent-
based modeling has become popular and influential in social science. Take Chang
et al. (2021), for example. They analyzed mobile phone data to simulate geograph-
ical mobility of 98 million people. One of their major findings is that social
inequality affects the infection rate. Their model predicts “higher infection rates
among disadvantaged racial and socioeconomic groups solely as the result of
differences in mobility” (Chang et al., 2021, p. 82). This finding is important and
meaningful to sociologists because one of the most important research topics in
sociology, social inequality, is studied from a new perspective with the help of
computational social science.
This chapter shows the gap between sociology and computational social science
and how to fill the gap. Chang et al. (2021) give us a good clue for it. I will get back
to this point later.

2.2 Strength and Problems of Computational Social Science

The strength of computational social science can be summarized as follows. Digital data analysis deals with data having three characteristics: “Big,” “Always-on,” and
“Nonreactive” (Salganik, 2018). Analysis of data with these characteristics can study
social phenomena that cannot be studied by analysis using conventional social survey data that is small, always-off, and reactive. For example, Salganik et al.
(2006) created an artificial music market with about 14,000 participants to study the
social influence in cultural markets. Participants were recruited from a website and randomly assigned either to an independent condition without social influence or to a condition with social influence. In the former condition, participants decided which songs they would listen to based only on the names of the bands and their songs and, while listening to them, ranked them. In the latter condition, they could additionally see the download counts each song had accumulated from previous participants. Also in the latter condition, participants were randomly assigned to one of
the eight artificial worlds so that observers could see how each world evolves
independently.
Salganik et al. (2006) report three major findings based on the online experiment.
First, inequality in popularity among songs is larger in the social influence condition
than in the independent condition. Second, the evolution of the inequality in
popularity is unpredictable. Even though the eight artificial worlds are under the
same conditions, the degrees of inequality are different across the eight worlds.
Third, the unpredictability is higher in the social influence condition than in the
independent condition.
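The logic behind these findings can be illustrated with a toy cumulative-advantage simulation. The following Python sketch is not Salganik et al.’s (2006) experimental design or data: the “appeal” values, the choice rule, and the use of the Gini coefficient as the inequality measure are illustrative assumptions.

import random

def gini(xs):
    # Gini coefficient of a distribution: 0 = equal, values near 1 = very unequal.
    xs = sorted(xs)
    n = len(xs)
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n

def simulate_world(social_influence, n_songs=48, n_listeners=1000):
    appeal = [random.random() for _ in range(n_songs)]  # intrinsic quality
    downloads = [0] * n_songs
    for _ in range(n_listeners):
        if social_influence:
            # Visible download counts feed back into the choice probabilities.
            weights = [a + d for a, d in zip(appeal, downloads)]
        else:
            weights = appeal
        choice = random.choices(range(n_songs), weights=weights)[0]
        downloads[choice] += 1
    return downloads

# Eight independent "worlds" per condition, echoing the experimental design.
for label, condition in (("independent", False), ("social influence", True)):
    ginis = [round(gini(simulate_world(condition)), 2) for _ in range(8)]
    print(label, ginis)

Under these assumptions, the social influence worlds typically show both higher and more dispersed Gini coefficients across the eight runs, mirroring the first and third findings.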
These findings clearly show that the popularity of songs, and of cultural items in general, depends more on social influence than on intrinsic quality and that the popularity
evolves unpredictably. In theory these findings could be obtained by a conventional
laboratory experiment. In practice, however, it is almost impossible to find about
14,000 people and ask them to come to a laboratory. The method developed by
Salganik and his colleagues shows the strength of computational social science.
Agent-based modeling, the other main pillar of computational social science, also
has its own strength (Cederman, 2005; Gilbert, 2019; Macy & Willer, 2002;
Squazzoni, 2012). The strongest point of agent-based modeling, I would argue, is
that it can clearly study the micro–macro linkage and the emergence of a social
phenomenon from interaction between agents. For example, Schelling’s model of
residential segregation, a prototype of agent-based modeling, made a simple assumption about individuals’ (agents’) moving decisions (Schelling, 1971). Agents are assumed to have a homophily tendency. If the share of an agent’s neighbors whose characteristic, race, for example, differs from the agent’s own is smaller than his/her threshold, he/she stays in the same place. If the share is larger than the
threshold, he/she moves to a new vacant place. Then, after iterations, residential
segregation emerges at the societal level.
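A minimal Python sketch of this mechanism follows; the grid size, number of vacancies, tolerance threshold, and number of iterations are illustrative assumptions rather than Schelling’s original parameters.

import random

def unlike_share(grid, size, x, y):
    # Share of occupied neighbors (on a torus) whose group differs from the agent's.
    me = grid[(x, y)]
    neighbors = [grid[((x + dx) % size, (y + dy) % size)]
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    occupied = [n for n in neighbors if n is not None]
    if not occupied:
        return 0.0
    return sum(n != me for n in occupied) / len(occupied)

def schelling(size=20, threshold=0.35, n_vacant=60, steps=20000):
    sites = [(x, y) for x in range(size) for y in range(size)]
    random.shuffle(sites)
    # Two equally sized groups (0 and 1) plus some vacant cells (None).
    grid = {s: (None if k < n_vacant else k % 2) for k, s in enumerate(sites)}
    for _ in range(steps):
        x, y = random.choice(sites)
        if grid[(x, y)] is None:
            continue
        # Homophily rule: move to a random vacant cell if the share of
        # unlike neighbors exceeds the agent's threshold.
        if unlike_share(grid, size, x, y) > threshold:
            new = random.choice([s for s in sites if grid[s] is None])
            grid[new], grid[(x, y)] = grid[(x, y)], None
    agents = [s for s in sites if grid[s] is not None]
    # Macro-level outcome: average exposure to unlike neighbors.
    return sum(unlike_share(grid, size, *s) for s in agents) / len(agents)

print(f"mean share of unlike neighbors: {schelling():.3f}")

Starting from a random allocation, the mean exposure to unlike neighbors typically falls far below that of the initial random mixture: segregation emerges at the macro-level even though no individual agent demands it.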
Schelling’s model clearly demonstrates the transition from the micro-level
(homophily principle at the agent’s level) to the macro-level (residential segregation
at the societal level). Other models deal with the macro–micro–macro transition.
Macy and Sato (2002), for example, study the effect of mobility on the emergence of
generalized trust and the global market. In the model agents are randomly allocated
to neighborhoods. Then agents are assumed to have propensities to trust, cooperate,
and enter the global market, that they make decisions comparing their propensities
and randomly generated thresholds, and that they revise the propensities by social learning. Then they manipulate the mobility rate to study its effect on the formation
of trust, cooperation, and the global market. The mobility rate at the macro-level
affects the decision-making and behavior of agents at the micro-level, and then their
behavior accumulates and determines the level of trust, cooperation, and the global
market at the macro-level.
So far, we have observed the strength of computational social science. It provides
powerful tools to social scientists, with which they can study new aspects of social
phenomena that cannot be analyzed by conventional methods. However, computa-
tional social science, I would argue, finds it difficult to properly study central
questions in sociology such as the emergence of social order and inequality because
it does not include meaning in its models (Goldberg & Stein, 2018).
Take the emergence of social order, for example. Social order has been a central
research topic in sociology [e.g., Parsons (1951) and Seiyama (1995)]. Scholars in
this research area would agree that coordinated behavior among actors is a neces-
sary condition for social order but not a sufficient condition. Coordinated behaviors
must be justified by actors’ belief about them. Through their belief system, actors
observing the coordinated behavior of other actors interpret the behavior as socially desired and conclude that they should behave in the same way. Here, as Weber (1921) points out, behavior turns into social action.
The abovementioned model of trust by Macy and Sato (2002) studies the
emergence of social order in a sense. The high level of trusting and cooperating
behavior and the emergence of the global market seem to be an example of the
emergence of social order. However, they are merely coordinated behaviors of agents. For
the behavior to become social action and for social order to emerge, agents must
interpret the coordinated behavior as just, that is, legitimate. However, agents in the model are
not equipped with a belief system through which they interpret behavior of other
agents. There is a gap between social order and coordinated behavior, and the model
cannot fill the gap.
Social inequality is also a central research topic in sociology, and one of the major
research questions in the field is why social inequality exists and persists. The
abovementioned study by Salganik et al. (2006) clearly demonstrates how social
inequality among songs emerges. However, it does not explain why the inequality
persists. In general, social inequality cannot be sustained without people’s belief that
it is just. It is true that a powerful ruling class can sustain inequality by suppressing people
in other classes, but this asymmetric, unjust system leads to social unrest, making
society unstable and inefficient. Thus, for social inequality to persist, it is necessary
that people believe that it is just and accept it. Although it is an excellent study of
inequality, the study by Salganik et al. (2006) does not clearly show how and why
the inequality persists. Here again, we observe a gap between social inequality at the
behavioral level and people’s belief about it.
I do not mean to criticize Macy and Sato (2002) and Salganik et al. (2006) in
particular. I picked them out as examples of studies in computational social science to show that most of the studies in the field focus on behaviors, not on beliefs.
This creates the abovementioned gap. There would be no problem if we were interested only in behavioral patterns and conducted studies within computational social science. However, I argue that this strategy does not fully exploit the power
and potential of computational social science when it is applied in sociology. To do
this, we need to fill the gap.

2.3 How Can We Fill the Gap?

2.3.1 Agent-Based Modeling and Interpretation

How can we fill the gap? As abovementioned, we need to assume that actors have a
belief system through which they interpret reality and add meaning to it. This is not a
new assumption, though. Rather, this stands on sociological tradition. Berger and
Luckmann (1966), a classical book of social theory, argued that social “reality” does
not exist by itself. Rather, it is socially constructed. My interpretation of their theory on the emergence of social reality is as follows. In the beginning, an actor performs a
behavior. Then, another actor observes it. However, he/she does not observe it as it is. He/she interprets it, adds meaning to it, and decides whether to adopt it or not. If
he/she adopts it, the third actor follows the same procedure. Then if many actors
follow the same procedure, the behavior turns into a social action and a social reality.
Goldberg and Stein (2018) aptly build an agent-based model based on
Berger and Luckmann’s (1966) theory of social construction of reality. Because
their model is an excellent example filling the gap, I will explain its details to show
how we can fill the gap.
Goldberg and Stein (2018) call their theory a theory of associative diffusion and
assume a two-stage transmission, which is different from simple contagion. In the
beginning of the two-stage transmission process, actor B observes that actor A
smokes. Then, actor B interprets the meaning of smoking and evaluates it in his/her
cognition system. Finally, actor B decides whether to smoke or not. If he/she decides
to smoke, actor A’s behavior (smoking) is transmitted to actor B.
Based on their theory, Goldberg and Stein (2018, pp. 907–908) propose the
following proposition:
Associative diffusion leads to the emergence of cultural differentiation even when agents
have unobstructed opportunity to observe one another. Social contagion does not lead to
cultural differentiation unless agents are structurally segregated.

Then, to check the validity of the proposition, they built an agent-based model
and conducted simulation to show that the proposition is valid. What makes their
model different from other models of contagion is that an agent in their model is
equipped with an associative matrix. The associative matrix shows the strength of
association between practices, which are exhibited by agents. In the model two
agents—agents A and B—are randomly selected. Agent B observes agent A
exhibiting practices i and j at certain probabilities that are proportional to agent
A’s preferences for them. Then, B updates his/her associative matrix. In the update
process, the association between practices i and j becomes stronger. This is because he/she observed that agent A exhibited practices i and j simultaneously. Then, agent
B updates his/her preference over practices. Note that the update does not automat-
ically occur. Agent B calculates constraint satisfaction using the updated preference
and the associative matrix. If it is larger than constraint satisfaction using the old
preference and the associative matrix, he/she keeps the updated preference. Other-
wise, he/she keeps the old preference. Constraint satisfaction is a concept developed
in cognitive science and shows that an actor (agent) is satisfied if he/she resolves
cognitive dissonance.
Conducting simulation with this agent-based model, Goldberg and Stein (2018)
show that cultural differentiation emerges even if agents are not clustered in different
networks.
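The two-stage transmission can also be sketched in code. The following Python toy is a deliberately simplified reading of the mechanism, not Goldberg and Stein’s (2018) published model: the update sizes, the constraint satisfaction function, and the differentiation measure are illustrative assumptions.

import itertools
import random

K, N, STEPS = 6, 40, 30000  # practices, agents, observation events (illustrative)

def satisfaction(pref, assoc):
    # Toy constraint satisfaction: preferences fit the agent's associations
    # when strongly associated practices are evaluated similarly.
    return sum(assoc[i][j] * pref[i] * pref[j]
               for i in range(K) for j in range(K) if i != j)

agents = [{"pref": [random.uniform(-1, 1) for _ in range(K)],
           "assoc": [[0.0] * K for _ in range(K)]} for _ in range(N)]

for _ in range(STEPS):
    a, b = random.sample(agents, 2)  # unobstructed observation: any pair can meet
    # Stage 1: A exhibits two practices, more likely the more A prefers them.
    weights = [p + 1.01 for p in a["pref"]]
    i, j = random.choices(range(K), weights=weights, k=2)
    if i == j:
        continue
    # Stage 2: B strengthens the association between the co-observed practices...
    b["assoc"][i][j] += 0.1
    b["assoc"][j][i] += 0.1
    # ...and keeps a tentative preference update only if it raises constraint
    # satisfaction, i.e., only if it reduces cognitive dissonance.
    new_pref = list(b["pref"])
    new_pref[i] = min(1.0, new_pref[i] + 0.1)
    new_pref[j] = min(1.0, new_pref[j] + 0.1)
    if satisfaction(new_pref, b["assoc"]) >= satisfaction(b["pref"], b["assoc"]):
        b["pref"] = new_pref

def corr(u, v):
    # Pearson correlation between two agents' preference profiles.
    mu, mv = sum(u) / K, sum(v) / K
    num = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    den = (sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v)) ** 0.5
    return num / den if den else 0.0

pairs = [abs(corr(p["pref"], q["pref"])) for p, q in itertools.combinations(agents, 2)]
print(f"mean |preference correlation| across agents: {sum(pairs) / len(pairs):.3f}")

The constraint satisfaction filter is what keeps an agent’s preferences consistent with the associations he/she has learned, so bundles of practices that “go together” can stabilize differently in different agents even though everyone can observe everyone else.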
I highly evaluate their study because they started with important sociological
concepts such as meaning and interpretation. Agent-based modeling did not come
first. Rather, they started with some anecdotes, argued that conventional contagion
models cannot explain them, showed the importance of meaning and interpretation
to explain them, and built an agent-based model in which agents interpret other
agents’ behaviors and add meaning to them.
This is what I want to emphasize in this chapter. Many sociological studies
using agent-based models, to my knowledge, apply the models to social phenomena
focusing on the behavioral aspect of agents, not on interpretive aspects of agents.
Take Schelling’s (1971) model of residential segregation, for example. As
abovementioned, his model demonstrates how agents’ decision-making at the
micro-level generates residential segregation at the macro-level. This is one of the
strengths of agent-based modeling. However, an agent in the model does not
interpret behaviors of his/her neighbors. He/she reacts to the rate of neighbors of
the same characteristic as his/her own. He/she does not ask himself/herself why a neighbor
moves or stays; he/she does not add meanings to the behavior of the neighbor. In
other words, agents in the model are a kind of automated machine just reacting to the
composition of neighbors.
Even such a simple model explains the emergence of residential segregation.
However, an agent’s move can be interpreted in some ways [see also Gilbert (2005)].
If one of his/her neighbors interprets his/her move as a kind of “white flight,” he/she
might follow suit. In contrast, if the neighbor interprets the move differently, he/she
might stay, which would not lead to residential segregation. Thus, different inter-
pretations of agents’ behavior result in different outcomes at the macro-level.
Take power for another example. Suppose that agent A does something, say
smoking, in front of agent B. If agent B thinks that he/she is implicitly forced to
smoke, too, he/she interprets that agent A exercises power on him/her. However, if
agent B interprets that agent A smokes just for his/her fun, he/she does not feel
forced to smoke. Thus, it depends on agent B’s interpretation of agent A’s behavior
whether power relation between agents A and B is realized or not (Lukes, 1974,
2005).
These examples clearly show that agents in agent-based modeling should be
equipped with an interpretation system. As pointed out, Goldberg and Stein (2018)
propose an agent-based model with such a system and report interesting findings based on the simulation of the model.

Fig. 2.1 Relationship between following convention (backward-looking rationality), choosing a new action (forward-looking rationality), and reflecting on the goal and discovering a new goal (reflexivity). (Source: Sato 2017, p. 42, Fig. 3.2)
Squazzoni (2012, pp. 109–122) also argues for the importance of social reflexivity,
which is closely related to the concepts of interpretation and meaning and cites two
studies done by him and his colleagues (Boero et al., 2004a, b, 2008). In the first
study (Boero et al., 2004a, b), firms in an industrial district are assumed to have one
of the four behavioral “attitudes” and to replace one with another under certain conditions.
In conventional agent-based models, agents are assumed to perform a behavior. In a
model in which agents play a prisoner’s dilemma game, for example, they cooperate
or defect. In contrast, an “attitude” is a bundle of strategies or behaviors. An agent
reflects on his/her attitude and replaces it with another attitude if necessary. In the second
study (Boero et al., 2008), agents are placed in a toroidal 80 × 80 cell space and are
assumed to stay or move based on their happiness. One of the scenarios in their
model is that agents have one of the four “heuristics” and randomly change them
under a certain condition. A “heuristic” is a behavioral rule like an attitude in their
first study. Here again, agents reflect on their heuristics and change them if necessary.
An important characteristic of the models is that agents choose not a behavior but
a rule of behaviors, called an attitude or a heuristic. Agents reflect on the rule they chose under certain conditions and replace it with another rule if necessary. There is no interpretation process in the models. However, if the process is added to the models, agents can be assumed to interpret the conditions where they are placed, to reflect on the current rule they chose, and to replace it with another rule if necessary.
Sato (2017) also points to the importance of meaning and reflexivity to fill the gap
between agent-based modeling and social theory. This is because social theories
often point to their importance in the study of modern and postmodern society
(Giddens, 1984; Imada, 1986; Luhmann, 1984). Thus, for agent-based modeling to
contribute to the advance of social theory, it needs to incorporate meaning and reflex-
ivity in models. Sato (2017) revisits Imada’s (1986) triangular framework of action
(Fig. 2.1) and proposes a framework for agent-based modeling to incorporate
meaning and reflection. Imada’s framework consists of three types of action: Fol-
lowing convention, choosing a new action, and reflecting on the goal and
discovering a new goal.

Fig. 2.2 Sets of goals. (Source: Sato 2017, p. 44, Fig. 3.4)

Actors follow a convention as long as it does not cause a
problem. Borrowing the terminology of rational choice theory, I argue that
backward-looking rationality dominates in society to lighten the cognitive burden
of actors. However, if an external change occurs and the convention cannot solve
problems caused by the change, actors apply forward-looking rationality to the
change and find a new action which they believe will solve the problems. If the
new action actually solves the problem, it becomes a convention. In contrast, if it
cannot, actors reflect on the current goal and try to discover a new goal that they
believe would result in a better outcome. Then, if actors can find such a new goal and
an action that achieves it, the action becomes a new convention.
The key point of Imada’s framework when we incorporate it in agent-based
modeling is that actors can discover goals that are new to them. How can this process be
modeled? Logically, it is impossible because of the following reasoning. In Imada’s
framework actors find a goal that has not been found. Take the concept of “sustain-
able development,” for example. When advanced countries enjoyed economic
growth, their goal was only development without considering sustainability. As
people, governments, and other organizations realize social problems caused by
development, they invent the concept of “sustainable development” focusing both
on economic development and on sustainability. This example shows that the set of
goals is infinite. A goal is new because it has not been invented. Then, theoretically,
the set of goals must be infinite. However, creating an infinite set in agent-based
modeling is logically impossible. Then, how can we incorporate reflexivity in agent-
based modeling?
Sato (2017) proposes an assumption that agents have limited cognitive capacity;
they consider only a limited number of goals, which they already know (Set A in
Fig. 2.2). Then, set A is assumed to be included in a larger set, set B in Fig. 2.2.
Agents do not know goals in B-A. If a goal in B-A enters set A and a known goal
leaves set A, the goal entering set A is new to agents. If agents interpret the new goal
as better than the goals in set A, they will choose the new goal. If set B is large enough for simulation, set A always has goals new to agents. This could be a second-best
solution to the abovementioned problem.
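A minimal Python sketch of this second-best solution follows; the size of the universe, the discovery probability, and the numeric goal “values,” which merely stand in for agents’ interpretations of goals, are illustrative assumptions.

import random

random.seed(1)
# Set B: a large but finite universe of goals, unknown to the agent in advance.
universe = {f"goal_{k}": random.random() for k in range(1000)}
# Set A: the limited set of goals the agent currently knows.
known = random.sample(sorted(universe), 10)
current = max(known, key=universe.get)

for _ in range(200):
    if random.random() < 0.2:  # a reflexivity episode
        # A goal from B - A becomes visible, and one known goal drops out of A.
        new_goal = random.choice([g for g in universe if g not in known])
        known[random.randrange(len(known))] = new_goal
        # The agent adopts the new goal if it interprets the goal as better.
        if universe[new_goal] > universe[current]:
            current = new_goal

print("adopted goal:", current, "with value", round(universe[current], 3))

Because set B is far larger than set A, a goal that is new to the agent is almost always available, approximating the theoretically infinite set of goals that cannot be represented directly.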
In this subsection, I referred to studies by Goldberg and Stein (2018), Boero et al.
(2004a, b, 2008), and Sato (2017). Although their approaches are different from each
other, all of them try to incorporate meaning, interpretation, and reflexivity in agent-
based modeling. This line of research contributes to filling the gap between compu-
tational social science and sociology and advancing sociological studies with the
help of computational social science, a powerful tool in social science.

2.3.2 Digital Data Analysis and Sociology

Digital data analysis is a rapidly growing body of research in sociology. For example, Chang
et al. (2021), which was mentioned in the beginning of this chapter, is an excellent
example of how digital data analysis advances sociological study of inequality. If the
sociological aspect of inequality among social groups had not been included in their
study, the study would not have been attractive to sociologists conducting conven-
tional research on social inequality. My point about their paper is that the sociological
research question on inequality among social groups led to digital data analysis
suitable for answering the question.
Another intriguing example of digital data analysis in sociology is the study on
newspaper coverage of U.S. government arts funding done by DiMaggio et al.
(2013). According to the authors, after two decades of good relations between the
National Endowment for the Arts (NEA) and artists, government support for artists
became contentious between the mid-1980s and the mid-1990s. A good
indicator of the contention is a decline in NEA appropriations from 1986 through
1997. To explain the decline, the authors investigate press coverage of government
arts support. Their research question is “how did the press respond to, participate in
or contribute to the NEA’s political woes?” (DiMaggio et al., 2013, p. 573).
To answer the question, they applied Latent Dirichlet Allocation, a topic model of
text analysis, to newspaper articles in Houston Chronicle, the New York Times, the
Seattle Times, the Wall Street Journal, and the Washington Post published in the
abovementioned period. The topic model extracted 12 topics, and the authors
grouped them into three categories: (1) social or political conflict, (2) local projects
and revenues, and (3) specific arts genres, types of grant, or event information. After
detailed analysis of the topics and newspaper articles, the authors reported the following
findings (DiMaggio et al., 2013, p. 602). (1) Press coverage of arts funding suddenly
changed from celebratory to contentious in 1989, and this contention continued into the 1990s.
(2) Negative coverage of the NEA emerged when George H.W. Bush was elected
president. (3) Press coverage reflected three frames for the controversy. (4) Press
coverage of government arts patronage differed from newspaper to newspaper.
As the authors emphasize, topic modeling is suitable for the study of culture
because it can clearly capture the relationality of meaning. Moreover, one of the
strong points of the paper is that it has a clear sociological research question with
which they conducted topic modeling analysis. This was possible, perhaps,
because Paul DiMaggio, one of the authors, has deep expertise in sociology of
culture.
These two exemplars of digital data analysis suggest that excellent sociological
study using digital data analysis should start with good research questions not with
available data. Digital data analysis is a social telescope (Golder & Macy, 2014) with
much higher resolution than conventional social surveys and experiments. However,
it is sociological expertise, not data itself, that finds order and pattern in data obtained
through the social telescope. Without such expertise and research questions based on
it, digital data analysis would not contribute to the advance of sociological inquiries.
In addition to sociological expertise, including meaning in digital data analysis
would make the analysis substantively contribute to sociological studies. As pointed
out in the previous subsection, meaning and interpretation are important concepts
when we conduct sociological inquiries. The abovementioned study of newspaper
articles by DiMaggio et al. (2013) is an excellent example of this approach. Take
Twitter data, for example. Suppose that actor A tweets a message supporting
politician X. It is not always the case that actor A expresses his/her true opinion in
the message. He/she may do so, but he/she may not because he/she hides his/her true
opinion, expecting negative reactions from his/her Twitter followers. Then, actor B, a
follower of actor A, does not receive actor A’s message as it is. He/she interprets it
and tries to understand its meaning. Then, he/she tweets his/her message about actor
A’s original message based on his/her interpretation of it. He/she also may express
his/her opinion or may not. Other actors including actor A, in turn, read the message,
interpret it, and tweet a message, and so on.
To the best of my knowledge, most of the digital data analysis of Twitter data or
other text data lacks this interpretation/expression process. However, Twitter data
analysis with this process could unveil the relationship between actions (expressing
messages) and the inner world of actors, which would attract sociologists working in
the core domains of sociology. This is because interpretation and meaning have been
central concepts in sociology. Analysis of behavioral digital data alone would not
enter the core of the discipline. In contrast, analysis with the interpretation/
expression process would promote collaboration between sociologists and analysts
of digital data and contribute to the advance of sociology.

2.4 Conclusions: Toward Productive Collaboration Between Sociology and Computational Social Science

Radford and Joseph (2020) emphasize the critical role of social theory when we use
machine learning models to analyze digital data. Their coverage of topics is wide,
but, to summarize their argument, machine learning models are useless unless
researchers start with social theory in their research with machine learning models.
Study using machine learning models without social theory is like a voyage without
a chart. It could not create hypotheses important to social science. We would not
know whether the findings of such a study are new. The crux of Radford and Joseph’s
(2020) argument is that social theory, not the data available for machine learning
models, comes first.
The main message of this chapter is in line with theirs. For studies using
computational social science methods such as agent-based modeling and digital
data analysis to be fruitful and contribute to the advance of sociology, we should
start our research with sociological theories and concepts and create hypotheses
based on them. Then, computational social science methods help us rigorously test
their validity. And, most importantly, the methods may lead to new findings that
could not have been found using conventional sociological methodology. This is
highly plausible because the methods are new social telescopes with much higher
resolution than that of conventional methodology. Then, we must bring the findings
back to the original sociological theories and concepts, find their problems, and
invent new theories and concepts by fixing them. This is a way to advance sociology
with the help of computational social science and to have computational social
science substantively contribute to the advance of sociology. Furthermore, in this
way, computational social scientists could improve their methods so that improved
methods could be more suitable for sociological analysis. This means that sociology
contributes to the advance of computational social science. This collaboration would
advance both sociology and computational social science and open the door to new
exciting interdisciplinary fields.

References

Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology
of knowledge. Doubleday.
Boero, R., Castellani, M., & Squazzoni, F. (2004a). Cognitive identity and social reflexivity of the
industrial district firms: Going beyond the ‘complexity effect’ with agent-based simulations. In
G. Lindemann, D. Moldt, & M. Paolucci (Eds.), Regulated agent-based social systems
(pp. 48–69). Springer.
Boero, R., Castellani, M., & Squazzoni, F. (2004b). Micro behavioural attitudes and macro
technological adaptation in industrial districts: An agent-based prototype. Journal of Artificial
Societies and Social Simulation, 7(2), 1.
Boero, R., Castellani, M., & Squazzoni, F. (2008). Individual behavior and macro social properties:
An agent-based model. Computational and Mathematical Organization Theory, 14, 156–174.
Cederman, L.-E. (2005). Computational models of social forms: Advancing generative process
theory. American Journal of Sociology, 110(4), 864–893.
Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021).
Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589,
82–87.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of U.S. government arts
funding. Poetics, 41, 570–606.
Giddens, A. (1984). The constitution of society: Outline of the theory of structuration. Polity Press.
Gilbert, N. (2005). When does social simulation need cognitive models? In R. Sun (Ed.), Cognition
and multi-agent interaction: From cognitive modeling to social simulation (pp. 428–432).
Cambridge University Press.
Gilbert, N. (2019). Agent-based models (2nd ed.). Sage Publications.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Imada, T. (1986). Self-organization: Revival of social theory. Keiso Shobo. (In Japanese).
Luhmann, N. (1984). Soziale Systeme: Grundriß einer allgemeinen Theorie. Suhrkamp.
Lukes, S. (1974). Power: A radical view. Macmillan Education.
Lukes, S. (2005). Power: A radical view (2nd ed.). Palgrave Macmillan.
Macy, M. W., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan.
Proceedings of the National Academy of Sciences, 99(Suppl. 3), 7214–7220.
Macy, M. W., & Willer, R. (2002). From factors to actors: Computational sociology and agent-
based modeling. Annual Review of Sociology, 28, 143–166.
Parsons, T. (1951). The social system. Free Press.
Radford, J., & Joseph, K. (2020). Theory in, theory out: The uses of social theory in machine
learning for social science. Frontiers in Big Data, 3, 18. https://doi.org/10.3389/fdata.2020.00018
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and
unpredictability in an artificial cultural market. Science, 311, 854–856.
Sato, Y. (2017). Does agent-based modeling flourish in sociology? Mind the gap between social
theory and agent-based models. In K. Endo, S. Kurihara, T. Kamihigashi, & F. Toriumi (Eds.),
Reconstruction of the public sphere in the socially mediated age (pp. 37–46). Springer Nature
Singapore Pte.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1,
143–186.
Seiyama, K. (1995). Perspectives of theory of institution. Sobunsha. (In Japanese).
Squazzoni, F. (2012). Agent-based computational sociology. Wiley.
Weber, M. (1921). Soziologische Grundbegriffe. In Grundriß der Sozialökonomik, III. Abteilung,
Wirtschaft und Gesellschaft. J.C.B. Mohr.
Chapter 3
Methodological Contributions of Computational Social Science to Sociology

Hiroki Takikawa and Sho Fujihara

3.1 Introduction

Currently, the data environment for sociology is changing dramatically (Salganik,
2018). Traditionally, the main type of data used by quantitative sociology was
survey data. Collecting survey data entailed significant financial and human costs;
therefore, the data provided by surveys were scarce. Such data are clean, structured,
and collected by probability sampling. In the digital age, however, people’s behavior
is observed daily and recorded constantly, which creates vast amounts of behavioral
data known as digital traces (Golder & Macy, 2014). In addition, surveys and
experiments using crowdworkers, based on nonprobabilistic samples, are far less
expensive, can be collected in large quantities, and are suitable for various
interventions (Salganik, 2018). In this digital age of computational social
science, data are messy and unstructured yet abundant.
As the data environment changes, the methodologies required in sociology also
change. In the era of scarce data, the main focus of methodology was how to
efficiently extract meaningful information from scarce data (Grimmer et al., 2021,
2022). However, in the new era of abundant data, different methodologies are
needed. Moreover, strategies for how sociological theory should be developed
using these methodologies must also be considered.

There are several possibilities for processing and analyzing such new data, but at
the center of these methods are machine learning methods (sometimes more broadly
referred to as “data science”) (Hastie et al., 2009; James et al., 2013; Jordan &
Mitchell, 2015). Machine learning differs substantially from traditional quantitative
methods used in the era of scarce data in its assumptions and “culture” (Grimmer
et al., 2021). Sometimes it is argued that machine learning techniques are not useful
for theory-oriented social sciences because they are not explanation-oriented and
often produce uninterpretable black boxes. One pundit even argued that theory was
unnecessary in the era of big data (cf. Anderson, 2008). In contrast, this chapter
argues that these techniques are potentially very useful, not only for conventional
sociological quantitative analysis but also for sociological theory building. To date,
various machine learning approaches have been used in computational social sci-
ence, but they have not necessarily been systematically utilized to contribute to
sociological (social science) theory. Conversely, many sociologists underestimate
the potential contribution of machine learning and computational social science to
sociological theory. To change this situation, we aim to identify how machine
learning methods can be used to contribute to theory building in sociology.
In the next section, after noting the differences between the culture of machine
learning and that of traditional statistics, we define the concept of machine learning
as used in this chapter. We then explain the basic ideas and logic of machine
learning, such as sample splitting and regularization. In Sect. 3.3, we summarize
three potential applications of machine learning in sociology: breaking away from
deductive models, establishing a normative cognitive paradigm, and using machine
learning as a model of human decision making and cognition. Section 3.4 summa-
rizes the statistical methods conventionally used in sociology, including description,
prediction, and causal inference, and describes their challenges, focusing on the use
of regression analysis. Sections 3.5 and 3.6 introduce current applications of
machine learning in sociology and discuss future directions.

3.2 Machine Learning


3.2.1 The Culture and Logic of Machine Learning

Machine learning differs from the methodologies traditionally used in the social
sciences in terms of its assumptions, its conception of the desirability and value of
research objectives and methods, and, more broadly, its cognitive culture (Grimmer
et al., 2021), but that does not mean that its similarities with traditional statistics
should be underestimated (cf. Lundberg et al., 2021). Breiman (2001) described the
culture of traditional statistics as a data modeling culture and that of machine
learning as an algorithmic culture. The distinction between the two can be explained
as follows (Breiman, 2001). Consider that data are produced as a response y given an
input x. How x is transformed into y is initially a black box. In the data modeling
culture, we consider probabilistic models that model the contents of the black box,
such as linear or logistic regression models. In contrast, in algorithmic cultures, the
goal is to discover an algorithm, f(x), on x that predicts the response y well, leaving
the contents of the black box unknown. Thus, one can describe traditional statistics
as generative modeling, which models the generative process of data, and machine
learning as predictive modeling, which is oriented toward prediction rather than the
generative process (Donoho, 2017). Mullainathan and Spiess (2017) stated that
while social sciences are traditionally interested in the value of the estimated
coefficients β of explanatory variables in their models, machine learning is interested
in the predicted value ŷ. In other words, social science is interested in modeling the
contents of the black box, while machine learning is interested in predicting the
results produced by the black box.
From the above quick introduction, many social scientists may feel that a data
modeling culture that models the contents of a black box is more beneficial to
sociological theory building. However, this is not true. First, as Breiman emphasizes,
the choice between data modeling and algorithmic modeling is not essential; what
matters is what model should be used to solve the actual problem (Breiman, 2001). If
the problem to be solved in the social sciences is better approachable by machine
learning, its use should not be discouraged. Second, even if the problem to be solved
in social science is to understand the causal mechanisms of social phenomena (i.e.,
the contents of the black box), data modeling is not always the most appropriate
means for this purpose. Breiman argues that, in the end, if an algorithmic model is
more effective in solving a problem, then it is also more useful in understanding the
mechanisms that create the problem (Breiman, 2001).

3.2.2 Definition of Machine Learning

Machine learning is an extremely broad term that covers a wide range of ideas and is
extremely difficult to define because it is used with slightly different meanings in
different fields. Molina and Garip (2019, p. 28) describe machine learning succinctly
as “machine learning (ML) seeks to automate discovery from data”; Grimmer et al.
(2021, p. 396) state that “machine learning is a class of flexible algorithmic and
statistical techniques for prediction and dimension reduction.” Athey (2018, p. 509)
states that machine learning is “a field that develops algorithms designed to be
applied to data sets, with the main areas of focus being prediction (regression),
classification, and clustering or grouping tasks.” Jordan and Mitchell (2015, p. 255)
describe machine learning as a field that addresses the following two questions.
1. “How can one construct computer systems that automatically improve through
experience?”
2. “What are the fundamental statistical-computational-information-theoretic laws
that govern all learning systems, including computers, humans, and
organizations?”
The latter theoretical question is not explicitly asked in the context of applications in
the social sciences, but it is sometimes necessary to ask such questions to think
deeply about why machine learning applications succeed or fail in any given
situation. Identifying the mechanisms by which humans and organizations learn is
also a challenge for sociological theory itself.
With the exception of the last theoretical question, the points can be summarized
as follows from these definitions:
1. Learning from data and experience, i.e., data-driven,
2. Aiming to improve the performance of task resolution, such as prediction and
classification, and
3. Aiming to develop algorithms that automate task resolution.
Overall, machine learning can be defined as a procedure for learning from data and
creating the best algorithm for automating the solution of tasks such as prediction
and classification.
Machine learning methods are often classified as supervised or unsupervised
(Bishop & Nasrabadi, 2006). In this section, we first introduce supervised machine
learning and then address unsupervised machine learning.

3.2.3 Prediction and Supervised Machine Learning

The goal of supervised machine learning is to predict a response variable y using
predictors or “explanatory variables” x from data (Jordan & Mitchell, 2015). We call
this approach supervised because in the training phase of the model, predictor and
response variable pairs (x, y) are given in advance as training data. Training is
performed based on these training data, and the resulting model is used to predict
the value of the unknown response variable. The data x used for prediction can be
extremely high-dimensional. Images and text data, for example, are examples of
very high-dimensional data. Prediction is performed by learning a mapping f from
x to y. This mapping can be a simple function, such as a linear regression model, or it
can be a very complex mapping, such as an artificial neural network.
In machine learning, we are more interested in y, the outcome of the prediction,
than in the coefficient β of the explanatory variable (Mullainathan & Spiess, 2017).
Additionally, in machine learning, the goodness of fit of a model to the data is not
important. Rather, we are ultimately concerned with the performance of prediction
and the generalization for unknown events. The most important point in machine
learning is that the best-fitting model for the available data is not necessarily the best
model for predicting unknown data.

3.2.3.1 Sample Splitting

How, then, can we discover models with good generalization performance? For this
purpose, a procedure called sample splitting, which is the most important procedure
in supervised machine learning, is used (Hastie et al., 2009). In sample splitting, data
are divided into training data and test data. In the training data, pairs of (x, y) are
given, on which the model is trained. This is data for which the answer to which
x corresponds to which y is provided in advance, and in this sense, it is called
supervised learning. Test data, however, are used to evaluate the trained model.
Testing the predictive performance using test data is called out-of-sample testing. In
this way, models with good generalization performance can be selected. We consider
this process in more detail below.
In the training data, the goal is to minimize the loss defined by a loss function. For
continuous variables, the loss function can be, for example, the mean squared error
(MSE), i.e., the mean of the squared difference between the correct answer $y_i$ and
the predicted value $\hat{y}_i$. The formula for the MSE is as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$

where n is the sample size in the data set. The smaller the loss is, the better the
model’s fit to the training data. The error defined on the training data is called the
training error. However, improving the fit to the training data is not in itself the goal
of machine learning models. The goal is to improve the prediction performance
given new data. For this purpose, test data other than the training data are prepared to
evaluate the prediction performance. The evaluation of test data is also based on the
MSE (for continuous variables), but it is called the test MSE to distinguish it from
training data. The error in the test data is called the test error or generalization error.
Thus, in supervised machine learning, the goal is to minimize the generalization
error, not the training error. Why not simply use all the data at hand as training data
and adopt the model with the best fit? A good fit to the training data does not
necessarily equate to good prediction performance for new, unseen data. This
phenomenon is called overfitting, where the model overreacts to random fluctuations
or noise in the training data set. When overfitting occurs, the variability of the model
increases due to differences in the training data, resulting in poor predictive perfor-
mance. Therefore, avoiding overfitting is of great practical importance in machine
learning (Domingos, 2012).
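As a concrete illustration, the following Python sketch (the synthetic sine-curve data and the deliberately flexible degree-15 polynomial are assumptions of the example; scikit-learn is one common implementation) performs sample splitting and compares the training error with the test error:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)  # signal plus noise

# Sample splitting: hold out 30% of the cases as test data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0)

# A deliberately flexible model that can chase noise in the training data
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(x_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(x_train))  # training error
test_mse = mean_squared_error(y_test, model.predict(x_test))     # generalization error
print(f"training MSE: {train_mse:.3f}  test MSE: {test_mse:.3f}")
# A test MSE clearly above the training MSE is the signature of overfitting.
```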

3.2.3.2 Bias–Variance Tradeoff

To better understand the avoidance of overfitting, let us introduce the tradeoff
between bias and variance (Hastie et al., 2009; James et al., 2013). The expected
value of the test MSE, $E[(y - \hat{f}(x))^2]$, which is the goal of optimization in machine
learning, can be decomposed into the variance of the prediction, $\mathrm{Var}(\hat{f}(x))$, the
squared bias of the prediction, $[\mathrm{Bias}(\hat{f}(x))]^2$, and the variance of the error, $\mathrm{Var}(\varepsilon)$,
where $\hat{f}(x)$ represents the model prediction (James et al., 2013). The equation is as follows:

$$E\left[\left(y - \hat{f}(x)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x)\right) + \left[\mathrm{Bias}\left(\hat{f}(x)\right)\right]^2 + \mathrm{Var}(\varepsilon).$$

The variance of the error, $\mathrm{Var}(\varepsilon)$, is a characteristic of the true “world” and cannot be
reduced by any model. Therefore, the goal of optimization is to simultaneously
reduce $\mathrm{Var}(\hat{f}(x))$ and $\mathrm{Bias}(\hat{f}(x))$ of the predictions.
Bias is the difference between the true model and the learned model. For example,
if a simple linear regression model is used to approximate a complex nonlinear
relationship, the bias will be large. In contrast, variance is the amount by which
$\hat{f}(x)$ varies when using different data sets. A high variance would result in large
changes in the parameters of the model depending on the data set.
A tradeoff exists between bias and variance (Bishop & Nasrabadi, 2006; Yarkoni
& Westfall, 2017). In general, bias can be reduced by using complex models. For
example, suppose we use a polynomial regression model to improve the goodness of
fit for a curve. Compared to that of a first-order polynomial regression (linear
regression) model, the goodness of fit should increase monotonically as the degree
of the polynomial is increased to the second and third order. However, overly complex
models have large variances because they are overfitted to a particular data set. In
other words, a small change in the data set causes a large fluctuation in the estimates.
Conversely, a simple model is less affected by differences in data sets, but it cannot
adequately fit the data at hand, resulting in a large bias. Therefore, a model with a
moderate complexity is preferred to avoid overfitting and reduce the variance while
still providing a certain degree of goodness of fit and lowering the bias.
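This tradeoff can be made visible by simulation. In the following sketch (the sine-curve “true world,” the noise level, and the two polynomial degrees are illustrative assumptions), the same models are refit on many independently drawn training sets, and the bias and variance of their predictions at a fixed point are computed:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x0 = np.array([[1.0]])      # fixed evaluation point
true_value = np.sin(1.0)    # the assumed "true world" at x0

for degree in (1, 15):
    preds = []
    for _ in range(200):    # 200 independent training sets
        x = rng.uniform(-3, 3, size=(50, 1))
        y = np.sin(x).ravel() + rng.normal(scale=0.3, size=50)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds.append(model.fit(x, y).predict(x0)[0])
    preds = np.array(preds)
    bias = preds.mean() - true_value
    print(f"degree {degree}: squared bias = {bias**2:.4f}, variance = {preds.var():.4f}")
# Typically, the linear model shows the larger squared bias and the degree-15
# model the larger variance; a moderate degree balances the two.
```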
Traditional social science is not concerned with the predictive performance of
models and has preferred models with low bias [although multilevel analysis, which
is heavily used in sociology, can actually be interpreted as an attempt to improve
model performance by introducing bias (Gelman & Hill, 2006)] (Yarkoni &
Westfall, 2017). In traditional methods, approaches such as splitting the data into
samples are almost never used, and practically only training data are used, aiming
solely to increase the goodness of fit of the model or to test the significance of a
particular regression coefficient. Improving only the fit to the training data in this
way may lower bias but cause overfitting, resulting in poor predictive performance
for unseen data (or the test data). In such procedures, there is no explicit step of
evaluating the performance of multiple models and making a choice. Some may note
that even in sociology, there are cases where performance is compared among
several different models and a selection is made. However, analytical methods
traditionally used for model selection, such as likelihood ratio tests, AIC, and BIC,
have various limitations and restrictions and may not sufficiently prevent overfitting.
For example, in regard to AIC, as the data size increases, the penalty is not sufficient,
and complex models are almost always preferred (Grimmer et al., 2021). In contrast,
the main difference is that the model selection procedures with sample splitting and
out-of-sample testing used in machine learning are empirical, i.e., data-based and
general-purpose selection procedures that can be applied to any model. This method
enables the evaluation of the predictive performance of any supervised machine
learning model.

3.2.3.3 Three-Way Sample Splitting and K-Fold Cross-Validation

Two points should be noted here. First, the training and testing phases must be
strictly separated. For example, if an out-of-sample test is conducted and the
prediction performance is low, the model is modified, and the out-of-sample test is
conducted again with the same test data, the model will be overfitted for the test data.
Rather than this, a portion of the training data should be spared for validation to
verify the prediction performance. The model’s hyperparameters are adjusted using
these data; then, the model trained using the training and validation data is tested on
the test data. In this case, the data are divided into training, validation, and test data
(Hastie et al., 2009).
Second, while the division into training and test data is a means of avoiding
overfitting, it also carries the risk of underfitting since saving data for testing reduces
the size of the data for training. An alternative way to maximize the use of the data at
hand is k-fold cross-validation (Hastie et al., 2009; Yarkoni & Westfall, 2017). It is
common to use five or ten folds, although this number can vary depending on the
complexity of the model. In this method, the data are randomly divided into k folds;
each fold serves once as test data, while the remaining folds serve as training data.
In this way, k out-of-sample tests are performed, and the predictive performance of
the model is evaluated based on the average of the k out-of-sample tests. All data
are eventually used for training and testing, which is more efficient than a
one-time split. However, it also has the problem that the computation time is k times
longer.
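A minimal sketch of fivefold cross-validation (the synthetic data and the linear model are placeholders for any supervised model) might look as follows:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.0, -0.5, 0.0, 2.0]) + rng.normal(size=150)

# Each of the 5 folds serves once as test data; the rest train the model
cv = KFold(n_splits=5, shuffle=True, random_state=3)
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=cv)
print("test MSE per fold:", -scores)
print("mean test MSE:", -scores.mean())  # estimated generalization error
```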

3.2.3.4 Regularization

While sample splitting is a means of detecting overfitting, it does not tell us what
kind of model to build to prevent overfitting itself. Regularization is a tool for
building models to avoid overfitting (Hastie et al., 2009; Yarkoni & Westfall, 2017).
Overfitting is more likely to occur when a model is complex. Regularization is
therefore a means to improve the predictive performance of a model by introducing
constraints on model complexity. The most commonly used approach is the idea of
penalizing for complexity. Specifically, a penalty term for model complexity is
introduced into the function that the model should optimize, in addition to loss
minimization. For example, in the case of the Lasso regression model (Tibshirani,
1996, 2011), the training is set up to minimize the squared error plus a penalty term
proportional to the sum of the absolute values of all regression coefficients. Gener-
ally, the more complex a model is, i.e., the more variables are used, the greater the
extent to which the squared error can be reduced; however, the more complex a
model is, the larger the sum of the absolute values of the regression coefficients
becomes. In other words, there is a tradeoff between minimizing the squared error
and minimizing the sum of the absolute values of the regression coefficients. The
objective of the Lasso regression model is to achieve an optimal balance between the
two. From the perspective of the bias–variance tradeoff, the model can be positioned
as an attempt to improve predictive performance by introducing a bias called a
penalty term. The sum of the absolute values of the penalty terms used in Lasso
regression is called the L1 norm. In contrast, the ridge regression model uses the
square root of the sum of the squares of the regression coefficients, or the Euclidean
norm, as the penalty term. This is also called the L2 norm. In general, the lasso
regression model has an incentive to set some regression coefficients to zero and has
a strong regularizing effect. Penalized regression is effective when the number of
predictors (explanatory variables) is large relative to the sample size. However, the
coefficient of the penalty term (hyperparameter), which determines how the penalty
works, must be adjusted appropriately, and for this purpose, the sample should be
divided into three parts.
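The contrast between the two penalties can be sketched as follows (the data-generating process, in which only three of fifty predictors truly matter, and the alpha values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
n, p = 100, 50                       # many predictors relative to the sample size
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]          # only three predictors truly matter
y = X @ beta + rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty; alpha is the hyperparameter
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty
print("nonzero coefficients, lasso:", int(np.sum(lasso.coef_ != 0)))
print("nonzero coefficients, ridge:", int(np.sum(ridge.coef_ != 0)))
# The lasso sets many coefficients exactly to zero; the ridge only shrinks them.
# In practice, alpha would be tuned on validation data (e.g., with LassoCV),
# in line with the three-way sample splitting described above.
```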
The concept of regularization itself is more general and is not limited to the
addition of penalty terms in regression models. There are various other regulariza-
tion devices in supervised machine learning, such as early stops, which stop the
learning process early in artificial neural networks and deep learning, and dropouts,
which remove certain nodes in the learning process (Goodfellow et al., 2016).

3.2.4 Measurement, Discovery, and Unsupervised Machine Learning

Unsupervised machine learning aims to discover the latent structure of data x without
the correct answer label y. In other words, it involves reducing high-dimensional
data x to low-dimensional, interpretable quantities or categories. Unsupervised
machine learning includes cluster analysis such as k-means and hierarchical cluster-
ing, principal component analysis, and latent class analysis. In this sense, it is a
technique that has been used relatively often in the traditional social sciences. If we
reposition it in the context of machine learning, the following two points are
important.
First, the importance of ensuring the validity of classifications and patterns
discovered by unsupervised machine learning is emphasized. Although it is difficult
to assess the validity of unsupervised machine learning models, various methods
have been proposed, including human verification (Grimmer et al., 2022). Second,
applications to extremely high-dimensional data, such as text data, have been
advanced. In the past, survey data were not high-dimensional, but with ultrahigh-
dimensional data, such as text data, which contain tens or hundreds of thousands of
variables (e.g., vocabulary appearing in a corpus), the discovery of patterns by
unsupervised machine learning is particularly beneficial. Semisupervised learning, a
method that combines supervised and unsupervised learning, is also used.
Topic models and word embedding models as unsupervised machine learning in
text analysis are discussed in detail in Chap. 5.
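As a schematic example of unsupervised learning, the following sketch (with synthetic high-dimensional data standing in for, say, a document-term matrix) combines dimension reduction and k-means clustering to recover latent groups:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# Synthetic data: three latent groups in a 100-dimensional space
centers = rng.normal(scale=4.0, size=(3, 100))
X = np.vstack([c + rng.normal(size=(40, 100)) for c in centers])

X_low = PCA(n_components=10).fit_transform(X)  # dimension reduction
labels = KMeans(n_clusters=3, n_init=10, random_state=5).fit_predict(X_low)
print("cluster sizes:", np.bincount(labels))
# The clusters are a candidate latent structure; their substantive validity
# still has to be assessed by the researcher, e.g., by inspecting members.
```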

3.3 Potential of Machine Learning

Machine learning has great potential to transform the traditional way of conducting
social science research on various social science issues (Grimmer et al., 2021). Here,
we summarize the potential of machine learning in three key areas.

3.3.1 Breaking Away from the Deductive Model

Traditional social science has taken the deductive model approach as its scientific
methodology (Grimmer et al., 2021; cf. McFarland et al., 2016), which, according to
Grimmer et al., is a suitable method for efficiently testing theories based on scarce
data. Deductive models require a carefully crafted theory prior to observing data.
Assuming the existence of such a theory, a hypothesis that is derived from the theory
and can be tested by the data is then established. Data are then obtained, and the
hypothesis is tested only once using the data. Deductive models are highly compat-
ible with data modeling because they assume a theoretical model of how the data
were generated in advance. However, such a deductive model has various
drawbacks.
First, deductive models often use data modeling approaches that are compatible
with hypothesis testing frameworks, especially linear regression models. This leads
to the exclusion of various other, more realistic, possible modeling possibilities. In
sociology, this inflexibility of linear regression models has traditionally been the
cause of the divide between sociological theory and empirical analysis
(Abbott, 1988).
Second, the deductive model assumes that the theory is available before the data
are observed and thus cannot address issues such as how to create and elaborate
concepts from data and how to discover new hypotheses from data (Grimmer et al.,
2021). Compared to qualitative research in sociology (Glaser & Strauss, 1967;
Tavory & Timmermans, 2014), traditional quantitative research does not explicitly
incorporate a theory building phase.
Third, although the deductive model also serves as a normative scientific meth-
odology, it is difficult to rigorously follow such procedures in actual social science
practice. As a result, social science hypotheses face major problems in terms of
replicability and generalizability. In practice, in the social sciences, it is inevitable to
reformulate theories and explore hypotheses by analyzing data. However, presenting
hypotheses thus obtained within the framework of a deductive model results in
inappropriate practices such as p-hacking (Simmons et al., 2011) or HARKing (Kerr, 1998).
In contrast, machine learning is constructed with a different conception than
deductive models, which provides more flexible modeling possibilities (Molina &
Garip, 2019). First, because machine learning does not follow a hypothesis testing
framework, not only simple linear models but also complex, nonlinear models with
higher-order interactions are acceptable, as long as they have good predictive
performance. In addition, machine learning algorithmic models can be interpreted
as agnostic models (Grimmer et al., 2021). The agnostic approach is the idea that
instead of assuming that there is a correct model that reflects the real-world process,
one should choose the best model for the problem. This approach allows for flexible
model selection depending on the problem.
Furthermore, machine learning does not take a hypothetico-deductive approach,
which is conducive to heuristic and exploratory research (Grimmer et al., 2021;
Molina & Garip, 2019). Unsupervised machine learning can be very useful for
discovering latent patterns in data. Supervised machine learning frameworks can
also be used effectively for concept formation and theory building.
This approach can also be used to uncover a variety of heterogeneities through
complex modeling, specifically by allowing higher-order interactions (Grimmer
et al., 2017; Molina & Garip, 2019). In particular, the identification of heterogeneity
in causal effects is extremely useful for sociology, which seeks to elucidate causal
mechanisms (Athey & Imbens, 2016; Brand et al., 2021).
Finally, machine learning does not take the idea of deductive models, which
might not be appropriate for a certain type of social science practice and thus can
break free from inappropriate practices such as p-hacking. This is closely related to
the second point discussed below.

3.3.2 The Machine Learning Paradigm as a Normative Epistemic Model

Watts (2014) suggests that machine learning should be the normative model for
social science methodology in place of the traditional hypothetico-deductive model.
Specifically, he argues that successful prediction should be the standard by which
social science is evaluated.
Many, if not most, sociologists agree that elucidating the causal mechanisms of
social phenomena is the ultimate goal of sociology. However, the normative episte-
mic model for elucidating causal mechanisms has traditionally been assumed to be
based on a hypothetico-deductive approach. In this model, the ideal is an experi-
mental method conducted on the basis of testable hypotheses about causation.
However, although the methods of computational social science have greatly
expanded the range of application of large-scale digital experiments, the types of
social phenomena that can be tested remain limited. Therefore, sociologists have
traditionally attempted to test causal hypotheses by applying basic regression models
to nonexperimental observational data. However, this practice is now being strongly
criticized in terms of statistical causal inference (Morgan & Winship, 2014).
Watts argues that, in light of this situation, the criterion for evaluation should be
the success or failure of predictions by out-of-sample testing rather than the testing of
a priori hypotheses (Watts, 2014). As explained previously, the requirement for out-
of-sample testing is simple: the model’s predictions should be tested on data that are
different from those used to form hypotheses and train the model, and the model with
the best predictive ability should be adopted.
The strength of out-of-sample testing is that it allows for more flexible theory
building than does hypothesis testing by making avoidance of overfitting a central
policy. Out-of-sample tests do not require that a hypothesis be set prior to the study.
Researchers can construct hypotheses, theories, and models a posteriori. In other
words, they can complicate the model to make it fit the data at hand better and to
reduce bias as long as the out-of-sample test can be passed. In contrast, hypothesis
testing practice can be interpreted as an attempt to lower variance by adhering to a
priori hypotheses and reducing flexibility (Yarkoni & Westfall, 2017). Out-of-
sample testing can also be positioned as an attempt to address the bias–variance
tradeoff by striking a balance between such tentative tests and posterior theory
building to allow for appropriate model selection.

3.3.3 Machine Learning Model as a Human Decision and Cognitive Model

The third point is related to a view different from the previous two. Machine learning
and artificial intelligence provide algorithms to discover and recognize patterns from
data. Therefore, in a sense, machine learning can be said to simulate human
judgment and cognition.
The most obvious application of this aspect is automatic coding (Nelson et al.,
2021). Sociology requires conceptualizing, classifying, and coding a variety of data
and underlying events. When the number of data and events is small, it is possible for
humans to classify and code them all manually. However, when the amount of data
is large, this is impossible. Therefore, it is very beneficial for machines to recognize,
classify, and code patterns in the data instead of humans.
However, the simulation of judgment and cognition by machines and the classi-
fication and prediction based on that judgment and cognition have implications for
sociology that go beyond mere practical purposes. Sociology is the study of human
action, and it is essential to model how people judge, perceive, and categorize
situations to understand the mechanisms of action (Schutz & Luckmann, 1973). In
contrast, traditional analytical tools do not directly model people’s judgments and
cognition but rather model the consequences of actions and the aggregate distribu-
tion of outcomes. Therefore, the connection to sociological theory is only indirect.
Conversely, machine learning is more than just an analytical tool; it can relate
directly to sociological theory by modeling people’s judgments and cognitions
(cf. Foster, 2018). For example, the models of natural language processing discussed
in detail in Chap. 5 can be used not merely for the practical purpose of automatically
summarizing and labeling textual content but also to formalize the way humans
process textual meaning. This will be discussed thematically in Chap. 5.
Furthermore, comparing machine learning-based classification and pattern recog-
nition with human classification and pattern recognition may reveal features and
biases of human judgment (Kleinberg et al., 2015). This identification of features and
biases in comparison to machine judgments may lead to further theorizing of human
judgments, cognition, and decision making. Alternatively, machine learning can be
used to address the possibility that people are reading and acting on certain signals
from certain situations. For example, machine learning fictitious prediction methods
(Grimmer et al., 2021) are useful in considering the question of whether people can
read the social class or socioeconomic status of tweeters from their daily tweets
(Mizuno & Takikawa, 2022). That is, information related to the socioeconomic
status of tweeters is collected in advance through surveys and other means, and
the question of whether this information can be predicted from tweets alone is
examined. If machines can successfully predict socioeconomic attributes, it is highly
likely that people are reading such signals in their daily interactions, and if advanced
machine learning models cannot predict such signals at all, they may not exist for
humans either, or if they do, they may be very weak. Thus, machine learning models
can be used to examine how people understand and behave in terms of the actions
(tweets) of others. This is an analytical method that has intrinsic relevance to theory.
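Schematically, such a fictitious prediction study might look like the following sketch (the tweets, the labels, and the TF-IDF-plus-logistic-regression classifier are placeholders for illustration, not the actual design of Mizuno and Takikawa, 2022):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder corpus: in a real study, tweets would be linked to surveyed SES
tweets = ["commuting to the office again", "shift starts at 6 am",
          "conference deadline tonight", "overtime at the warehouse",
          "reviewing grant proposals", "second job this weekend"] * 20
labels = [1, 0, 1, 0, 1, 0] * 20  # 1 = higher SES, 0 = lower SES (from surveys)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
accuracy = cross_val_score(clf, tweets, labels, cv=5)  # out-of-sample accuracy
print("mean cross-validated accuracy:", accuracy.mean())
# Accuracy near the base rate would suggest the signal is weak or absent;
# accuracy well above it would suggest that such signals are, in principle,
# readable from everyday tweets.
```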

3.3.4 Challenges for Machine Learning

We have discussed the potential of machine learning compared to traditional quan-
titative methods. One of the biggest challenges in applying machine learning to the
social sciences is to ensure interpretability (Hofman et al., 2021; Molina & Garip,
2019; Yarkoni & Westfall, 2017). In traditional quantitative methods, simple linear
models are typically used, and the magnitude of the coefficients of interest is
examined when elucidating the “black boxes” that produce social phenomena. As
we have already noted, this approach has sometimes been inappropriately practiced
in the traditional social sciences and has led to various problems. In contrast,
machine learning selects and evaluates models in terms of their overall predictive
performance, not their individual coefficients. This allows for the construction of
models with a more realistic degree of complexity. Model realism should also be an
important prerequisite for elucidating the causal mechanisms of social phenomena.
However, it still seems essential to clarify how the individual parts of the mechanism
work to open the black box of social phenomena. In other words, being able to
interpret what factors and characteristics contribute to the predictive performance of
a model is also necessary to unravel the mechanism (Breiman, 2001). This issue of
interpretability is crucial to the task of applying machine learning to social science
theory building.
Closely related to interpretability is the problem of causal inference (Morgan &
Winship, 2014). Causal inference focuses on the effect of a particular treatment. This
is seemingly the opposite of machine learning, which focuses on the overall perfor-
mance of a model. However, the application of machine learning to the effective
implementation of causal inference is currently the most active area of research. For
example, machine learning is effective in estimating propensity scores, which are
often used to implement causal inference with observational data (Westreich et al.,
2010). In addition, machine learning paradigms fit better than deductive models in
tasks such as the exploratory discovery of heterogeneity in causal effects (Brand
et al., 2021).

3.4 Conventional Methods of Statistical Analysis in Sociology

3.4.1 Aims of Quantitative Sociological Studies

Sociology has utilized various statistical analysis methods to test sociological theo-
ries and hypotheses and to discover patterns and regularity in social phenomena. In
this section, we introduce the conventional statistical analysis of sociological studies
and the related problems. It is, however, difficult to cover all statistical methods in
sociology. To contrast the conventional approach with the methods of machine
learning, especially supervised learning, we narrow our focus to statistical methods
that analyze data from social surveys using regression models. These methods are
commonly used in sociology and other social sciences, such as economics and
political science.
Quantitative sociology often uses a variety of research methods, such as small
local surveys, large national representative surveys, and panel surveys, to gather data
about individuals and groups in societies. Although some sociological studies use
experimental methods, the primary approach is to observe and gather data about
individuals, groups, and societies through social surveys. Using these data, sociol-
ogists can quantify social phenomena, compare groups, estimate the size of the
association between variables or the “effect” under a particular model, and make
causal inferences. Through this process, sociologists aim to understand and interpret
social phenomena and explain the underlying mechanisms that produce patterns and
regularities in society. Data analysis can be divided into three main tasks: descrip-
tion, prediction, and causal inference (or causal prediction) (Berk, 2004; Hernán
et al., 2019). To investigate patterns and regularities and test theories and hypothe-
ses, sociologists often use generalized linear models, particularly linear regression
models, to describe the association among variables, predict a variable of interest,
and estimate the causal effect of a treatment on an outcome (Berk, 2004).

3.4.2 Description

Description involves measuring the central tendency of a social phenomenon and
understanding associations between variables that indicate a social pattern and
regularity. This can include simple calculations of mean and percentage over the
entire sample or subsamples of distinct groups. However, regression analysis is
sometimes conducted for the sake of description, especially for exploring associa-
tions between variables (Gelman et al., 2021) and sometimes their changes over time
and across societies. Linear regression analysis for description constructs the best-
fitting model under the assumption of linearity and interprets the estimated coeffi-
cients (Morgan & Winship, 2014). In linear regression analysis, a dependent variable
( y) of interest and several independent variables (x) believed to be related to it are
included in the model (Elwert & Winship, 2010). The model is then improved
through the addition or removal of variables or interaction terms, and the final
model is chosen based on criteria such as R-squared and AIC. Sometimes indepen-
dent variables are added or removed step by step, and comparisons of model fit
among multiple models are conducted to show the importance of the variables and
interpret why changes in coefficients have occurred. After the final (or preferred)
model is chosen, the estimated coefficients (β), standard errors, confidence intervals,
and p-values are presented. Because the coefficients are obtained under the strong
but simple assumption of linearity, they are easy to interpret. The estimated coeffi-
cients of a linear regression model indicate the average change in y for a one-unit
change in the independent variables, while the values of the other variables are held
the same. This approach is interested in the association between the independent
variables x and the dependent variable y or the “effect” estimated from the preferred
model. Therefore, a regression model is a simple and effective tool for capturing
social phenomena and regularity, especially in terms of linear associations. The
family of regression models has been developed for analyses that incorporate the
influence of higher-level characteristics, such as region and school, than individuals
(multilevel modeling) and for analysis of categorical and limited dependent vari-
ables. However, such an interpretation makes sense only if the model correctly
captures the relationship. It is also important to note that because the coefficients
correspond to comparisons between individuals, not changes within individuals
(Gelman et al., 2021), they do not represent causal effects but associations.
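A minimal sketch of this descriptive use of regression (the variables are synthetic stand-ins for survey measures; statsmodels is one common implementation) is as follows:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
educ = rng.normal(13, 2, n)      # years of education (synthetic)
female = rng.integers(0, 2, n)   # binary indicator (synthetic)
income = 5 + 1.2 * educ - 2.0 * female + rng.normal(scale=3, size=n)

X = sm.add_constant(np.column_stack([educ, female]))
fit = sm.OLS(income, X).fit()
print(fit.params)    # coefficients: average difference in y per one unit of x
print(fit.bse)       # standard errors
print(fit.pvalues)   # p-values
# As stressed above, these quantities describe associations between
# individuals; they are not by themselves causal effects.
```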

3.4.3 Prediction

The second type of sociological data analysis involves constructing a model for
purely predicting y from x. Although similar to the first method of description above,
the predictive task is more concerned with the dependent variable y or algorithms for
predictions than with the independent variables x (Salganik et al., 2020). The focus is
not on the estimation and interpretation of the coefficients of the independent
variables x but instead on the predictive accuracy of y under the model with these
independent variables (Molina & Garip, 2019). For example, a model can be
constructed to predict which individuals are more likely to drop out of high school
or experience poverty. Machine learning, particularly supervised learning, has
played a significant role in this type of prediction. However, the application of
prediction to sociological studies has not been fully explored. This is one reason
machine learning methods have been exploited less in sociological research than in
other social sciences and have not yet made substantial contributions to sociology,
especially to studies using data from social surveys.
However, predictive tasks are considered essential and have significant implica-
tions for sociological research. We introduce sociological studies using prediction
and discuss its implications in Sect. 3.5.

3.4.4 Causal Inference

Regression models are also standard tools in causal inference. While randomized
experiments are the gold-standard method for determining causal effects, causal
inference in sociology usually has to rely on observational data, especially for ethical
reasons. In causal inference, we typically establish a treatment variable (A) and an
outcome variable of interest (Y ), use the potential outcome framework to derive
causal estimands, and use a directed acyclic graph (DAG) to explain the data
generation process and how confounding may occur (Morgan & Winship, 2014).
Then, we specify confounding variables (L ) sufficient to identify the causal effects,
use confounding variables as controls in the regression model, and estimate the
coefficient of the treatment variable. Of course, there are many other methods for
estimating causal effects in addition to conventional regression analysis (matching,
inverse probability of treatment weighting, g-estimation, g-formula, etc.). For the
selection of covariates to be used as control variables, it is recommended that
(1) variables that affect the treatment, the outcome, or both are included,
(2) those that can be considered instrumental variables are excluded, and (3) those
that serve as proxies of unobserved variables that are common causes of the
treatment and the outcome are entered into the model as covariates (VanderWeele,
2019). To avoid discrepancy between the estimated regression coefficients and the
intended effect, we must consider what is a good control variable and what is a bad
control variable (Cinelli et al., 2021). The selection of variables does not aim to
increase the predictive or explanatory power of the model but to reduce or not
amplify bias. In this respect, regression analysis for causal inference differs signif-
icantly from that for description and prediction. The determination of which vari-
ables are sufficient to identify a causal effect must be made by a human based on
theory, prior research, domain knowledge, etc. Even if the data-generating process is
clarified and confounding variables to identify the causal effect are obtained, linear
regression analysis may not be appropriate for causal inference. This is because the
relationships among covariates, outcomes, and treatments may not be adequately
modeled. Misspecification of the relationships (functional forms) among these vari-
ables may lead to biased results (Schuler & Rose, 2017). In cases where there are
many variables or the relationships among variables are complex, machine learning
methods can be used to build flexible models of the outcome and treatment. Clearly,
theoretical considerations about covariates, treatments, and outcomes are necessary.
Nevertheless, it is also essential to think in a data-driven manner about which
covariates to include in the final analysis and what associations to assume among
the covariates (Mooney et al., 2021). Although the purpose of prediction is different
from that of causal estimation, predictive tasks are often an intermediate step in
causal estimation (Naimi & Balzer, 2018). In addition to estimating propensity
scores (Westreich et al., 2010), machine learning plays a significant role in
g-computation, which uses predictions under interventions to identify causal effects
(Le Borgne et al., 2021). Machine learning is thus used to estimate flexible models
of both the treatment and the outcome in causal inference. Recently, as a doubly
robust method for estimating causal effects, targeted maximum likelihood estimation
with the ensemble method SuperLearner, which combines predictions from several
machine learning algorithms, has attracted considerable attention (Schuler & Rose, 2017; Van der
Laan & Rose, 2011). Thus, statistical causal inference already incorporates machine
learning methods rather than relying solely on conventional regression techniques.
Further application of machine learning to causal inference methods will be
discussed in Sect. 3.6.
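To make the role of prediction in g-computation concrete, the following is a minimal
sketch with simulated data standing in for a real study; the variable names (A, Y,
L1, L2) and the choice of gradient boosting are illustrative assumptions, and any
flexible learner could take its place:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 2000
    df = pd.DataFrame({"L1": rng.normal(size=n), "L2": rng.normal(size=n)})
    df["A"] = (df["L1"] + rng.normal(size=n) > 0).astype(int)  # confounded treatment
    df["Y"] = df["L1"] + df["L2"] + 0.5 * df["A"] + rng.normal(size=n)

    cols = ["A", "L1", "L2"]
    outcome_model = GradientBoostingRegressor().fit(df[cols], df["Y"])

    # g-formula: predict each unit's outcome under A = 1 and under A = 0,
    # then average the difference over the whole sample.
    d1, d0 = df.copy(), df.copy()
    d1["A"], d0["A"] = 1, 0
    ate = (outcome_model.predict(d1[cols]) - outcome_model.predict(d0[cols])).mean()
    print(f"g-computation ATE estimate: {ate:.2f} (true effect is 0.5)")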

3.4.5 Problems of the Conventional Regression Model

Regression analysis is widely used because it is a simple method of statistical analysis, and the results obtained are transparent (Kino et al., 2021) and (at first
glance) easy to interpret. In addition, since regression is covered in textbooks on
statistics and quantitative methods in all fields, even researchers who do not spe-
cialize in quantitative research know how to read and interpret the results of
regression analysis. The interpretability of the estimates is essential in communicat-
ing with researchers who are not familiar with statistical methods (Kino et al., 2021).
However, regression analysis in social science research has been subject to substan-
tial criticism (Abbott, 1988; Achen, 2005; Morgan & Winship, 2014).
Although regression analysis is undoubtedly a useful tool for a variety of research
purposes, including description, prediction, and causal inference, the objectives of a
research study, theory, hypothesis, and research question tend to be defined within
the statistical model, such as estimating and showing significant regression coeffi-
cients, which can lead to not adopting the best approach to the research question or to
narrowing the scope of the study too much (Lundberg et al., 2021). Although
regression analysis methods may be used to test theories and hypotheses, these
methods are unlikely to lead to the construction of a new theory or theoretically
meaningful target quantity.
Regression analysis is used in many sociological studies to examine the associ-
ation between x and y, controlling for other variables. In descriptive regression
analysis with many independent variables, the interpretation of each coefficient
(association or effect) can be difficult in practice because a distinction between
treatment variables, confounding variables, mediating variables, and collider vari-
ables may not be made (Acharya et al., 2016; Keele et al., 2020; Westreich &
Greenland, 2013). Inadequate use of control variables can also create biases or
magnify existing biases (Cinelli et al., 2021). These coefficients do not tell us how
y changes when x changes from one value to another (Morgan & Winship, 2014);
they do not tell us information about the causal effect and the effect of the interven-
tion. Regression analysis with a set of independent variables and covariates may be
performed to estimate the causal effects of one or more variables, but usually, this
does not yield causal effects. As mentioned earlier, causal inference with regression
analysis requires different procedures and more specialized knowledge than does
regression analysis for description and prediction (Hernán et al., 2019).
Linear regression analysis has also been noted as a problematic method of
description, data reduction, and smoothing (Berk, 2004). Linear assumptions fail
to capture social reality, and more flexible models must be applied if reality is to be
better described (Abbott, 1988). Additionally, interpretation of the estimated coef-
ficients of a linear model may appear straightforward, but it is not meaningful if
the model is not correctly specified both theoretically and empirically. Although the
coefficient that is the best fit in the sample is estimated, it may not necessarily be the
best fit out of the sample.
Regression analysis is insufficient as an exploratory method because the proce-
dure is not automated and requires a significant amount of researcher discretion,
leading to a higher likelihood of p-hacking, low transparency, and poor reproduc-
ibility (Brand et al., 2021). This problem is also related to the fact that in quantitative
sociological studies, exploratory methods are undervalued, and the exploratory
approach has not been fully developed. This undervaluing of the heuristic approach
has resulted in the underdevelopment of exploratory regression analysis with less
researcher discretion (Molina & Garip, 2019, p. 39). Adopting a heuristic rather than
confirmatory approach can also be useful in identifying heterogeneous causal
effects. Therefore, an exploratory approach to identify heterogeneous causal effects
using machine learning was proposed (Brand et al., 2021) and will be discussed in
more detail later.
As mentioned above, regression analysis and related methods have been widely
used in sociological studies. However, in some cases, these conventional methods
may not be effective in meeting the three main goals of data analysis: description,
prediction, and causal inference (Berk, 2004; Hernán et al., 2019). To ensure the
proper use of regression analysis, it is important to consider various factors. Machine
learning, while primarily used for prediction, can also be useful for description and
causal inference (Grimmer et al., 2021). It has the potential to address some of the
limitations of traditional regression analysis for these purposes.

3.5 Machine Learning in Sociology

There are still few applications of machine learning in sociology (see also Molina &
Garip, 2019 for applications in sociology). In this section, we review specific
applications of machine learning by introducing the application of the prediction
framework and automated coding in sociology. In the next section, we address a
more advanced topic: the deployment potential of machine learning in relation to the
elucidation of causal mechanisms.

3.5.1 Application of the Prediction Framework

As an example of an application of supervised machine learning in sociology, let us
discuss life course prediction using panel data by Salganik et al. (2020). They used
panel data from the Fragile Families and Child Wellbeing Study, which included
several thousand U.S. families that had given birth to a child around the year 2000,
with data from Waves 1–6 collected at the time of the study. They applied a common
task framework commonly used in machine learning to challenge a large number of
researchers with the following prediction task. The challenge was to predict various
life outcomes measured in Wave 6 (e.g., child GPA and termination of primary
caregiver) using information about children and families from Waves 1–5. Follow-
ing a supervised machine learning framework, the Wave 6 data were split into
training data and test data (holdout data). Researchers who applied to the challenge
used the training data to train their proposed models, which were then evaluated on
the test data.
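In code, the split-and-evaluate logic of this framework is simple; the following is a
minimal sketch with simulated stand-ins for the predictors and the outcome (the
actual challenge data are not used here):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))          # stand-in for Waves 1-5 predictors
    y = X[:, 0] + rng.normal(size=1000)      # stand-in for a Wave 6 outcome

    X_train, X_holdout, y_train, y_holdout = train_test_split(
        X, y, test_size=0.3, random_state=0)
    model = LinearRegression().fit(X_train, y_train)      # a submitted model
    print("holdout R^2:", round(r2_score(y_holdout, model.predict(X_holdout)), 2))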
Despite the large number of participants (160 teams, including professional
researchers, ultimately submitted models), the challenge models were not very
accurate in predicting life outcomes. Although up to several thousand variables were available from Waves 1 to
5, even the best of the submitted models only marginally outperformed very simple
linear and logistic regression models using only four variables in terms of predictive
accuracy. This failure may have occurred because the survey data set lacked
important variables that were originally needed for prediction or because social
life is inherently too complex and unpredictable.
What, then, is the significance for sociology of tackling the problem of predicting
life outcomes? First, Salganik et al. see its significance in identifying families and
children at risk, a policy and practical significance that is easy to understand. Second,
there may be new developments in analytical techniques through the prediction task,
which has methodological significance. Finally, there are implications for sociolog-
ical theory: according to Salganik et al., if predictability varies by social context, it
stimulates the development of sociological theory to consider why this is so.
We provide a more detailed explanation of the relationship between prediction
and sociological theory by noting that the results of Salganik et al. show that
prediction does not work uniformly well for all subjects but rather works reasonably
well for many subjects, with a small number of cases that cannot be predicted using
any model. This may
stimulate the development of sociological theory by asking questions such as why
some cases are not well predicted, what theory can explain the poor predictions, and
what information not included in the survey should be focused on if an explanation is
to be attempted (cf. Garip, 2020).
Additionally, although different from the present case, suppose that a complex
model involving a variety of higher-order interactions is somewhat successful in
making predictions. This model has good predictive performance but does not have
the easily interpretable structure of an uncomplicated regression model. Therefore,
the sociologist must reinterpret the structure of the model to understand why this
complex model can predict social reality to some extent. This will
require the development of new sociological theories, which opens the possibility of
developing sociological theory in a different way through the task of prediction by
machine learning. In other words, sociological theory can be constructed by
improving the interpretability of moderately complex models that capture the
predictable aspects of the real world, rather than by modeling the overly complex,
noise-filled real world itself. This is also an idea that leads to the scientific regret
minimization method, which will be introduced later.

3.5.2 Automatic Coding

Another application of machine learning in sociology is automatic coding. In the
digital society, the share of data that directly record actions, the so-called found data,
such as digital trace data and archived text data, is increasing (Salganik, 2018). Such
data pose new challenges for sociological research. A particularly large problem is
what Salganik calls the incompleteness problem (Salganik, 2018), which can be
divided into the missing variable problem and the construct validity problem.
In the case of structured data, such as surveys, researchers can prepare questions
that allow them to measure the concepts (variables) they are interested in. Ideally, the
items are also operationalized in advance to adequately measure the constructs.
However, in the case of found data that are not generated by the researcher, the data
needed to operationalize the concepts of interest may be unavailable, and even when
such data exist, operationalizing the concepts appropriately may still pose
challenges.
One way to address the issue of missing variables is to combine found and survey
data. The following is an introduction to a procedure that Salganik refers to as
amplified asking (Salganik, 2018). As a research case study, we focus on the study
by Blumenstock et al. (2015). Their interest was to determine the distribution of
wealth in Rwanda. They had log data from cell phones used by the majority of
Rwandans, but they did not explicitly include information about income or wealth.
Therefore, they conducted a survey asking a subset of cell phone users about their
income and wealth. By doing so, they obtained data (X, y) where X is the cell phone
log data and y is the income or wealth level. This can be regarded as supervised data
with label y for X. By training the model with this labeled data (X, y) and applying it
to the remaining unlabeled cell phone log data, they assigned income and
wealth information to all data. Of course, the accuracy of the model depends on how
it is constructed since the information is extrapolated from the cell phone log data,
except for the information actually asked in the survey.
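A minimal sketch of amplified asking, with simulated stand-ins for the phone-log
features and the surveyed wealth measure (all names and the random forest are
illustrative assumptions), could look as follows:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    X_all = rng.normal(size=(5000, 10))               # stand-in phone-log features
    surveyed = rng.choice(5000, size=500, replace=False)
    y_surveyed = X_all[surveyed, 0] + rng.normal(size=500)  # stand-in wealth

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_all[surveyed], y_surveyed)    # train on the surveyed subset
    wealth_hat = model.predict(X_all)         # extrapolate to all phone users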
Conversely, the problem of construct validity (Lazer, 2015) arises when linking
the given found data to the construct y. For example, if one compares measuring the
theoretical construct of intelligence with an established test such as the Raven
progressive matrices test against measuring it by the criterion of writing long
sentences on Twitter, the former could be considered much more valid (Salganik, 2018). While these issues always
arise with survey data, found data are not created for research, so how to conceptu-
alize the data becomes an even greater issue.
The problem of how to create sociologically meaningful concepts from found
data, which is not designed for research, is not unique to digital trace data. Tradi-
tional methods of content analysis using newspapers, books, and magazines address
the same problem (Krippendorff, 2004). Traditionally, in these fields, researchers
interpret texts (and sometimes photos and videos) and assign codes that represent
sociological concepts. Although the problem of construct validity remains with such
methods, it is highly likely that a certain degree of validity can be ensured by coding
based on flexible human interpretation. However, this method is extremely expen-
sive and has limited scalability. Therefore, the idea of replacing some or all manual
coding with automatic coding by machines has emerged based on today’s large-scale
digital trace data and found data. The question then arises as to how machine-
automated coding can satisfy construct validity.
One method of machine coding is partial automatic coding by supervised
machine learning. The main framework is the same as in amplified asking. We
have unlabeled found data X. This can be a newspaper article (Nelson et al., 2021) or
an image posted on a social networking site (Zhang & Pan, 2019). For a subset of
this X, a sociological construct y is first manually labeled by the researcher. The
model is trained and evaluated using the data set (X, y) created in this way. Once a
model with sufficient performance is trained, it can be used to automatically assign
codes to the remaining data sets.
Constructs in sociology can be very complex and multidimensional, for example,
populism, social capital, and inequality. Nelson et al. (2021) conducted experiments
to test the validity of measuring complex concepts in sociology in an automatic
coding framework. Specifically, they manually coded inequality and its subconcepts
and related concepts in news articles that may contain the concept of inequality and
then used them as yardsticks to examine the extent to which partially automatic
coding by supervised machine learning matches manual coding. The results show
that supervised machine learning is capable of coding with a good degree of validity,
with the F1 score, which is the harmonic mean of precision and recall, exceeding
the conventional guideline of 0.70. More interesting, however, is the possibility that examining
the idiosyncrasies and biases of machine coding may also lead to a rethinking of the
human manual coding framework. Nelson et al. note that it is important to consider
the extent to which theoretically interesting categories can be classified accurately in
terms of precision and recall and to choose a coding framework accordingly. To extend this
point further, it may be necessary to review the coding rules themselves so that
machines can construct theoretically interesting categories that are easier to classify.
As noted earlier, machine learning models are also formalized models of human
cognition and judgment, so the fact that they have a reflexive relationship to human
coding is particularly important when applied to complex, “socially constructed”
concepts handled by sociology.
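As an illustration, a minimal sketch of such partially automated coding, using toy
texts and labels rather than the news corpus of Nelson et al., might be:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    texts = ["wage gap widens again", "local team wins final",
             "income inequality rises", "new stadium opens downtown"] * 50
    labels = [1, 0, 1, 0] * 50               # 1 = hand-coded as "inequality"

    X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, random_state=0)
    coder = make_pipeline(TfidfVectorizer(), LogisticRegression())
    coder.fit(X_tr, y_tr)                    # learn from the hand-coded subset
    print("F1 against hand coding:",
          round(f1_score(y_te, coder.predict(X_te)), 2))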
Nelson et al. (2021) add to this by examining the possibility of fully automating
coding through unsupervised machine learning. For example, unsupervised machine
learning, such as topic models, can automatically identify potential topics that a
given data X addresses. By examining the extent to which such automatic assign-
ment of topics matches the categories assigned by humans, we can examine the
possibility of automatic coding via unsupervised machine learning. The conclusion
is that it is difficult for unsupervised machine learning to reproduce a classification that
corresponds exactly to a human predefined concept. Therefore, it would be a mistake
to assume that automatic coding by unsupervised machine learning can completely
replace manual coding by humans. Rather, the potential of unsupervised machine
learning coding lies in its ability to discover new classifications. While machines
cannot perform coding as flexibly as humans can, they are free from the biases and
narrowness of vision inherent in humans and may discover latent patterns and
propose new classifications that humans were previously unaware of. Of course,
the usefulness of such classifications must be determined by humans from the
standpoint of sociological theory. Adding machine “interpretations” to human
interpretations in this way may enable concept formation that is also beneficial in
advancing sociological theory.
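For concreteness, a minimal sketch of this fully automated route, fitting a topic
model to toy documents (the corpus and the two-topic choice are illustrative
assumptions), might be:

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["wages and income inequality", "election campaign and votes",
            "poverty and the wealth gap", "parliament passes election law"] * 25
    vec = CountVectorizer()
    X = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    # The researcher inspects the top words of each latent topic and judges
    # whether they form a sociologically meaningful classification.
    terms = vec.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        print(f"topic {k}:", [terms[i] for i in weights.argsort()[-3:]])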

3.6 Toward Further Applications in Sociology

As we have seen in the previous section, there are various possibilities for the
application of machine learning to sociology. In this section, we continue to examine
how applications of machine learning can contribute to the development of socio-
logical theory, and in particular, we introduce two methods in relation to the goal of
sociology, which is to elucidate causal mechanisms.

3.6.1 Heterogeneity of Causal Effects

Like other social sciences such as economics and political science, sociology has
paid great attention to the causal factors that cause social phenomena. Moreover, the
main theoretical goal of sociology is to elucidate the causal mechanisms of social
phenomena. When we speak of mechanisms, we mean not only the mere connec-
tions between causes and effects but also the elucidation of mechanisms at a deeper
level that link causes and effects (Hedström & Ylikoski, 2010; Machamer et al.,
2000; Morgan & Winship, 2014). Well known in sociology is the micro–macro
mechanism elucidation research program formulated by Coleman (1990). This
research program aims to elucidate the mechanisms for macro collective social
phenomena from a more micro, action level.
How can machine learning be used to elucidate such causal mechanisms? The
identification of causal effects itself requires the use of a statistical causal inference
framework, which is largely outside the current scope of machine learning. How-
ever, identifying the heterogeneity of causal effects is an important step toward a
better understanding of causal mechanisms (Salganik, 2018). Heterogeneity of
causal effects refers to the fact that causal effects vary by situation, context, and
attributes of the intervention target. Machine learning is extremely powerful in the
search for such heterogeneity.
When searching for effect heterogeneity using the traditional hypothesis testing
approach, one is confronted with a variety of problems. First, prior theoretical
preconceptions and conventions dictate at what level effect heterogeneity is likely
to exist. For example, gender, age, and socioeconomic status are preferred variables
in sociology. This in itself is not necessarily a bad thing, but there is a risk that the
scope of inquiry of sociological theory is narrowed beforehand (heterogeneity need
not be limited to gender, age, and socioeconomic status). In addition, there may be a widespread
practice of feeding various interaction terms into regression models and reporting the
results of models that incorporate only those interaction terms that actually become
significant, which amounts to clear p-hacking (Brand et al., 2021). Finally, hetero-
geneity is not always adequately captured by first-order interactions in regression
models. It is quite possible that it is caused by second-order or higher interactions or
even more complex nonlinear mechanisms (cf. Molina & Garip, 2019). However, it
is difficult to consider such possibilities with existing quantitative methods (Brand
et al., 2021).
Athey and Imbens (2016) developed a combination of machine learning and
causal inference called causal trees, which is a way to address these issues and is of
great use to sociology. Causal trees apply the method of decision trees in machine
learning to model the heterogeneity of causal effects in an exploratory manner and
without overfitting. Decision trees are a method of constructing a tree for data with
covariates and target variables by partitioning the covariate space with the goal of
predicting a certain target variable (Hastie et al., 2009). The tree consists of multiple
nodes in a hierarchical structure. At the first node (sometimes called the root), data
are split into two subnodes based on a threshold value for a given covariate. Splitting
is performed in such a way that the values of the target variables are most similar
within each node. Then, at each node, a further division based on the covariate
threshold is made in a similar manner. This process is repeated to construct the final
tree. The decision tree construction is highly transparent because the algorithm is
relatively simple. It is also easy to interpret visually.
In a causal tree, the goal is to predict the treatment effect τ instead of the target
variable. As in the usual decision tree, the tree is constructed in such a way that the
heterogeneity within the nodes of τ is reduced. However, unlike the usual target
variable, the treatment effect τ is defined in terms of potential outcomes and is not an observable quantity.
Specifically, Athey and Imbens (2016) developed a procedure called honest estima-
tion. In “honest” estimation, the sample is split into data for partitioning the covariate
space and data for estimating treatment effects within nodes. The partitioning of the
nodes is set up in such a way that the heterogeneity of the treatment effects is
captured as much as possible, while the uncertainty in the treatment effect estimation
is minimized. In general, the finer the split, the greater the heterogeneity between
nodes, while the estimation of the treatment effect within a node becomes more
uncertain. In other words, there is a tradeoff between the two goals, and the goal is to
make the partition in such a way that they are just balanced.
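A minimal sketch of this split-sample logic is shown below; for simplicity, it
assumes a randomized binary treatment and uses the transformed-outcome trick
(with P(A = 1) = 0.5, the variable 2y(2A - 1) has the treatment effect as its
conditional mean) rather than Athey and Imbens's full splitting criterion:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(2)
    n = 4000
    X = rng.normal(size=(n, 3))
    A = rng.integers(0, 2, size=n)            # randomized binary treatment
    tau = np.where(X[:, 0] > 0, 1.0, 0.0)     # true heterogeneous effect
    y = X[:, 1] + tau * A + rng.normal(size=n)
    y_star = 2 * y * (2 * A - 1)              # E[y_star | X] = tau(X)

    half = n // 2                             # "honest" split of the sample
    tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=200)
    tree.fit(X[:half], y_star[:half])         # half 1: find the partition
    leaves = tree.apply(X[half:])             # half 2: estimate within leaves
    for leaf in np.unique(leaves):
        m = leaves == leaf
        eff = (y[half:][m & (A[half:] == 1)].mean()
               - y[half:][m & (A[half:] == 0)].mean())
        print(f"leaf {leaf}: estimated effect {eff:.2f} (n = {m.sum()})")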
An example of an application of causal trees in sociology is the work of Brand
and colleagues (Brand et al., 2021). They use NLSY panel data to examine the extent
to which a college degree is effective in reducing the time spent in low-wage jobs.
Analysis with causal trees allows them to find not only the average causal effect of a
college degree in reducing time in a low-wage job but also heterogeneous causal
effects and the extent to which these effects vary across people with particular
attributes. Moreover, using the tree enables them to examine not only linear effects
but also complex interactions of various factors. The results of their analysis indicate
unexpected heterogeneity due to such complex interactions. The effect of a college
degree on the reduction of low-wage work was particularly large for those who had
less educated mothers, grew up in large families, and experienced less social control.
Such unexpected findings provide an opportunity to further explore the causal
mechanisms and lead to further development of sociological theory.
Mediation analysis is another method for mechanism exploration (VanderWeele,
2015). Currently, causal mediation analysis is constructed from the perspective of
causal inference. The quantities of interest (estimands) include the controlled direct
effect obtained by intervening to set the mediator variable M to m as well as the
treatment A to a, and the natural direct and indirect effects obtained by setting the
mediator variable M to the value it would naturally take (Ma) when the treatment A is
set to a. Conditions for the identification of these various
direct and indirect effects have been examined, and several methods of estimation
have been developed. Machine learning methods are useful in causal mediation
analysis, just as they are useful in causal inference. In addition, to understand the
effects of treatment variables that change over time, it is necessary to think carefully
about the estimand, identification, and estimation. Machine learning methods can
also be used for estimation in this context (Lendle et al., 2017; van der Laan & Rose, 2018).

3.6.2 Scientific Regret Minimization Method

The greatest problem with machine learning’s emphasis on prediction is that the
theoretical interpretability of the results is limited. In other words, the internal
mechanisms through which machine learning models produce good predictions are
unknown. Nevertheless, the interpretability of a conventionally simple model does
not mean that a simple model should be chosen at the expense of predictability
(Yarkoni & Westfall, 2017). Rather, the fact that a model predicts well, even if it is a
complex model, can in principle be taken to mean that there is something in it to
theorize about and thus a possibility of making the model interpretable.
Therefore, a methodology is needed to build an interpretable social science theory
while preserving the predictive performance of machine learning models to the
fullest extent possible. This can be positioned as a methodology for integrated
modeling that aims to integrate predictive and explanatory capabilities (Hofman
et al., 2021).
For example, the coefficients of a linear regression model are said to be easy to
interpret because they summarize an association in a single quantity, in contrast to
machine learning models, which often lack a single interpretable quantity and can be
difficult to understand. However, the average partial effect, which is a measure of the
“effect” of a particular variable on the outcome of interest, can be obtained using
machine learning predictions and interpreted in a similar way to coefficients in
traditional regression analysis. If we want to know how much a partial change in
one variable x will change y on average, holding other variables constant, we can
compute them directly from the predictions. For example, to find the average partial
effect, we can take the differences between the predicted value of y for two different
values of x (x and x + Δ) and then divide that difference by Δ and average them
(Lundberg et al., 2021). If we clearly define the target quantity we wish to obtain, it
can be calculated from the predictions. If we prioritize ease of interpretation, we can
choose a quantity that is easily understood. Defining a clear and easily interpretable
target quantity can help to address many of the challenges associated with
interpreting the results of machine learning. When a target quantity is well defined
and meaningful from a theoretical perspective, it can be easier to understand and
draw meaningful conclusions from the prediction by machine learning.
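A minimal sketch of this computation, written for any fitted model with a predict
method (the function name and the linear model used for the demonstration are
illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def average_partial_effect(model, X, j, delta=0.01):
        """Average change in the prediction per unit change in column j."""
        X_shift = X.copy()
        X_shift[:, j] += delta
        return (model.predict(X_shift) - model.predict(X)).mean() / delta

    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 4))
    y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=500)
    model = LinearRegression().fit(X, y)
    print(round(average_partial_effect(model, X, j=0), 2))  # close to 2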
In addition, the “scientific regret minimization” proposed by Agrawal et al. is
considered a promising methodology (Agrawal et al., 2020; Hofman et al., 2021).
This method seeks to improve social science models by preparing large-scale data
and using machine learning methods to focus only on variances that can be explained
in principle. Variances that can be explained in principle are those that could have
been explained by a better model. In contrast, the inability to explain inherent noise
is not a problem; rather, changing the model to accommodate the noise will lead to
overfitting and loss of generalization performance.
Specifically, the following steps should be taken:
1. Train a theoretically unconstrained machine learning model (black-box model) on
a large data set to identify explainable variances in the data set.
2. Fit a simple, interpretable psychological model to the same data set.
3. Compare the black-box model with the simple model and improve the simple
model.
4. If the predictions of both models are consistent, we have obtained a model that
maximizes predictive and explanatory power simultaneously.
5. Validate the model obtained from the above exploratory analysis with new
independent data.
This method can be understood as a method of sequentially improving the model by
alternating between the data and the model. The conventional method also focuses
on the divergence between the model’s predictions and the data, and the process is to
improve the model in the direction of closing the divergence. The difference between
the traditional residual analysis and the scientific regret minimization method is that
the former compares data to a social science model, while the latter compares a
black-box model to a social science model.
Let us denote the true function as f(x), the machine learning model as f̂(x), and
the social science model as g(x). The goal of social science is to make the social
science model as close to the true function as possible, that is, to minimize the
difference f(x) - g(x) (the true residual) between the two. Nevertheless, since we
cannot know the true model, we cannot know the true residuals either. Therefore, in
conventional residual analysis, the model is successively modified to minimize the
residual y - g(x) (the "raw residual") between the data and the social science model. In
contrast, the scientific regret minimization method focuses on the residual f̂(x) -
g(x) (the "smoothed residual") between the machine learning model f̂(x) and the social
science model g(x), rather than on the data y itself. The reason is that the larger the
data, the more likely it is that the smoothed residuals reflect the true residuals better
than the raw residuals do [see Agrawal et al. (2020) for a proof].
Conversely, attempting to reduce the raw residuals would fit the model to noise and
thus lead to overfitting.
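A minimal sketch of the contrast, with a simulated truth, a deliberately simple
stand-in for g(x), and a random forest standing in for the black-box f̂(x):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(4)
    x = rng.uniform(-3, 3, size=(5000, 1))
    y = np.sin(x[:, 0]) + rng.normal(scale=0.5, size=5000)  # noisy "truth"

    def g(x):                                 # simple social science model
        return 0.8 * x[:, 0]

    f_hat = RandomForestRegressor(random_state=0).fit(x, y)  # black-box model

    raw = y - g(x)                            # raw residuals include noise
    smoothed = f_hat.predict(x) - g(x)        # approximates explainable misfit
    print(f"raw residual SD: {raw.std():.2f}, "
          f"smoothed residual SD: {smoothed.std():.2f}")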
What specific theories could the scientific regret minimization method produce?
Agrawal et al. apply this method to a large data set of moral machine experiments
(Awad et al., 2018) to propose a more detailed and interpretable moral theory than
previously possible. In another study (Peterson et al., 2021), this approach is applied
to the domain of risky decision making to derive a modified model for expected
utility theory and prospect theory (Kahneman & Tversky, 1979).
In sociology, with “scientific regret minimization,” or more generally, with
integrated modeling that aims to integrate predictive and explanatory capabilities,
it should be possible to perform interpretable theory discovery and theory building to
advance the development of sociological theory.

3.7 Conclusion

Today, the data environment surrounding sociology is changing drastically. Accordingly, sociology needs to incorporate new methods, in addition to traditional ones, in response to these changes in the data environment.
Although machine learning differs greatly from traditional sociological methods in
culture, basic ideas, and logic, it has great potential for the development of socio-
logical theory. In particular, machine learning has the potential to contribute to
sociological theory in three ways. First, it can break away from the traditional
deductive model and incorporate more flexible and heuristic ideas. Second, the
machine learning cognitive paradigm offers new cognitive norms that differ from
the traditional sociological norms embodied in hypothesis testing. This improves the
status quo in terms of replicability and generalizability. Third, machine learning
models can provide models of human cognition and judgment.
Existing statistical methods in sociology have been concerned with description,
causal inference, and prediction, but machine learning methods can approach these
issues better or from new angles. In this chapter, we demonstrate the effectiveness of
using machine learning methods for prediction, coding, and causal inference with
real research examples. Nevertheless, there is a major challenge in using machine
learning for the development of sociological theory: the problem of interpretability.
Conventional machine learning models do not necessarily focus on interpretability,
but by using ideas such as scientific regret minimization, the possibility is now open
to build more interpretable models that can be used with sociological theory. This is
a valuable chance to shift focus from interpreting the coefficients estimated from a
model to constructing various theoretical quantities based on predictions, thereby
broadening our perspective on analyzing data. We conclude that sociology could
achieve healthier development by incorporating machine learning methods, in addi-
tion to traditional statistical methods, into its toolbox.

References

Abbott, A. (1988). Transcending general linear reality. Sociological Theory, 6(2), 169–186. https://
doi.org/10.2307/202114
Acharya, A., Blackwell, M., & Sen, M. (2016). Explaining causal findings without bias: Detecting
and assessing direct effects. American Political Science Review, 110(03), 512–529. https://doi.
org/10.1017/S0003055416000216
Achen, C. H. (2005). Let’s put garbage-can regressions and garbage-can probits where they belong.
Conflict Management and Peace Science, 22(4), 327–339. https://doi.org/10.1080/
07388940500339167
Agrawal, M., Peterson, J. C., & Griffiths, T. L. (2020). Scaling up psychology via scientific regret
minimization. Proceedings of the National Academy of Sciences, 117(16), 8825–8835.
Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete.
Wired magazine, 16(7), 16–07.
Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial
intelligence: An agenda (pp. 507–547). University of Chicago Press.
Athey, S., & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Pro-
ceedings of the National Academy of Sciences, 113, 7353–7360.
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J. F., & Rahwan,
I. (2018). The moral machine experiment. Nature, 563(7729), 59–64.
Berk, R. A. (2004). Regression analysis: A constructive critique. Sage.
Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learning. Springer.
Blumenstock, J. E., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile
phone metadata. Science, 350(6264), 1073–1076. https://doi.org/10.1126/science.aac4420
Brand, J. E., Xu, J., Koch, B., & Geraldo, P. (2021). Uncovering sociological effect heterogeneity
using tree-based machine learning. Sociological Methodology, 51(2), 189–223.
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the
author). Statistical Science, 16(3), 199–231.
Cinelli, C., Forney, A., & Pearl, J. (2021). A crash course in good and bad controls. SSRN
Electronic Journal. https://doi.org/10.2139/ssrn.3689437
Coleman, J. S. (1990). Foundations of social theory. Belknap Press of Harvard University Press.
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the
ACM, 55(10), 78–87.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics,
26(4), 745–766.
Elwert, F., & Winship, C. (2010). Effect heterogeneity and bias in main-effects-only regression
models. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality:
A tribute to Judea Pearl (pp. 327–336). Joseph Y. Halpern.
Foster, J. G. (2018). Culture and computation: Steps to a probably approximately correct theory of
culture. Poetics, 68, 144–154.
Garip, F. (2020). What failure to predict life outcomes can teach us. Proceedings of the National
Academy of Sciences, 117(15), 8234–8235.
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models.
Cambridge University press.
Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University
Press.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative
research. Aldine De Gruyter.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Grimmer, J., Messing, S., & Westwood, S. J. (2017). Estimating heterogeneous treatment effects
and the effects of heterogeneous treatments with ensemble methods. Political Analysis, 25(4),
413–434. https://doi.org/10.1017/pan.2017.15
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An
agnostic approach. Annual Review of Political Science, 24, 395–419.
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine
learning and the social sciences. Princeton University Press.
Hastie, T., Tibshirani, R., Friedman, J. H., & Friedman, J. H. (2009). The elements of statistical
learning: Data mining, inference, and prediction. Springer.
Hedström, P., & Ylikoski, P. (2010). Causal mechanisms in the social sciences. Annual Review of
Sociology, 36(1), 49–67. https://doi.org/10.1146/annurev.soc.012809.102632
Hernán, M. A., Hsu, J., & Healy, B. (2019). A second chance to get causal inference right: A
classification of data science tasks. Chance, 32(1), 42–49. https://doi.org/10.1080/09332480.
2019.1579578
Hofman, J. M., Watts, D. J., Athey, S., Garip, F., Griffiths, T. L., Kleinberg, J., Margetts, H.,
Mullainathan, S., Salganik, M. J., Vazire, S., & Vespignani, A. (2021). Integrating explanation
and prediction in computational social science. Nature, 595(7866), 181–188.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning.
Springer.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects.
Science, 349(6245), 255–260.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47(2), 263–292.
Keele, L., Stevenson, R. T., & Elwert, F. (2020). The causal interpretation of estimated associations
in regression models. Political Science Research and Methods, 8(1), 1–13. https://doi.org/10.
1017/psrm.2019.31
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196–217.
Kino, S., Hsu, Y.-T., Shiba, K., Chien, Y.-S., Mita, C., Kawachi, I., & Daoud, A. (2021). A scoping
review on the use of machine learning in research on social determinants of health: Trends and
research prospects. SSM Population Health, 15, 100836. https://doi.org/10.1016/j.ssmph.2021.
100836
Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction policy problems.
American Economic Review, 105(5), 491–495.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage.
Lazer, D. (2015). Issues of construct validity and reliability in massive, passive data collections.
The City Papers: An Essay Collection from The Decent City Initiative.
Le Borgne, F., Chatton, A., Léger, M., Lenain, R., & Foucher, Y. (2021). G-computation and
machine learning for estimating the causal effects of binary exposure statuses on binary out-
comes. Scientific Reports, 11(1), 1435.
Lendle, S. D., Schwab, J., Petersen, M. L., & van der Laan, M. J. (2017). ltmle: An R package
implementing targeted minimum loss-based estimation for longitudinal data. Journal of Statis-
tical Software, 81(1), 1–21. https://doi.org/10.18637/jss.v081.i01
Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What is your estimand? Defining the target
quantity connects statistical evidence to theory. American Sociological Review, 86(3), 532–565.
Machamer, P., Darden, L., & Craver, C. F. (2000). Thinking about Mechanisms. Philosophy of
Science, 67(1), 1–25. https://doi.org/10.1086/392759
McFarland, D. A., Lewis, K., & Goldberg, A. (2016). Sociology in the era of big data: The ascent of
forensic social science. The American Sociologist, 47(1), 12–35.
Mizuno, M., & Takikawa, H. (2022). Computational social science on the structure of communi-
cation between consumers (Yoshida foundation report).
Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45,
27–45.
Mooney, S. J., Keil, A. P., & Westreich, D. J. (2021). Thirteen questions about using machine
learning in causal research (you won’t believe the answer to number 10!). American Journal of
Epidemiology, 190(8), 1476–1482. https://doi.org/10.1093/aje/kwab047
Morgan, S. L., & Winship, C. (2014). Counterfactuals and causal inference: Methods and
principles for social research (2nd ed.). Cambridge University Press.
Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal
of Economic Perspectives, 31(2), 87–106.
Naimi, A. I., & Balzer, L. B. (2018). Stacked generalization: An introduction to super learning.
European Journal of Epidemiology, 33(5), 459–464. https://doi.org/10.1007/s10654-018-
0390-z
Nelson, L. K., Burk, D., Knudsen, M., & McCall, L. (2021). The future of coding: A comparison of
hand-coding and three types of computer-assisted text analysis methods. Sociological Methods
& Research, 50(1), 202–237.
Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-
scale experiments and machine learning to discover theories of human decision-making.
Science, 372(6547), 1209–1214.
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Salganik, M. J., Lundberg, I., Kindel, A. T., Ahearn, C. E., Al-Ghoneim, K., Almaatouq, A.,
Altschul, D. M., Brand, J. E., Carnegie, N. B., Compton, R. J., & Datta, D. (2020). Measuring
the predictability of life outcomes with a scientific mass collaboration. Proceedings of the
National Academy of Sciences, 117(15), 8398–8403.
Schuler, M. S., & Rose, S. (2017). Targeted maximum likelihood estimation for causal inference in
observational studies. American Journal of Epidemiology, 185(1), 65–73. https://doi.org/10.
1093/aje/kww165
Schutz, A., & Luckmann, T. (1973). The structures of the life world. Northwestern University Press.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology undisclosed
flexibility in data collection and analysis allows presenting anything as significant. Psycholog-
ical Science, 22, 1359–1366. https://doi.org/10.1177/0956797611417632
Tavory, I., & Timmermans, S. (2014). Abductive analysis: Theorizing qualitative research. Uni-
versity of Chicago Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal
Statistical Society: Series B (Methodological), 58, 267–288.
Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of
the Royal Statistical Society: Series B (Statistical Methodology), 73, 273–282. https://doi.org/
10.1111/j.1467-9868.2011.00771.x
Van der Laan, M. J., & Rose, S. (2011). Targeted learning. Springer.
van der Laan, M. J., & Rose, S. (2018). Targeted learning in data science: Causal inference for
complex longitudinal studies. Springer International Publishing. https://doi.org/10.1007/978-3-
319-65304-4
VanderWeele, T. J. (2015). Explanation in causal inference: Methods for mediation and interac-
tion. Oxford University Press.
VanderWeele, T. J. (2019). Principles of confounder selection. European Journal of Epidemiology,
34(3), 211–219. https://doi.org/10.1007/s10654-019-00494-6
Watts, D. J. (2014). Common sense and sociological explanations. American Journal of Sociology,
120(2), 313–351.
Westreich, D., & Greenland, S. (2013). The table 2 fallacy: Presenting and interpreting confounder
and modifier coefficients. American Journal of Epidemiology, 177(4), 292–298. https://doi.org/
10.1093/aje/kws412
Westreich, D., Lessler, J., & Funk, M. J. (2010). Propensity score estimation: Neural networks,
support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic
regression. Journal of Clinical Epidemiology, 63(8), 826–833.
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology. Perspec-
tives on Psychological Science, 12(6), 1100–1122.
Zhang, H., & Pan, J. (2019). Casm: A deep-learning approach for identifying collective action
events with text and image data from social media. Sociological Methodology, 49(1), 1–57.
Chapter 4
Computational Social Science: A Complex
Contagion

Michael W. Macy

Computational social science is a multidisciplinary umbrella that includes a wide array of research practices enabled by advances in computation. These include
computer simulation of social interaction on complex networks; collecting,
processing, and analyzing digital trace data from online communities; data wran-
gling with massive numbers of incomplete and partially structured observations;
machine learning; text analysis and natural language processing; geospatial data
collection and analysis; and tracking global diffusion across online networks.
The importance of social network analysis in computational social science created
an opportunity for Sociology to become the disciplinary home of a game-changing
field. This book addresses the discipline’s curious reluctance to embrace that oppor-
tunity and offers a compelling explanation: the need for a deeper theoretical grounding
for the questions that drive the research agenda of computational social science.
In a 2014 paper in the Annual Review of Sociology, Scott Golder and I pointed
instead to methodological challenges and called for changes in graduate training
across the social sciences. “A primary obstacle to online research by social scien-
tists,” we argued, “is the need for advanced technical training to collect, store,
manipulate, analyze, and validate massive quantities of semi-structured data, such
as text generated by hundreds of millions of social media users. In addition,
advanced programming skills are required to interact with specialized or custom
hardware, to execute tasks in parallel on computing grids composed of hundreds of
nodes that span the globe, and simply to ensure that very large amounts of data
consume memory efficiently and are processed using algorithms that run in a
reasonable amount of time. As a consequence, the first wave of studies of online
behavior and interaction has been dominated by physical, computer, and information
scientists who may lack the theoretical grounding necessary to know where to look,
what questions to ask, or what the results may imply” (Golder & Macy, 2014,
p. 144).

4.1 The First Wave

Although Scott and I referred to “the first wave,” the study of online interaction in
global networks should more appropriately be termed “the second wave” of com-
putational social science. The first wave began decades earlier and involved the use
of computational models to explore the logical implications of a set of theoretical
propositions. Most notably, the seminal work by Schelling (1971) and Axelrod
(1984) examined the dynamics of residential segregation and the evolutionary
origins of social order, questions that are central not only to Sociology but to Social
Psychology and Political Science as well. In this chapter, I recount my personal
involvement in computational social science over five decades, focusing on two
themes: the technical obstacles that had to be overcome, and the foundational
research questions that have motivated the field.
My initial foray into what came to be known as computational social science
dates back to my junior year in college and exemplifies the early fascination with
abstract computational models. My mentor, Karl Deutsch, was one of the first social
scientists to apply simulation, information theory, and system dynamics models to
the study of war and peace. One afternoon, Prof. Deutsch walked into our weekly
tutorial with a rumpled copy of The Prisoner’s Dilemma: A Study in Conflict and
Cooperation by Rapoport and Chammah (1965). He handed me the book and told
me to come back when I had finished reading it. The book focused on the dynamics
of cooperation in iterated play and introduced computer simulation of stochastic
learning models and human subject experiments to test model predictions. In the
words of Martin Shubik (1970, p. 193), the book “adopts a completely different
approach and starts to do precisely what I believe is necessary—to enlarge the
framework of the analysis to include learning and sociopsychological factors,
notably reciprocity.” Rapoport went on to win Robert Axelrod’s famous prisoner’s
dilemma tournament by submitting the simplest of all strategies entered: “tit for tat.”
Rapoport and Chammah’s experiments inspired me to see if I could use computer
simulation to dig deeper into their learning-theoretic approach. Suppose the players
are unaware of the game’s mathematical structure and instead apply a simple
stochastic propensity to repeat behaviors associated with a satisfactory outcome
and otherwise explore an alternative. Would the players learn to cooperate?
To find out, I asked Prof. Deutsch if I could simulate an iterated two-person PD
game, with each player using an identical stochastic strategy based on reinforcement
learning. The stochastic learning model was exceedingly simple. The players have a
single goal: to use their own behavior to induce the other player to cooperate. Each
player begins by flipping a coin to choose whether to cooperate or defect and then
updates its cooperative propensity based on satisfaction with the associated behav-
ior. Players are satisfied when the other player cooperates and dissatisfied when they
defect.
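A minimal sketch of that stochastic learning model, reconstructed from the
description above (the Bush–Mosteller-style update rule and the learning-rate
values are assumptions, not the original code):

    import random

    def iterated_pd(learning_rate, rounds=2000, seed=0):
        random.seed(seed)
        p = [0.5, 0.5]               # propensities to cooperate (coin-flip start)
        for _ in range(rounds):
            coop = [random.random() < p[i] for i in range(2)]
            for i in range(2):
                satisfied = coop[1 - i]   # satisfied iff the other cooperated
                # reinforce the move just played if satisfied, weaken it otherwise
                target = coop[i] if satisfied else not coop[i]
                p[i] += learning_rate * (float(target) - p[i])
        return p

    print("fast learners:", [round(v, 2) for v in iterated_pd(0.5)])
    print("slow learners:", [round(v, 2) for v in iterated_pd(0.05)])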
The problem was that my school still relied on an IBM mainframe, programmed
using punch cards. I would have to punch the cards with lines of code, stand in line
waiting to submit the deck, wait hours to get back the paper output, learn that I had
made a mistake, punch a new card, and resubmit the deck. Worse yet, the prohibitive
cost discouraged the exercise of curiosity when funded by a small grant. After a few
frustrating nights, I gave up. I complained about the problem to a friend, Jed Harris,
who worked at Abt Associates, not far from campus. Jed told me that Abt had
installed the latest DEC PDP-7, a computer that would allow me to observe, debug,
and quickly modify the game dynamics as they played out in real time. Jed let me use
the machine at night, after work. I could accomplish in one evening what would have
taken me weeks on the mainframe, and it cost nothing to explore the parameter
space.
The simulations revealed a self-reinforcing cooperative equilibrium that could
quickly recover from small perturbations introduced by occasional defections.
However, the stochastic stability of mutual cooperation depended on the learning
rate. Highly reactive players would quickly learn to cooperate and could easily
recover if one of the players were to test the other’s resolve. In contrast, slow
learners were doomed to endless cycles of recrimination, retaliation, and mutual
defection. I wrote up the results for Prof. Deutsch and put the paper with his
encouraging comments in a cardboard box, where it remained for 20 years.

4.2 Agent-Based Modeling

Fast forward two decades to when Peter Hedström called me up from the University
of Chicago where he was helping James Coleman launch a new journal, Rationality
and Society. Peter reminded me about my old prisoner’s dilemma simulation, which
I had mentioned to him back when we were in grad school together, and asked me to
update the paper and submit it for the journal’s second issue. The paper (Macy,
1989) introduced what later came to be known as “agent-based modeling,” a new
approach to theoretical research that replaced “a model of a population” with “a
population of models,” where each model corresponds to an autonomous agent
interacting with its neighbors.
The point I want to underscore is that the roots of computational social science go
back to pioneers like Karl Deutsch, Anatol Rapoport, and Albert Chammah. How-
ever, the field had to wait 20 years for a new generation of personal computers to
catch up with the advances in social theory that first inspired me as an undergraduate.
Recognizing the opportunities opened up by universal access to desktop com-
puting, Bill Bainbridge organized a national meeting supported by the National
Science Foundation on “Grand Computing Challenges for Sociology” (Bainbridge,
1994). John Skvoretz and I were both in attendance and were inspired by
Bainbridge’s call. Our model of the diffusion of trust and cooperation among
strangers (Macy & Skvoretz, 1998) was followed up by a PNAS paper with Mitch
Sato on trust and market formation in the USA and Japan (Macy & Sato, 2002).
I also collaborated with Andreas Flache to leverage the rapidly increasing power
of desktop machines to explore the theoretical implications of “backward-looking”
alternatives to the forward-looking rationality assumed in classical game theory.
This culminated in a paper in the Annual Review of Sociology, “Beyond Rationality
in Theories of Choice” (Macy & Flache, 1995). However, the impact of agent-based
modeling extended far beyond our learning-theoretic approach. Robb Willer and I
later co-authored a paper in the Annual Review of Sociology (Macy & Willer, 2002)
that highlighted a fundamental analytical shift made possible by agent-based model-
ing, from “factors” (interactions among variables) to “actors” (interactions among
agents embedded in social networks).

4.3 Social Contagion

The shift in the focus of computational social science—from collective action to network interaction—was consolidated with the discovery of small world networks
by Watts and Strogatz (1998). As a graduate student at Cornell, Watts was intrigued
by an empirical puzzle posed by Stanley Milgram in the 1960s—the “six degrees of
separation” popularized as the “Kevin Bacon Number.” This is the startling hypoth-
esis that any two randomly chosen people are connected to one another by a mere
handful of intermediates. How is this possible among the billions of people scattered
across the planet, each embedded in a small circle of friends and family, and many
living in remote towns and villages? Watts and Strogatz found the answer: it takes
only a small number of bridge ties between structurally distant communities to give
highly clustered networks the short mean geodesic of a random graph.
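The effect is easy to verify with off-the-shelf tools. The following sketch uses the Watts-Strogatz generator in the Python networkx library; the sizes and rewiring probabilities are illustrative.

import networkx as nx

n, k = 1000, 10
for p in (0.0, 0.01, 1.0):   # regular lattice, small world, random graph
    G = nx.connected_watts_strogatz_graph(n, k, p, seed=42)
    print(f"p={p}: clustering={nx.average_clustering(G):.2f}, "
          f"mean geodesic={nx.average_shortest_path_length(G):.1f}")

At p = 0.01 clustering remains near the lattice value while the mean geodesic collapses toward that of the random graph: the small-world signature.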
The discovery of small world networks inspired my paper with Damon Centola
on the diffusion dynamics of “complex contagions” (Centola & Macy, 2007). Watts
and Strogatz modeled “simple contagions,” which entail transmission from a single
prior adopter, as occurs in the spread of pathogens or viral information. For example,
if you are exposed to Omicron BA.2, you do not need to be infected by a second
individual in order to acquire the disease. However, that is not the case if you want to
know whether to protest against public health regulations or to participate in a risky
collective action or whether to adopt an expensive but unproven innovation. Com-
plex contagions have higher adoption thresholds, that is, they require social rein-
forcement from multiple prior adopters. In a paper published in AJS (Centola &
Macy, 2007), Damon and I enumerated four reasons why social reinforcement may
be necessary:
1. Strategic Complementarity: Simply knowing about an innovation is rarely suffi-
cient for adoption (Gladwell, 2000). Many innovations are costly, especially for
early adopters but less so for those who wait. The same holds for participation in
collective action. Studies of strikes (Klandermans, 1988), revolutions (Gould,
1996), and protests (Marwell & Oliver, 1993) emphasize the positive externalities
of each participant’s contribution. The costs and benefits for investing in public
goods often depend on the number of prior contributors—the “critical mass” that
makes additional efforts worthwhile.
2. Credibility: Innovations often lack credibility until adopted by neighbors. For
example, Coleman et al. (1966) found that doctors were reluctant to adopt
medical innovations until they saw their colleagues using them. Markus (1987)
found the same pattern for adoption of media technology. Similarly, the spread
of urban legends (Heath et al., 2001) and folk knowledge (Granovetter, 1978)
generally depends upon multiple confirmations of the story before there is
sufficient credibility to report it to others. Hearing the same story from different
people makes it seem less likely that surprising information is nothing more than
the fanciful invention of the informant. The need for confirmation becomes even
more pronounced when the story is learned from a socially distant contact, with
whom a tie is likely to be relationally weak.
3. Legitimacy: Knowing that a movement exists or that a collective action will take
place is rarely sufficient to induce bystanders to join in. Having several close
friends participate in an event often greatly increases an individual’s likelihood of
also joining (Finkel et al., 1989; Opp & Gern, 1993), especially for high-risk
social movements (McAdam & Paulsen, 1993). Decisions about what clothing to
wear, what hair style to adopt, or what body part to modify are also highly
dependent on legitimation (Grindereng, 1967). Non-adopters are likely to chal-
lenge the legitimacy of the innovation, and innovators risk being shunned as
deviants, until there is a critical mass of early adopters (Crane, 1999; Watts,
2002).
4. Emotional Contagion: Most theoretical models of collective behavior—from
action theory (Smelser, 1963) to threshold models (Granovetter, 1978) to cyber-
netics (McPhail, 1991)—share the basic assumption that there are expressive and
symbolic impulses in human behavior that can be communicated and amplified in
spatially and socially concentrated gatherings (Collins, 1993). The dynamics of
cumulative interaction in emotional contagions has been demonstrated in events
ranging from acts of cruelty (Collins, 1974) to the formation of philosophical
circles (Collins, 1993).
The theory of complex contagion can be understood as an extension of
Granovetter’s theory of the strength of weak ties (Granovetter, 1973). According
to Granovetter, ties are weak relationally when there is infrequent interaction and/or
low emotional salience, as in relations with an acquaintance. However, these ties can
nevertheless be structurally strong in that they provide access to information from
outside one’s immediate circle. In network terminology, relationally weak ties often
have long range, meaning that they connect nodes located in structurally distant
network neighborhoods. Damon and I showed that the structural strength of long-
range ties is limited to simple contagions that do not require social reinforcement. In
contrast, the spread of complex contagions depends on "wide bridges" composed of
densely interwoven ties to multiple sources of influence.
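A minimal sketch makes the contrast concrete. The threshold rule and parameters below are illustrative; the point is that rewiring, which speeds a simple contagion, starves a complex one of wide bridges.

import networkx as nx

def spread(G, seeds, threshold):
    # A node adopts once at least `threshold` of its neighbors have adopted.
    adopted = set(seeds)
    while True:
        new = {v for v in G if v not in adopted
               and sum(u in adopted for u in G[v]) >= threshold}
        if not new:
            return adopted
        adopted |= new

for p in (0.0, 1.0):                     # clustered lattice vs. fully rewired graph
    G = nx.connected_watts_strogatz_graph(500, 8, p, seed=3)
    seeds = set(G[0]) | {0}              # seed one node together with its neighborhood
    for t in (1, 2):                     # simple (t=1) vs. complex (t=2) contagion
        print(f"p={p}, threshold={t}: {len(spread(G, seeds, t))} of 500 adopt")

With threshold 1, the contagion saturates the graph regardless of rewiring; with threshold 2, it spreads along the clustered lattice (p = 0.0) but typically stalls near the seed neighborhood once rewiring has eliminated the wide bridges (p = 1.0).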
The reinforcement of social influence is half the story. The other half is how
homophily paves the way for wide bridges through the tendency for ties to form and
strengthen among like-minded neighbors. Just as social influence can be reinforced
in densely clustered networks, homophily is the magnetic force that pulls network
nodes together into the dense local clusters required by complex contagion. As with
magnetic force, there is repulsion as well as attraction, but unlike ferromagnetics, in
social magnetics it is the opposites that repel.
Years before my collaboration with Damon, James Kitts, who was my first
graduate student, had worked with me to develop a computational model that
combined social influence with the network dynamics of attraction and repulsion.
Our work drew heavily on John Hopfield’s model of recurrent neural networks
(Hopfield, 1982), and James honored our forebear by naming our model “the
Hopster.” The model shows how political and cultural polarization can be a unique
global attractor, yet other stable configurations are also possible, depending on the
density of the belief matrix. These include monoculture and the cross-cutting
divisions of a pluralist society.
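The original code is not reproduced here, but a toy variant conveys the logic: agents hold binary beliefs on several dimensions, ties carry positive (attracting) or negative (repelling) weights, beliefs follow the weighted influence of one's ties, and weights drift toward pairwise agreement. All values below are illustrative.

import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 10                               # agents, belief dimensions
S = rng.choice([-1.0, 1.0], size=(N, K))    # belief vectors (+1 agree, -1 disagree)
W = rng.uniform(-0.1, 0.1, size=(N, N))     # tie weights: attraction (+) or repulsion (-)
np.fill_diagonal(W, 0)

for step in range(20000):
    i, k = rng.integers(N), rng.integers(K)
    pressure = W[i] @ S[:, k]               # weighted influence of all ties on belief k
    if pressure != 0:
        S[i, k] = np.sign(pressure)
    if step % 100 == 0:                     # homophily: ties drift toward agreement
        W = 0.99 * W + 0.01 * (S @ S.T) / K
        np.fill_diagonal(W, 0)

agreement = (S @ S.T) / K                   # pairwise agreement in [-1, 1]
print("mean |pairwise agreement|:", np.abs(agreement[np.triu_indices(N, 1)]).mean())

When the mean absolute agreement approaches 1, agents have sorted into internally homogeneous camps (or a single monoculture), depending on the run.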

4.4 From Latte Liberals to Naked Emperors

The Hopster, it turned out, had much more to teach us about the self-reinforcing
dynamics of homophily and social influence. Daniel DellaPosta, Yongren Shi, and I
used a descendant of the original model to address the curious tendency for liberals
and conservatives to differ not only on policy but also lifestyle preferences, as
documented using the General Social Survey (DellaPosta et al., 2015). However,
our underlying theoretical motivation ran deeper: to show how belief systems can
self-organize through the forces of attraction and repulsion. The emergent configu-
rations invite post-hoc explanations of cultural fault lines that can be as substantively
idiosyncratic as a liberal preference for caffeinated hot beverages, an idea originally
proposed by Miller McPherson (2004).
A few years later, Sebastian Deri, Alex Ruch, Natalie Tong, and I (2019) tested
the “latte liberal” hypothesis using a large online experiment, modeled after the
multiple worlds “music lab” devised by Salganik et al. (2006). The results of our
“party lab” experiment confirmed the predicted unpredictability of partisan divi-
sions. The emergent disagreements were as deep as those observed in contemporary
surveys, but with one important difference: You could be sure that the two parties
would strongly disagree, but it was a coin flip as to who would be on which side. In
one “world,” Democrats might join the bandwagon to embrace “great books,” while
Republicans rallied around more emphasis on children’s physical fitness, but in the
next world the sides would be switched. The problem is that social scientists, like the
participants in our study, can only observe the one world we all inhabit. That leaves
us susceptible to “just so” stories that plausibly explain the opposing beliefs of each
side, unaware that the sides could just as easily have been switched but for the luck
of the draw.
The arbitrariness of partisan division invites the reassuring hypothesis that polit-
ical polarization can be easily reversed simply by reminding everyone that the
emperor is naked. Unfortunately, it is not so easy, for two reasons—false enforce-
ment and hysteresis. In a study with Robb Willer and Damon Centola (2005), we
simulated Andersen’s classic fable to show how conformists might falsely enforce
unpopular norms to avoid suspicion that they might have complied because of social
pressure instead of genuine conviction. In a follow-up study with Ko Kuwabara,
Robb and I tested the “false enforcement” hypothesis in a wine-tasting experiment
using vinegar-tainted wine (Willer et al., 2009). As predicted by Andersen (as well
as Arthur Miller’s The Crucible), participants who praised the tainted wine were
more likely to criticize the lone confederate who refused to go along with the
charade—but only when the criticism was performed in public.
Polarization may be hard to reverse even in the absence of false enforcement. In
collaboration with a team of computer scientists at RPI, I recently used another
Hopster variant to investigate the tipping point beyond which a polarized legislature
becomes increasingly unable to unite against common threats, such as election
interference by a foreign adversary or a global pandemic (Macy et al., 2021). The
problem is hysteresis, in which polarization alters the network structure by elimi-
nating the inter-party ties by which pragmatists might “reach across the aisle.” The
structural change is difficult if not impossible to reverse, even if the political
temperature could somehow be lowered well below current levels. The disturbing
implications attracted widespread media attention, including the New York Times
and CNN.

4.5 The Second Wave

If the “first wave” in computational social science was all theory and little data, the
“second wave” was the mirror opposite: big data with little theory. Social science has
accumulated a trove of theories waiting for the data that are needed to test them. The
transformative potential of online data was celebrated by one of the founders of
computational social science, Duncan Watts (2011, p. 266):
[J]ust as the invention of the telescope revolutionized the study of the heavens, so too by
rendering the unmeasurable measurable, the technological revolution in mobile, Web, and
Internet communications has the potential to revolutionize our understanding of ourselves
and how we interact. . . . [T]hree hundred years after Alexander Pope argued that the proper
study of mankind should lie not in the heavens but in ourselves, we have finally found our
telescope. Let the revolution begin.

Is the Web the space telescope of the social sciences? The metaphor is instructive.
The power of a telescope, whether in outer space or cyberspace, depends on our
ability to know where to point it. Moreover, with millions of observations in global
networks, the challenge is to find differences that are not statistically significant.
Theoretical significance then becomes paramount.

4.6 Structural Holes and Network Wormholes

My first study using big data addressed Ron Burt's theory of structural holes. Nathan
Eagle, Rob Claxton, and I used call logs from one of the UK’s largest telecoms to test
Ron’s theory at population scale (Eagle et al., 2010). As predicted, we found that
economically advantaged communities tended to have more people with ties that
link otherwise distantly connected neighbors, although the causal direction remained
to be sorted out.
More recently, Patrick Park, Josh Blumenstock, and I (2018) used these same
telecom data, along with global network data from Twitter, to search the social
heavens for “network wormholes,” our term for long-range ties that span vast
distances in a global communications network. These were not the “wide bridges”
in complex contagion; in contrast, we were searching for Granovetter’s “weak ties,”
the “long bridges” that connect otherwise unreachable clusters. Not surprisingly, we
found these ties to be extremely rare. For any random edge in a global network, the
“degree of separation” along the second-shortest path is almost always close to two
hops. Nevertheless, a handful of long-distance “shortcuts” can also be found in the
global communication networks made visible by social media. The question then
arises, are these shortcuts strong enough to matter? The default assumption in
network science, going back to Granovetter, is that they are relationally weak. The
“strength of weak ties,” in Granovetter’s theory, is the access they provide to socially
distant sources of information, not their affective or social intensity. Long-range ties,
or so the theory goes, connect acquaintances with whom interaction is infrequent,
influence is low, and bandwidth is narrow. But that is not what we found. Contrary to
extant theory, network wormholes have nearly the same bandwidth and affective
content as the densely clustered ties that connect a small circle of friends.
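Tie range is straightforward to compute. In the sketch below, a Watts-Strogatz graph stands in for the proprietary telecom data; the function itself is just the definition of range.

import networkx as nx

def tie_range(G, u, v):
    # The "range" of a tie is the second-shortest path between its endpoints:
    # the shortest path that remains once the direct edge is removed.
    G.remove_edge(u, v)
    try:
        r = nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        r = float("inf")                    # the tie is a bridge
    G.add_edge(u, v)
    return r

G = nx.connected_watts_strogatz_graph(1000, 6, 0.05, seed=7)
ranges = [tie_range(G, u, v) for u, v in list(G.edges())]
print("share of ties with range 2:", sum(r == 2 for r in ranges) / len(ranges))
print("longest-range tie (wormhole candidate):", max(ranges))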
Another big empirical study was motivated by years of computational modeling
with the Hopster. Yongren Shi, Feng Shi, Fedor Dokshin, James Evans, and I used
millions of Amazon book co-purchases to see if partisan cultural fault lines extended
even to the consumption of science, a realm that is presumably above the political
fray (Shi et al., 2017). Could a shared interest in science bridge political differences and
encourage reliance on science to inform political debate? Or has science become a
new battlefield in the culture wars? We found that the political left and right share an
interest in science in general, but not science in particular. Liberals are drawn more
to basic science (e.g., physics, astronomy, and zoology), while conservatives prefer
applied science (e.g., criminology, medicine, and geophysics). Liberals read science
books that are more often purchased by people who do not buy political books, while
conservatives prefer science books that are mainly purchased by fellow
conservatives.
The most impactful paper of my career was a 2011 study with Scott Golder in
which we used millions of messages obtained from Twitter to measure diurnal
emotional rhythms (Golder & Macy, 2011). We sorted the tweets into 168 buckets,
one for each hour of the week, and then used Pennebaker’s Linguistic Inquiry and
Word Count lexicon to measure the level of positive and negative emotion in each
bucket. Positive emotion is indicated by the use of words like “excited” or “fun,” in
contrast to words like “disappointed” or “frustrated.” We found that people were
happiest right around breakfast, but for the rest of the workday it was all downhill.
We immediately suspected the effects of an exhausting job ruled by an unpleasant
supervisor, but then we noticed the same pattern on weekends as well, except that the
starting point was delayed by about 2 h, reflecting perhaps the opportunity to sleep
in. Using changes in diurnal rhythms relative to sunrise and sunset, we concluded
that the pattern is driven largely by sleep cycles, not work cycles. The paper was
published in Science and attracted more mainstream media attention than all my
other papers combined.
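The bucketing logic is simple enough to sketch. The lexicon below is a toy stand-in for LIWC, and the three-row frame is hypothetical data standing in for the millions of tweets.

import pandas as pd

POSITIVE = {"excited", "fun", "happy"}            # toy stand-ins for the LIWC
NEGATIVE = {"disappointed", "frustrated", "sad"}  # positive/negative affect lists

def affect(text):
    words = text.lower().split()
    return (sum(w in POSITIVE for w in words)
            - sum(w in NEGATIVE for w in words)) / max(len(words), 1)

tweets = pd.DataFrame({  # hypothetical, locally time-stamped messages
    "timestamp": pd.to_datetime(["2011-03-07 08:15", "2011-03-07 17:40",
                                 "2011-03-12 10:05"]),
    "text": ["so excited for a fun day", "frustrated and disappointed",
             "fun weekend at last"],
})
# One bucket per hour of the week: dayofweek * 24 + hour gives 0..167.
bucket = tweets["timestamp"].dt.dayofweek * 24 + tweets["timestamp"].dt.hour
print(tweets["text"].map(affect).groupby(bucket).mean())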
A decade later, Minsu Park and I (with the help of company staffers) used global
Spotify music logs to track the flip side of the Twitter study—the emotions people
are exposed to in the music they choose to stream instead of the emotions they
express (Park et al., 2019). We filtered out Spotify playlists to focus on user-selected
music the world over. We discovered that the diurnal pattern in affective preference
closely resembles the diurnal cycles that Scott and I detected in expressed emotion.
This suggests the possibility that our affective preferences reinforce rather than
compensate our emotional state. For example, when we are sad we do not go for
upbeat music to lift our spirits, we listen to something melancholy. Unfortunately,
Minsu and I were unable to link users' Spotify and Twitter accounts, so our affective
reinforcement theory remains to be tested at the individual level.

4.7 Conclusion

This brief overview of my personal involvement in the first and second waves of
computational social science is intended to call attention to the foundational ques-
tions that the studies addressed, from the origins of social order to the strength of
weak ties in global online networks. There is no shortage of theory in the social
sciences, and from its inception, computational social science has ranked among the
more theory-driven fields. In contrast, there has been a shortage of data with which to
test many of those theories, due largely to the historic difficulty in observing social
interaction except in small groups. That is now changing with the global popularity
of online activities that leave digital traces, from shopping to blogging. Nevertheless,
the social sciences have not taken full advantage of the vast new research opportu-
nities opened up by advanced computational methods. The question is why?
I do not believe this hesitancy should be attributed to the failure to ask interesting
and important questions. On the contrary, the signature contribution of computa-
tional social science is the opportunity to tackle vital questions that would otherwise
be inaccessible. I have not run the numbers, but my casual impression is that these
studies are far more likely to appear in general science journals with double-digit
impact factors than in highly specialized journals devoted to topics that interest only a
narrow audience.
The problem is not the disciplinary relevance of the research; I suspect it is
instead the price of admission. Rapid advances in computation have been accompa-
nied by equally rapid turnover in the requisite technical skills, from object-oriented
programming that super-charged agent-based modeling to deep learning and word
embedding that have opened up new frontiers in text analysis. These methods
require substantial retooling, even for quantitative specialists with advanced statis-
tical training.
The updating of graduate training that Scott Golder and I called for in our 2014
Annual Review paper remains to be implemented at scale in any of the social
sciences. Until that happens, computational social science is likely to remain con-
fined largely to those who have the necessary skills. The torch will then continue to
be carried mostly by computer scientists and socio-physicists who may be more
interested in discovering unexpected patterns in the data than discovering what they
might mean. Meanwhile, our best option is the increasing reliance on interdisciplin-
ary research teams that bring together specialists who not only know how to operate
the telescope but also where to point it.

References

Axelrod, R. (1984). The evolution of cooperation. Basic Books.
Bainbridge, W. (1994). Grand computing challenges for sociology. Social Science Computer
Review, 12, 183–192. https://doi.org/10.1177/089443939401200203
Centola, D., & Macy, M. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113, 702–734. https://doi.org/10.1086/521848
Centola, D., Willer, R., & Macy, M. (2005). The emperor’s dilemma: A computational model of
self-enforcing norms. American Journal of Sociology, 110, 1009–1040. https://doi.org/10.1086/
427321
Coleman, J., Katz, E., & Menzel, H. (1966). Medical innovation: A diffusion study. Bobbs-Merrill.
Collins, R. (1974). Three faces of cruelty: Towards a comparative sociology of violence. Theory
and Society, 12, 631–658.
Collins, R. (1993). Emotional energy as the common denominator of rational action. Rationality
and Society, 5, 203–230.
Crane, D. (1999). Diffusion models and fashion: A reassessment. Annals of the American Academy
of Political and Social Science, 566, 13–24.
DellaPosta, D., Shi, Y., & Macy, M. (2015). Why do liberals drink lattes? American Journal of
Sociology, 120, 1473–1511.
Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science,
328, 1029–1031. https://doi.org/10.1126/science.1186605
Finkel, S., Muller, E., & Opp, K. (1989). Personal influence, collective rationality, and mass
political action. American Political Science Review, 83, 885–903.
Gladwell, M. (2000). The tipping point: How little things can make a big difference. Little, Brown.
Golder, S., & Macy, M. (2011). Diurnal and seasonal mood vary with work, sleep, and daylength
across diverse cultures. Science, 333, 1878–1881. https://doi.org/10.1126/science.1202775
Golder, S., & Macy, M. (2014). Digital footprints: Opportunities and challenges for online social
research. Annual Review of Sociology, 40, 129–152. https://doi.org/10.1146/annurev-soc-
071913-043145
Gould, R. (1996). Patron-client ties, state centralization, and the whiskey rebellion. American
Journal of Sociology, 102, 400–429. https://doi.org/10.1086/230951
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78, 1360–1380.
Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology,
83, 1420–1443.
Grindereng, M. (1967). Fashion diffusion. Journal of Home Economics, 59, 171–174.
Heath, C., Bell, C., & Sternberg, E. (2001). Emotional selection in memes: The case of urban
legends. Journal of Personality and Social Psychology, 81, 1028–1041. https://doi.org/10.1037/
0022-3514.81.6.1028
Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational
abilities. Proceedings of the National Academy of Sciences of the United States of America, 79, 2554–2558. https://
doi.org/10.1073/pnas.79.8.2554
Klandermans, B. (1988). Union action and the free-rider dilemma. Research in Social Movements,
Conflict and Change, 10, 77–92.
Macy, M. (1989). Walking out of social traps: A stochastic learning model for the Prisoner’s
dilemma. Rationality and Society, 1, 197–219.
Macy, M., & Flache, A. (1995). Beyond rationality in models of choice. Annual Review of
Sociology, 21, 73–91.
Macy, M., & Sato, Y. (2002). Trust, cooperation, and market formation in the U.S. and Japan.
Proceedings of the National Academy of Sciences, 99, 7214–7220.
Macy, M., & Skvoretz, J. (1998). The evolution of trust and cooperation between strangers: A
computational model. American Sociological Review, 63, 638–660.
Macy, M., & Willer, R. (2002). From factors to actors: Computational sociology and agent-based
modeling. Annual Review of Sociology, 28, 143–166. https://doi.org/10.1146/annurev.soc.28.
110601.141117
Macy, M., Deri, S., Ruch, A., & Tong, N. (2019). Opinion cascades and the unpredictability of
partisan polarization. Science Advances, 5, eaax0754. https://doi.org/10.1126/sciadv.aax0754
Macy, M., Ma, M., Tabin, D., Gao, J., & Szymanski, B. (2021). Polarization and tipping points.
Proceedings of the National Academy of Sciences, 118, e2102144118. https://doi.org/10.1073/
pnas.2102144118
Markus, M. (1987). Toward a ‘critical mass’ theory of interactive media: Universal access,
interdependence and diffusion. Communication Research, 14, 491–511. https://doi.org/10.
1177/009365087014005003
Marwell, G., & Oliver, P. (1993). The critical mass in collective action. Cambridge University
Press. https://doi.org/10.1017/CBO9780511663765
McAdam, D., & Paulsen, R. (1993). Specifying the relationship between social ties and activism.
American Journal of Sociology, 99, 640–667. https://doi.org/10.1086/230319
McPhail, C. (1991). The myth of the madding crowd. Aldine.
McPherson, M. (2004). A Blau space primer: Prolegomenon to an ecology of affiliation. Industrial
and Corporate Change, 13, 263–280.
Opp, K., & Gern, C. (1993). Dissident groups, personal networks, and spontaneous cooperation:
The East German Revolution of 1989. American Sociological Review, 58, 659–680. https://doi.
org/10.2307/2096280
Park, P., Blumenstock, J., & Macy, M. (2018). The strength of long-range ties in population-scale
social networks. Science, 362, 1410–1413.
Park, M., Thom, J., Mennicken, S., Cramer, H., & Macy, M. (2019). Global music streaming data
reveal diurnal and seasonal patterns of affective preference. Nature Human Behaviour, 3,
230–236. https://doi.org/10.1038/s41562-018-0508-z
Rapoport, A., & Chammah, A. (1965). Prisoner's dilemma: A study in conflict and cooperation.
The University of Michigan Press.
Salganik, M., Dodds, P., & Watts, D. (2006). Experimental study of inequality and unpredictability
in an artificial cultural market. Science, 311, 854–856.
Schelling, T. (1971). Dynamic models of segregation. The Journal of Mathematical Sociology, 1,
143–186. https://doi.org/10.1080/0022250x
Shi, F., Shi, Y., Dokshin, F., Evans, J., & Macy, M. (2017). Millions of online book co-purchases
reveal partisan differences in the consumption of science. Nature Human Behaviour, 1, 79.
https://doi.org/10.1038/s41562-017-0079
Shubik, M. (1970). Game theory, behavior, and the paradox of the prisoner’s dilemma: Three
solutions. Journal of Conflict Resolution, 14, 181–193.
Smelser, N. (1963). Theory of collective behavior. Free Press.
Watts, D. (2002). A simple model of global cascades on random networks. Proceedings of the
National Academy of Sciences, 99, 5766–5771.
Watts, D. (2011). Everything is obvious: Once you know the answer. Crown Business.
Watts, D., & Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393,
440–442.
Willer, R., Kuwabara, K., & Macy, M. (2009). The false enforcement of unpopular norms.
American Journal of Sociology, 115, 451–490. https://doi.org/10.1086/599250
Chapter 5
Model of Meaning

Hiroki Takikawa and Atsushi Ueshima

5.1 Introduction

Meaning is a fundamental component of the social world (Luhmann, 1995; Schutz &
Luckmann, 1973; Weber, 1946). People inhabiting the social world interpret the
meaning of natural objects in their environment and social objects, including others,
and act based on these interpretations. If we call the mechanism by which people
interpret the meanings of objects and other people’s actions and link them to their
own actions the meaning-making mechanism (Lamont, 2000), then social science,
which aims to explain the behavior of people and groups, must, as its fundamental
task, elucidate this meaning-making mechanism.
In sociology, since Weber (1946) focused on subjective meaning in his definition
of the discipline, considerations related to meaning-making mechanisms—consid-
erations about the relationship between meaning and human action—have accumu-
lated. In terms of methods for elucidating meaning-making mechanisms, meaning
and culture have traditionally been considered qualitative in nature, as in the German
Geisteswissenschaften (human sciences) tradition (Dilthey, 1910), which is closely related
to the establishment of Weber's sociology. Therefore, although there are exceptions,
approaches to meaning have primarily been attempted through qualitative social
theory and qualitative research.
By contrast, rational choice theory, the most influential formal theory of
action in sociology, initially viewed meaning-making mechanisms in an
extremely simplistic manner, through a narrowly defined principle of self-interest
maximization. Specifically, early rational choice theories positioned the
meaning-making of surrounding objects and the actions of others solely in terms of
self-interest maximization (Becker, 1976; Coleman, 1990). However, it has become
clear that theories based on such narrow assumptions have significant limitations in
capturing the richer meaning-making mechanisms of people targeted by sociology.
Because of these recognized limitations, rational choice theories in sociology have
been developed by incorporating various “explanatory factors,” such as nonmaterial
benefits and altruistic considerations, into actors’ utilities and purposes (Opp, 1999).
However, such sociological rational choice theories tend to be at risk of
degenerating into mere storytelling, producing models that lack predictive power
(Watts, 2014). The main reason for this is overfitting. Overfitting is a machine-
learning term for fitting the data at hand so closely that the model loses explanatory
power in more general cases (see Chap. 3). In general, the more a model is loosened
(the more complex it becomes) to allow for rich a posteriori "explanations" that
make a given action understandable, the more prone it is to overfitting. If early
rational choice theories underfit, recent sociological rational choice theories risk
becoming overfitted models that lack predictive ability.
The problem here is clear. It is the absence of a formal meaning-making model
that is superior in terms of predictive capacity based on empirical evidence. There
are several reasons for this absence. In particular, data on the semantic and cultural
aspects of human action have traditionally been qualitative, making formal exami-
nation difficult. However, with the recent development of digital society, a break-
through has been achieved in this regard. The large volume of digital texts contained
in digital traces and archives offers unprecedented opportunities for quantitative
access to the semantic world of actors (Evans & Aceves, 2016; Grimmer & Stewart,
2013). In addition, natural language processing and artificial intelligence models that
analyze texts not only serve as powerful analytical tools but can also provide clues
for building theoretical models of meaning-making mechanisms (Blei, 2012;
Mikolov et al., 2013). Of course, the target of sociology is the meaning of social
action in general, not limited to the meaning of texts or natural language. However,
since most social actions are mediated by language, it would not be incorrect to
assume that there is at least some common meaning-making mechanism between the
interpretation of social actions and of textual meaning. Therefore, it seems promising
to construct a theoretical model of meaning-making based on findings in the field of
natural language processing, where data and models are the most developed. Such a
movement has emerged in recent years, particularly in cultural sociology (Arseniev-
Koehler & Foster, 2022; Boutyline & Soter, 2021; Foster, 2018; Kozlowski et al.,
2019). In addition, throughout the chapter, we emphasize the link between interpre-
tation and action, which is not necessarily the focus of the current natural language
processing literature.
Existing computational social science has shown little motivation to contribute to
sociological theory by providing theoretical insights and empirical examinations of
meaning-making mechanisms through the quantitative analysis of texts. Therefore,
this chapter examines whether the sociological elucidation of meaning-making
mechanisms using textual data and artificial intelligence models is possible and, if
so, in what direction exploration should proceed.
The outline of this chapter is as follows: In the next section, we review existing
theories of meaning in sociology. We summarize the key points of existing theories
and argue that with the introduction of large-scale textual data and cognitive science,
the conditions are now in place to enable the refinement of sociological theories of
meaning through computational social science. We then formulate meaning-making
as a computational problem and characterize it with three key points: the predictive
function of meaning, the relationality of meaning, and Bayesian learning. We then
discuss the utility of two computational linguistic models—the topic model and the
word-embedding model—in terms of theories of meaning. For these models of
meaning to be valid as sociological models of meaning production, it is necessary
to demonstrate that meaning interpretation motivates actual action. In the preceding
section, we discussed ways to link semantic representations and actions. The final
section concludes the chapter.

5.2 Theories of Meaning in Sociology

In this section, we briefly review sociological theory of meaning.

5.2.1 Weber’s Social Action

According to Weber’s (1946) definition, sociology is “a science which attempts the


interpretive understanding of social action to arrive at a causal explanation of its
course and effects.” In this definition, action refers to human behavior based on
subjective meaning. In this way, Weber established as the goal of sociology the causal
explanation of action through interpreting the meaning attached to it. (The work of
Simmel (1922), who, prior to Weber, placed the problem of understanding meaning
at the center of sociology, should not be ignored.)
The great significance of Weber’s definition is that he clearly points out the
examination of meaning-making mechanisms as a central issue in sociology. Fur-
thermore, also important from today’s perspective is Weber’s recognition of the
significance of the distinction between an understanding of action and the verifica-
tion of the validity of that understanding through causal explanation. No matter how
persuasive a meaning attribution may be, it will not work as a sociological explana-
tion unless it can causally explain the actor's action. From today's perspective, this
point is crucial for avoiding "overfitting." It also clarifies that in sociology the
interpretation of meaning is undertaken only for the purpose of causally explaining actions.
In addition, Weber proposed the ideal type as a device for such explanation. Ideal
types should be constructed to be, in Weber's words, "meaning-adequate" and should be
conducive to causal explanation, which is in line with the argument in this chapter
that we should attempt to explain actions and collective consequences by formaliz-
ing actors’ meaning-making mechanisms.

5.2.2 Schutz’s Phenomenological Sociology

Starting from the critique of Weberian interpretative sociology, Schutz made several
important insights into the meaning structure of the social world, which greatly
influenced today’s schools of social constructionism (Berger & Luckmann, 1967)
and ethnomethodology (Garfinkel, 1967). Schutz’s (Schutz & Luckmann, 1973)
analysis focused on the structure of the everyday life-world, which is taken for
granted by people in terms of their natural attitudes. His theory of typification is
particularly important. According to Schutz, actors are both constrained by the world
and act upon and intervene in it, and pragmatic motives underlie their meaning-
making. People usually carry out their actions without delay based on typified
knowledge about things and people. In this way, the knowledge of the everyday
world is self-evident. However, when actions based on existing knowledge are
confronted with problematic situations, the knowledge and typifications previously
considered self-evident are questioned and reinterpreted. Under such dynamism, the
meaning of the social world is constituted.
Schutz’s typification theory is important in that it explicitly argues that our social
world is semantically constituted through knowledge and typologies, and that they
are not only given but are also socially constituted in that they are constantly being
questioned and revised. Schutz’s argument also excels in the coherence of its logical
organization. Although he himself does not propose a formal model, he recognizes
the importance of using models in the social sciences, and his arguments have a
logical structure that is relatively comfortable with formal models. As such, his
typification theory has a remarkable commonality in logical structure with the
formally formulated Bayesian theory of categorization (Anderson, 1991), which
we discuss later.

5.2.3 Bourdieu’s Cognitive Sociology

Bourdieu followed the theoretical tradition of Durkheim (1915), who sought to
clarify the social origins of people's perceptions, classifications, and categories
(Bourdieu, 1984). Bourdieu is known for his theories of social class, cultural capital,
and habitus, but recently, his cognitive sociological aspects have attracted attention
(Lizardo, 2004, 2019). He believes that there is a fundamental correspondence
between the objective structure of the social world, such as social class, and the
psychological structure of the actors who perceive and categorize the social world,
which is produced in the course of struggles over their social positions
(Bourdieu, 1989).
The greatest legacy of Bourdieu’s argument is twofold: First, Bourdieu not only
proposed a conceptual apparatus for treating cognition sociologically but also
articulated a methodology for analysis using multiple correspondence analysis,
which allows for the spatial arrangement of different variables and the examination
of their positional relationships. Second, he argues that this method enables us to
spatially represent the social world as a social space, as well as the world of symbols
and meanings as a semantic space, and to discuss the correspondence between
the two.
Despite its limitations, Bourdieu’s method has had a direct impact on today’s
quantitative cultural sociology in that it paved the way for a quantitative approach to
meaning and, more specifically, suggested the possibility of meaning being
expressed through spatial structures.

5.2.4 White and DiMaggio’s Sociology of Culture

Thus far, we have introduced Weber, Schutz, and Bourdieu’s classical theories of
meaning. While these can serve as important inspirations for developing models of
meaning, with the exception of Bourdieu, these sociologists themselves do not
directly propose formal models. In contrast, White and DiMaggio, whose work is
discussed next, developed ideas that directly lead to formal models of meaning and
quantification of meaning. Their arguments can be credited by providing a basis for
today’s computational sociology of culture.
Early in his career, H. White (1963) paved the way for a quantitative treatment of
culture by extending the formal model of kinship structures proposed by Lévi-Strauss
and Weil. His next step was to formalize the key sociological concept of roles by
means of an algebraic theory of networks, in which White invented the concept of
"catnet" and drew attention to the fact that the semantic category "cat," which refers
to people or groups, is inseparably linked to the way a "net" connects these people
(White, 2008b; White et al., 1976). Later, White attempted to theorize people's
meaning-making and network-formation mechanisms based on the Bayesian idea
that identities seek control in an uncertain world, but he never established an explicit
formal model (White, 1992, 1995, 2008a). Nevertheless, he has
had a profound impact on many of the central researchers in cultural sociology
today, such as Mohr, Bearman, and DiMaggio, who will be discussed shortly.
Another important figure of the quantitative approach in cultural sociology today
is DiMaggio, who published a programmatic review article in 1997 on the integra-
tion of cultural sociology and cognitive science (DiMaggio, 1997). He criticized
traditional sociological theories for failing to deal explicitly with cognition while
implicitly making some assumptions about cognition and called for the active
introduction of theories from cognitive science and cognitive psychology to explore the
mechanisms of meaning-making. He also laid the groundwork for the construction
of quantitative semantic models in sociology, pointing to the kinship between
sociological models of meaning and models of meaning in computational linguistics,
leading to the early introduction of topic models in sociology (DiMaggio et al.,
2013).

5.2.5 In Summary: Semantic Models in Cultural Sociology Today

In the classic discussions in sociology by Weber, Schutz, and others, the approach to
the problem of meaning was almost entirely qualitative. However, thanks to the
efforts of White, DiMaggio, and others, today’s cultural sociology is beginning to
quantitatively approach meaning. Of particular importance are:
1. The expansion of data sources, including large-scale textual data, has led to the
development of computational social scientific methods.
2. In conjunction with (1), ideas from cognitive psychology and cognitive science
have permeated sociology.
These two opportunities have led to the establishment of a healthy practice in
cultural sociology by modifying theories based on empirical data while using formal
theoretical models as a foundation (Mohr, 1998; Mohr et al., 2020).
In this chapter, we will continue to follow this trend and examine the possibilities
of a sociological model of meaning-making. For this purpose, we will examine
models that have been developed mainly in computational linguistics and discuss the
possibility of connecting them to a sociological model of meaning. In the next
section, we offer some preliminary considerations in preparation for the examination
of models in computational linguistics.

5.3 Preparatory Considerations on the Semantic Model

A key to connecting the sociological model of meaning-making with the model of
computational linguistics is to view meaning-making as a computational problem
(cf. Foster, 2018). In other words, we apply the idea of computation to the theory of
meaning itself, not merely as a technique for analyzing the given data. If meaning-
making is viewed as a computational problem, it can be formulated as the activity of
extracting meaningful information from an uncertain and noisy environment (Foster,
2018). Thus, it is concerned with the problem of inferring the potential “meaning”
from noisy data. From the data given to our senses (e.g., ink stains, objects with color
and shape, the physical actions of others), meaning-making as a computational
problem is to extract the “meaning” behind them.
There are multiple reasons that meaning-making should be framed as a computational
problem. First, there is the prospect that many of the valuable insights into
different theorizations of meaning that sociology has attempted thus far can be
modeled in a unified way under the idea of computation. Second, using the idea of
computation, we can directly draw on various computational linguistics and artificial
intelligence models that have already been developed to theorize semantic phenom-
ena. This eliminates the need to start from scratch to build a formal meaning model.
Third, if we can construct a model of meaning from a computational perspective, we
can also obtain an organic link between a method for analyzing semantic phenomena
in sociology and a model for theorizing semantic phenomena. Finally, the traditional
findings of sociological theory, especially qualitative research, can be communicated
to people in other fields such as computer science by reconstructing them from a
computational perspective. Conversely, such reconstruction can persuade sociologists
of the significance of findings from computer science.
Marr (1982) distinguished three levels of explanation for computation: (1) the
level of computational theory, (2) the level of representation/algorithms, and (3) the
level of implementation/hardware. We now consider the first level. The second level,
representation/algorithms, corresponds to computational models, such as the topic
and word-embedding models introduced later (cf. Arseniev-Koehler & Foster, 2022).
The third level, hardware, is a domain of neuroscience and is not covered here.
The level of the theory of computation is the level that asks what is being done
and why. The nature of computation is intrinsically dependent on the problem that it
is solving. Thus, at this level, the question of what problem is being solved and why
is decisive. What kind of activity are we engaged in when trying to interpret the
meaning of words, things, and actions?
Formulated as a computational problem, meaning-making involves
extracting information from noisy observed data. The challenge is that there is no
unique way to extract information from the observed data (Hohwy, 2013). For
example, the observed physical action of another person raising their hand does not
correspond one-to-one with a single potential meaning (Hohwy, 2013). They might be
saying hello, hailing a
cab, or stretching. There are multiple possible meanings, and any interpretation can
be erroneous (cf. Weber, 1946). A theory of meaning sees this meaning-making as a
computational problem and provides an explanation of how an actor selects one
potential meaning from multiple possibilities in an uncertain environment based on
observed data, how they correct this interpretation if it is erroneous, etc.
Why do we attempt to infer the meaning of things and events? We would like to
emphasize the viewpoint that we do this to predict things and events that will occur
and to choose appropriate actions. This perspective was emphasized by Weber
(1946) and Schutz (Schutz & Luckmann, 1973), among others, and is also presented
as a prevailing idea in cognitive science today (Anderson, 1991; Clark, 2015;
Hohwy, 2013).
By viewing meaning-making as the extraction of information from an uncertain
environment in order to predict the future and choose appropriate actions, we can
more clearly formulate not only the function of meaning but also the forms of
meaning (relations) and the learning process of meaning. In the following, we
explain (a) the function of meaning in terms of prediction, and then discuss (b) the
forms of meaning (relationality) and (c) the learning process of meaning (semantic learning).

5.3.1 Function of Meaning: Prediction

Our desire to know the meaning of things is rooted in pragmatic motivation (Schutz &
Luckmann, 1973). In other words, we must know the meaning of things and events
in the world if we are to operate smoothly in this world. Knowing the meaning of a
thing or event is connected to predicting the things and events related to that thing or
event, which ultimately leads to actions based on predictions. For example, if we do
not know the meaning of a traffic sign, we may cause a serious traffic accident.
Suppose that we do not understand the meaning of a one-way traffic sign and enter a
street from the opposite direction. In that case, if an oncoming car is traveling
without slowing down, we might cause a serious accident. Understanding the
meaning of a sign implies being able to predict the occurrence of events associated
with that sign. When there is a one-way sign, we can predict that the car may drive in
the direction indicated by the arrow. With such predictions, we can avoid serious
accidents, drive smoothly, and arrive at our destination. Understanding the meaning
of things makes the world predictable and enables us to live smoothly.
What Schutz (Schutz & Luckmann, 1973) calls typification is the central activity
of meaning-making, which also aims at prediction. That is, by understanding objects
and objects in the world as specific categories, we can predict various aspects of the
attributes of these objects.
Typification is also called categorization in cognitive psychology (Anderson,
1991). According to Anderson (1991), categories are bundles of attributes. Thus, for
a thing with attribute i, if we estimate its category membership K from that attribute,
we can then predict another attribute, j, that members of category K possess. In
Schutz's mushroom example, inferring from a mushroom's appearance (attribute i)
that it is edible (attribute j) illustrates the connection between categorization and
prediction (Schutz & Luckmann, 1973).
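In code, Anderson's scheme amounts to two applications of elementary probability. The numbers below are invented for illustration.

# Prediction via categorization: Pr(j | i) = sum over K of Pr(j | K) * Pr(K | i).
prior = {"edible": 0.7, "poisonous": 0.3}           # Pr(K)
attr = {                                            # Pr(attribute | K)
    "edible":    {"red_cap": 0.2, "safe": 0.95},
    "poisonous": {"red_cap": 0.6, "safe": 0.05},
}

def posterior(i):                                   # Pr(K | i), by Bayes' rule
    joint = {K: prior[K] * attr[K][i] for K in prior}
    total = sum(joint.values())
    return {K: p / total for K, p in joint.items()}

def predict(j, i):                                  # Pr(j | i)
    return sum(pK * attr[K][j] for K, pK in posterior(i).items())

print(posterior("red_cap"))                 # inferred category membership
print(predict("safe", "red_cap"))           # predicted attribute: about 0.44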

5.3.2 Relationality of Meaning

From the idea of meaning as prediction, it is possible to draw another important
implication about meaning: meaning takes a relational form. In sociology, this
corresponds to what Weber (1946) and Schutz (Schutz & Luckmann, 1973) call
Sinnzusammenhang.
Meaning tied to prediction also implies that meaning relates a thing (observed
data) to another thing (other observed data) through prediction. From a computa-
tional perspective, understanding meaning involves inferring its potential meaning
from certain observed data, and the function of inferring meaning involves contrib-
uting to prediction. For example, if we observe a “rain cloud” and understand it to
mean “it will rain,” we can predict the event “the laundry will get wet.” From another
angle, the observed data of “rain clouds” can be seen as being associated with the
observed data of “laundry getting wet.” The meaning of “it rains” then “manifests”
as a complex of relationships among “rain clouds appear,” “laundry gets wet,”
“humidity rises,” and so on. Thus, meaning appears in the relationships between
multiple things. Using Anderson’s notation, the potential meaning K appears as a
complex of relationships among data i, j, . . . Looking ahead to the next subsection
on learning, this means that we learn the meaning K of observed datum i by
relating it to other data j, l, . . . Let us discuss this next.

5.3.3 Semantic Learning

We stated that we estimate the potential meaning of things and objects to predict
what will happen next. We also mentioned that through such predictions, we can
connect the relationships among the attributes i, j, . . . of things and objects. This
prediction can be either correct or incorrect. If we see the color r on a mushroom and
think it is an edible mushroom K and eat it, the mushroom may not be edible j, but
may be poisonous l and give us a stomachache (Schutz & Luckmann, 1973). In this
case, we would conclude that the prediction of j based on the (mis)inferred category
K was incorrect, and we would reestimate the category membership inferred from
the mushroom’s color r as poisonous mushroom M rather than edible mushroom K.
Additionally, r would thereafter be remembered as being associated with poisonous
l, not edible j. Thus, prior estimates are modified and revised a posteriori when the
predictions are not accurate.
Here, we simply assumed that if the prediction matched the estimation result, the
estimation result would be retained, and if it missed, it would be replaced by another
estimation. In other words, the category “edible mushroom K” was estimated from
the color r of the mushrooms, and if the prediction derived from it (“mushrooms are
edible j”) was correct, the estimate K was retained, and if it was wrong (“mushrooms
are poisonous l”), K was discarded and M was assumed. Here, the learning process is
the choice between retaining or discarding the estimated result. In reality, however,
this learning process is stochastic because learning is performed to extract certain
information that is meaningful out of an uncertain and noisy observed environment.
In other words, the more accurate the prediction, the higher the probability that the
prior guess is correct and the lower the probability that it is wrong. This idea can be
formulated using Bayes' rule.
Bayesian learning, which will be introduced again in the section on models of
computational linguistics, proceeds as follows: let Pr(K) be the prior belief about
K (e.g., "a mushroom belongs to category K"), and let Pr(j | K) be the probability of
an event j occurring when K occurs ("a mushroom belonging to K has attribute j"). If
the event j that occurs is exactly what K predicts (if Pr(j | K) is large), the posterior
belief Pr(K | j) after observing j is strengthened, and if it differs from the prediction
(if Pr(j | K) is small), the posterior belief is weakened. Thus, the meaning-making
process is characterized by Bayesian learning through trial and error.
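A minimal sketch of one such trial-and-error update, with invented numbers, runs as follows.

# Prior belief, given color r: probably edible category K, possibly poisonous M.
belief = {"K": 0.8, "M": 0.2}                            # Pr(K), Pr(M)
likelihood = {"K": {"edible": 0.9, "poisonous": 0.1},    # Pr(outcome | category)
              "M": {"edible": 0.1, "poisonous": 0.9}}

observation = "poisonous"                     # the prediction "edible" fails
joint = {h: belief[h] * likelihood[h][observation] for h in belief}
total = sum(joint.values())                   # Pr(observation)
belief = {h: p / total for h, p in joint.items()}   # posterior by Bayes' rule
print(belief)                                 # K drops to ~0.31, M rises to ~0.69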
By formulating meaning-making as a computational problem, we have discussed
the following: (a) the function of meaning is linked to prediction, (b) meaning
appears as a relational form, and (c) meaning is Bayesian, learned by trial and
error. With these considerations in mind, we will now examine how various com-
putational linguistics models can be used in theories of meaning.

5.4 Computational Linguistics Model

Computational linguistics interprets the understanding and creation of meaning in
natural language as a computational problem. A computational problem involves
extracting meaningful information from an uncertain environment and predicting the
behavior of the environment. In the case of natural language, the environment is
linguistic (Griffiths et al., 2007). From a computational perspective, a linguistic
environment can be formulated as an environment with statistical properties in
which individual words and phrases occur probabilistically. Based on this idea,
being able to understand the meaning of a text implies being able to predict the
words, phrases, etc., used in that text. For example, when reading a text or listening to speech,
we check our understanding by predicting what the author or speaker will write or
say. Furthermore, when an unexpected word or phrase appears, we realize that we
have misunderstood the meaning of the text or utterance.
To formalize the discussion so far, textual semantic understanding is the process
of inferring the latent structure that produces the observed features that appear in a
text, thereby predicting the various features that will subsequently be produced. This
latent structure is named "gist" by Griffiths et al. (2007). Accordingly, a model of the
latent structure that generates meaning is referred to as a generative model.
The idea of the generative model and the aforementioned computational formu-
lation of categories proposed by Anderson are based on almost identical ideas. A
category defines a bundle of features of an object; a gist characterizes a bundle of
features of a text, such as observable words and coded phrases. Like categories, a gist
is not observable, but rather a latent structure g that generates an observable quantity.
Identifying this latent structure g is equivalent to the semantic understanding of the
text, which makes the occurrence of observable words and phrases predictable.
The relationality of meaning has been formulated in computational linguistics as
a distributional hypothesis of meaning. The distributional hypothesis was formulated
by the linguist Firth (1957), who drew on the anthropologist Malinowski (at around
the same time, the linguist Harris (1954) published a similar idea). Thus, it is fair to
say that this hypothesis originally derived from the social sciences. According to this
hypothesis, the meaning of a word can be inferred from the words around it (“You
shall know a word by the company it keeps!”). This hypothesis can be embodied by
the idea of meaning as a latent structure.
Now, let w be a set of words delimited by a certain range (e.g., a window), and let g be the latent
structure that generates the individual words w1, w2. . .that belong to w (cf. Griffiths
et al., 2007). Typically, w is a single sentence, and the individual words that make up
that sentence can be thought of as w1, w2. . . . In this case, as the distributional
hypothesis states, the meaning of a word is determined by its surrounding words. For
example, to know the meaning of word w1, we must estimate the latent structure g.
To estimate g, we must take w2, w3. . ., which are located around w1, as clues. Thus,
from the observed words alone, the meaning of word w1 can be determined by words
w2, w3. . . .
As such, the generative model can also be used for learning meaning. The
meaning of a word is learned from the meanings of the surrounding words. The
occurrence of the next word is predicted from the distribution of surrounding words
(via the estimation of the latent structure), and the accuracy of the prediction is
increased by modifying the prior guesses a posteriori according to the success or
failure of the prediction.
As previously mentioned, such a learning process can be formulated within the
framework of Bayesian learning. What we want to know is the probability Pr(g | w1) that the latent structure g produced an observed word w1. Applying Bayes' rule, we obtain:

$$\Pr(g \mid w_1) = \frac{\Pr(w_1 \mid g)\,\Pr(g)}{\Pr(w_1)}$$

The right-hand side consists of three quantities, each of which has substantial
significance. These three quantities must be available to compute the left-hand
side. First, we need a subjective belief (prior belief), Pr(g), before observing w1.
Additionally, we must know the probability Pr(w1 | g) that g generates w1. This is
called likelihood in Bayesian learning. Finally, we need Pr(w1). This is the proba-
bility that w1 will occur, which is called the evidence.
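When the set of candidate latent structures is small, this posterior can be computed by simple enumeration. The following sketch (in Python; the candidate gists, priors, and likelihoods are toy values invented purely for illustration) applies the formula above directly:

# A minimal sketch of Bayesian inference over latent structures ("gists").
# All probabilities are invented for illustration.
priors = {"sports": 0.5, "fashion": 0.5}             # prior belief Pr(g)
likelihoods = {                                       # likelihood Pr(w1 | g)
    "sports": {"tie": 0.02, "game": 0.10},
    "fashion": {"tie": 0.05, "suit": 0.08},
}

def posterior(word):
    # Evidence: Pr(w1) = sum over g of Pr(w1 | g) Pr(g)
    evidence = sum(likelihoods[g].get(word, 0.0) * priors[g] for g in priors)
    # Bayes' rule: Pr(g | w1) = Pr(w1 | g) Pr(g) / Pr(w1)
    return {g: likelihoods[g].get(word, 0.0) * priors[g] / evidence
            for g in priors}

print(posterior("tie"))  # approximately {'sports': 0.286, 'fashion': 0.714}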

5.4.1 Topic Model

As mentioned earlier, text generation can be viewed as a stochastic process from a computational perspective. There are many possible models for text generation;
however, the topic model is the most widespread in applications in computational
social science and sociology. Topic models have been widely applied in various
areas of sociology, including cultural sociology, social movements, historical soci-
ology, and the history of sociology (DiMaggio et al., 2013; Fligstein et al., 2017;
Fig. 5.1 The structure of a topic model. Note: Topics z1, z2, . . ., zm, each a probability distribution over words, generate the concrete words w1, w2, . . . . A topic is assigned to each word slot according to a multinomial distribution (the topic proportion) unique to each document

Nelson, 2020; Takikawa, 2019). Topic models are analytical tools that allow
researchers to discover latent topics in coherent texts and examine the words that
represent these latent topics. However, for the results of a topic model to have
external validity, it must be possible to say that real-world actors are actually
extracting similar topics from the text, at least implicitly, and producing meaning.
Therefore, in this chapter, we focus on the persuasiveness of the topic model as a
semantic model from a computational perspective.
The topic model itself has many variations, but the most basic model is latent Dirichlet allocation (LDA; Blei et al., 2003). Almost all topic models, including LDA, are hierarchical Bayesian models. Hierarchical Bayesian models represent the generation of meaning structurally, which is one of the most prominent features of topic models.
In the hierarchical Bayesian model, text is assumed to be generated by two
different probability distributions: a multinomial distribution that assigns a topic to
the slot in which the word is generated, and a multinomial distribution that proba-
bilistically generates a specific word in that slot, conditional on an assigned topic.
The former is called topic proportion, and the latter is called topic. We assume that
the observed word wi is generated by the latent structure g, expressed by these two
distributions. Let us examine this in more detail (Fig. 5.1).
First, we assume that each word w1, w2, . . ., wn that appears in document
d belongs to some topics z1, z2, . . ., zm. We assume that the word we observe is
generated probabilistically according to the topic to which it belongs. For example,
suppose we assume that a word like “gene” belongs to the topic “genetics” and a
word like “brain” belongs to the topic “neuroscience.” In the topic model, we model
the topic “genetics” as a probability distribution (multinomial distribution) over
words that generates words like “gene” and “DNA” with high probability (Blei,
2012). That is, each word is determined by the conditional probability distribution Pr(w | zj).
How are the various topics assigned to the individual word slots in the first place?
This topic assignment follows a probability distribution called topic proportion,
which is unique to document d. To summarize, a probability distribution called
topic proportion is first assigned to each document. Next, a topic is assigned to a slot
according to the topic proportion. Finally, a word is generated according to the distribution of the assigned topic.
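The two-stage generative process just described can be written down directly. Below is a minimal sketch (in Python with NumPy; the vocabulary, topic distributions, and Dirichlet hyperparameter are toy values invented for illustration) that samples a topic proportion for a document, assigns a topic to each word slot, and then generates a word from the assigned topic:

import numpy as np

rng = np.random.default_rng(0)
vocab = ["gene", "dna", "brain", "neuron"]

# Each topic is a multinomial distribution over the vocabulary.
topics = np.array([
    [0.50, 0.40, 0.05, 0.05],   # a "genetics" topic
    [0.05, 0.05, 0.50, 0.40],   # a "neuroscience" topic
])
alpha = np.array([0.5, 0.5])    # Dirichlet hyperparameter

def generate_document(n_words):
    theta = rng.dirichlet(alpha)                 # topic proportion for this document
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=theta)     # assign a topic to the word slot
        w = rng.choice(len(vocab), p=topics[z])  # generate a word from that topic
        words.append(vocab[w])
    return words

print(generate_document(8))

Estimation runs this process in reverse: given only the observed words, it infers plausible values of the topic proportion and the topic assignments.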
Assuming such a generative process, how can we solve the inverse problem, that is, how can we estimate the topic of an observed word? If the latent topic is interpreted as the meaning of the word, then estimating the topic amounts to estimating the word's meaning. Because the topic model has a hierarchical structure, the process of estimating meaning by inverse computation is also somewhat complex. However, this structure is precisely what gives the topic model its flexible semantic representation.
First, based on the distributional hypothesis of meaning, we assign a probable
topic to a word based on what other words appear in document d. It is important to
note that the meaning (topic) of a word differs if the surrounding words are different, even when the word itself is the same. For example, if words such as "game" and "win"
appear around the word “tie,” it is highly likely that the word means a draw. These
words can probably be considered to have been generated by the topic “sports.”
Conversely, if words such as "fashion" and "mode" co-occur, the word "tie" likely means an article of clothing. Word polysemy can be modeled naturally in the topic
model.
Second, a single document can have multiple topics, which complicates topic estimation. If a single document is assigned a single topic (a model that assumes this
is called a single-membership model), the estimation is simple: choose the topic with
the highest likelihood of producing the set of words observed in the document
(single-membership topic models are sometimes used in applications to short texts
such as Twitter (Yin & Wang, 2014)). However, if we consider that a document is
composed of multiple topics, we need to consider mixed-membership models. For
example, if it is plausible to think that a scientific paper is composed of multiple topics, such as "genetics" and "neuroscience," then the generation of the text should be modeled with a mixed-membership model. In this case, it is necessary to assume a
multinomial distribution θd, called topic proportion, regarding the proportion of any
topic appearing in the document and estimate its parameters. Accordingly, we
consider that potentially different topics z1, z2, . . . are assigned to the words w1,
w2, . . . of the document in question. Thus, in actual estimation, for each observed word, the topic is estimated by jointly considering the topic that likely produced it and the plausible topic proportion for that document.
As previously mentioned, one of the advantages of topic models that rely on
complex hierarchical and structural representations is their ability to capture word
polysemy. Hierarchical models allow for the possibility of a word belonging to
multiple topics depending on its “context” (strictly speaking, it is better to say that
they allow for the possibility of a word having a high probability of occurring in
multiple topics, since topics are probability distributions). As we will see later, this is
a particularly important feature of the topic model in terms of its sociological
applications.
To estimate the parameters of the topic model, a Bayesian inference framework is
used. There are two main types of methods: sampling methods, such as Markov
chain Monte Carlo methods, and approximate methods, such as variational Bayesian
methods. For technical details, refer to Steyvers and Griffiths (2007) and Blei et al. (2003).
To what extent does the topic model capture human understanding and meaning-
making practice? Of course, it is unlikely that human language use literally follows the generative process assumed by the topic model. However, the inverse process
framed in terms of Bayesian inference can be regarded as modeling, to some extent,
our understanding of the meaning of texts and the process of meaning-making. We
infer the meaning of a word in light of its surroundings and thus predict the next
word that will appear in a conversation or in a written text. This can be modeled as a
process of inverse (Bayesian) estimation of the topic that produces the word. Also, in
the process of comprehension, when we read a complex text consisting of multiple
topics, such as a scientific paper or a novel, we can infer backward what topic the text
covers and predict the subsequent development of the text accordingly. This also
corresponds to the fact that the topic model models the process of considering the
topic proportion of an entire document when estimating the meaning of a word.
The topic model is also compelling from a cultural sociological perspective.
DiMaggio et al. (2013) provided a coherent discussion of this. They pointed out the closeness of the topic model to cultural sociology in three ways.
The first is the relationality of meaning (DiMaggio et al., 2013). Central to
cultural sociology is the idea that “meanings do not inhere in symbols (words,
icons, gestures) but that symbols derive their meaning from the other symbols
with which they appear and interact" (DiMaggio et al., 2013, pp. 586–587; Mohr,
1994; Mohr & Duquenne, 1997; Saussure, 1983). As mentioned earlier, the distri-
butional hypothesis of meaning in computational linguistics has its origins in this
cultural sociological idea and can therefore be regarded as the basis for the modeling
of this idea. Specifically, the topic model embodies the idea of relationality of
meaning in the way the topic of a word is determined. In this model, the topic of a
word is assigned by its co-occurrence with other words, which describes the process
by which meaning is relationally determined.
Cultural sociology also emphasizes that meaning is determined by “context.”
Thus, the same word may have different meanings in different contexts. This can be
called contextual polysemy, derived from the relationality of meaning (DiMaggio
et al., 2013). The topic model models this polysemy such that topic assignment
changes depending on differences in collocations, even for the same word. As
already mentioned, this modeling of polysemy is a strength of topic models that
take a structural representation.
DiMaggio et al. (2013) applied the topic model to a corpus of newspaper articles
on arts funding by the U.S. government. They tested whether the topic model adequately captured the relationality and polysemy of meaning. In terms of
polysemy, they examined whether the word “museum,” for example, actually
expresses different meanings depending on the different topics assigned by the
model. The results show that the word “museum,” assigned to different topics, can
indeed be interpreted as meaning different things.
The second point concerns heteroglossia (DiMaggio et al., 2013). This was originally Bakhtin's idea that a text consists of multiple voices. By voices, Bakhtin
refers to "characteristic modes of verbal expression (word choice, syntax, phrasing, and so on) associated with particular speech communities" (Bakhtin, 1982 [1934–1941], cited in DiMaggio et al., 2013). As we have seen, the topic model
recognizes the mixed membership of documents. This feature can be used to model
the manner in which a text is composed of various voices. In other words, each topic
is viewed as representing a different voice. DiMaggio et al. (2013) examined whether each topic represents a distinct voice by inspecting the language choices, emotional tone, and methods of argument justification for each topic.
Third, there is an important concept in cultural sociology: a frame. “A frame is a
set of discursive cues (words, images, narrative) that suggests a particular interpre-
tation of a person, event, organization, practice, condition, or situation” (DiMaggio
et al., 2013, p. 593). In cultural sociology, frames are linked to people’s cognitive
styles and prime specific schemas or activate specific association networks. If we
consider frames as a kind of latent structure that produces words, then the topic of the
topic model can be considered an operationalization of the concept of frames used in
cultural sociology (cf. Fligstein et al., 2017).
The above discussion shows that the topic model is not merely a methodology; it can also be connected to models of cultural sociology, human cognition, and the meaning-making mechanisms that should lie behind it.

5.4.2 Word-Embedding Model

Along with topic models, another model commonly used in sociological research is
the word-embedding model (Arseniev-Koehler & Foster, 2022; Jones et al., 2019;
Kozlowski et al., 2019). A word-embedding model provides an efficient represen-
tation of the meaning of a word using vectors, which is also referred to as the
distributed representation of words. The most important feature of this model is that
meanings can be arranged spatially by representing the meanings of words as vectors. In semantic space, the closer two meaning vectors are to each other, the greater the "similarity" between the meanings. In addition, in semantic space, it is possible to perform so-called analogy calculations. For
example, the following calculation is possible.

king - man + woman = queen

Underlying this calculation is the principle that a difference vector such as woman - man corresponds to a certain semantic dimension (in this case, the gender dimension), and that by moving king in that direction in space, meaning can be shifted along that semantic dimension (see Fig. 5.2).
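A minimal sketch of such an analogy calculation (in Python with NumPy; the two-dimensional toy vectors are invented for illustration, whereas real word2vec vectors typically have a few hundred dimensions):

import numpy as np

# Toy embeddings invented for illustration.
vectors = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Shift "king" along the woman - man direction...
target = vectors["king"] - vectors["man"] + vectors["woman"]
# ...and find the nearest word to the shifted vector, excluding the query words.
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vectors[w], target))
print(best)  # queen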
The structure of the semantic space in which this analogical calculation is
possible has great sociological applicability. For example, this feature can be used
to construct axes on the semantic space that represent specific social dimensions of
meaning, such as the gender dimension or the social class dimension. These
Fig. 5.2 An example of analogy calculation

constructed axes can be used to examine what gendered or class connotations a particular cultural or social phenomenon, such as a sport or occupation, is imbued
with (Kozlowski et al., 2019). Alternatively, we can look at the moral or cultural
connotations attached to obesity (Arseniev-Koehler & Foster, 2022).
Again, we examine the persuasiveness of word-embedding models not as mere analytical tools but in terms of modeling human meaning-making (cf. Arseniev-Koehler & Foster, 2022; Günther et al., 2019; Lake & Murphy, 2023). Word-embedding models can be constructed in several different ways, but we will focus on predictive, artificial neural network models. A typical artificial neural network model is word2vec (Mikolov et al., 2013). There are two different types of word2vec models: skip-gram models and CBOW (continuous bag-of-words) models; in this chapter, we introduce the CBOW model.
In artificial neural network models, the parameters of the model are learned by
repeatedly solving a task by trial and error. The CBOW model uses a neural network
with a two-layer structure: a hidden layer with N nodes and an output layer with
V nodes (see Fig. 5.3). The input V-dimensional one-hot vector (a vector with one component for each of the V words appearing in the corpus, where the component for the given word is 1 and all others are 0) is propagated to the hidden layer through a V × N weight matrix W and then to the output layer through an N × V weight matrix W′. The resulting V-dimensional vector is then transformed by the softmax function, and the word with the highest probability is selected as the output. Information about the success or failure of the prediction is propagated back through the model by backpropagation, and the parameters are adjusted.
How is the word vector, that is, the distributed representation, obtained from this model? It corresponds to a row vector with N elements of the weight matrix W that connects the input layer to the hidden layer. In general, the number of words
V appearing in a corpus is tens of thousands or more, whereas N is only a hundred to
a few hundred, so the dimension of the vector is highly reduced, which means that
the embedded vector efficiently stores semantic information.
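A minimal sketch of the CBOW forward pass (in Python with NumPy; the vocabulary size, hidden-layer size, and random weights are toy values invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
V, N = 10, 4                                   # toy vocabulary and hidden-layer sizes
W = rng.normal(scale=0.1, size=(V, N))         # input-to-hidden weight matrix
W_prime = rng.normal(scale=0.1, size=(N, V))   # hidden-to-output weight matrix

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cbow_forward(context_ids):
    # Averaging the one-hot inputs is equivalent to averaging the
    # corresponding rows of W; this average is the hidden layer.
    h = W[context_ids].mean(axis=0)
    # Propagate to the output layer and normalize with softmax.
    return softmax(h @ W_prime)

probs = cbow_forward([2, 5, 7])   # predict the target word from three context words
print(probs.argmax())

After training by backpropagation on prediction errors, row w of W is the embedding of word w, that is, the row vector with N elements described above.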
To what extent can the word-embedding model be used for meaning-making? Let
us consider its persuasiveness as a model in terms of the predictive function of
meaning, relationality of meaning, and semantic learning.
The neural network model of word embedding is also based on the predictive function of meaning. Specifically, the meaning of a word is learned by predicting the target word from the context words within a window. In other words, distributed
Fig. 5.3 Architecture of artificial neural network in CBOW model

representations of context words are learned in terms of the representations that best
predict the target. Thus, the meanings stored in the word-embedding vector are
organized under the function of predicting words.
Furthermore, the word-embedding model is based on the distributional hypoth-
esis of meanings, evident from the structure of the CBOW model, in which context
words are linked to target words through prediction. The idea that meaning is
acquired through the task of predicting co-occurring target words is consistent
with the idea of the distributional hypothesis that meaning is determined by sur-
rounding words. Thus, this model also captures the relationality of meanings.
Finally, in terms of learning, the word2vec CBOW model does not use a Bayesian inference framework. However, it models the learning process of acquiring the
semantic content of a word by trial and error through the success or failure of
predictions.

5.4.3 Topic Models and Word-Embedding Models

In terms of the predictive function of meaning, the relationality of meaning, and semantic learning, topic models and word-embedding models broadly share these characteristics. However, there are significant differences between topic models and word-
embedding models in terms of meaning-making models. The following two points
can be pointed out. First, the topic model is based on hierarchical and structural
representations, whereas the word2vec model is based on spatial representations. Second, the topic model is a generative model, whereas word2vec is not directly a
generative model.
The topic model is a structural representation model, which means that the model
is hierarchical, as we have seen earlier. In other words, the topic model is a two-stage
process in which topic proportions are determined, topics are assigned based on such
proportions, and words are generated probabilistically. Such a structural representa-
tion allows for flexible representation of the meaning of words. For example, topic
models can represent the polysemy of words. The same word may have different
meanings for different topics. This is not possible in a normal embedding model.
Another strength of structural representation is its ability to classify words as
concepts into qualitatively different groups, such as frames in sociology.
On the other hand, the spatial representations made possible by word-embedding models also have aspects that are well in line with the theoretical tradition of
sociology. One such tradition is Bourdieu’s model of social spaces. As mentioned
earlier, his theory represents both social and symbolic structures in a spatial model,
in which the structure of social space corresponds to the structure of people’s
cognition (Bourdieu, 1989). The advantage of spatial representation is that different
social and cultural meanings can be mapped onto the same space, allowing the study
of the location of these meanings. This seems particularly suited to thinking and
cognitive practices, through which we compare a set of concepts along a dimension
and examine their location along that dimension. Specifically, the assignment of
gender stereotypes to certain occupations, for example, or class images attached to
cultural practices such as going to the theater, sports, or museums, are often captured
by such spatial representations. Kozlowski et al. (2019) exemplify such an approach. In line with Bourdieu's theoretical position, such spatial representations
can be seen as reflecting the structure of our cognitive practices. Apart from
Bourdieu, the view that humans use spatial metaphors to construct meaning is well
established in conceptual metaphor theory in cognitive linguistics (Lakoff & Johnson, 1999).
However, the extent to which cognitive practices can be captured in unstructured
homogeneous spaces remains debatable.
The first problem is that naive spatial models cannot distinguish between the
various aspects of semantic “similarity.” For example, the similarity of meaning as
measured by word2vec cannot distinguish between the synonymy and antonymy of
words (Arseniev-Koehler & Foster, 2022). This limitation stems from the structural
inflexibility of the spatial arrangement and representation of word meanings (Chen
et al., 2017; Griffiths et al., 2007).
Second, the naive spatial model cannot capture the polysemy of the words.
Because the meaning of a word is represented by only one position in space, it
cannot capture the process by which the meaning of a word changes according to
context. This is a major difference from the structural representation of the topic
model, which is better suited for representing word polysemy.
Third, there is a major difference in that the topic model is a generative model,
whereas word2vec’s embedding model is not a generative model. In terms of
computational theory of meaning, a generative model that models the process of meaning generation is preferable. Therefore, the word2vec embedding model has
significant limitations.
However, several interesting attempts have been made to transform the embedding model into a generative model. One of these is Arora's model (Arora et al., 2016, 2018), which has been applied to sociology as the discourse atom topic model (DATM; Arseniev-Koehler et al., 2022).
Arora and colleagues proposed a generative model that produces embedding vectors to answer the question of why embedding vectors obtained by word2vec can be added together to form a certain semantic dimension, as in analogical computation (Arora et al., 2016, 2018). The generative model is straightforward. Consider a discourse vector ct in the space Rd. Its spatial coordinates represent the semantic content being spoken at that moment; ct is the "gist" behind the observed words. Each word also has a latent vector, vw. Now, ct performs a random walk on Rd, and the t-th word is produced at step t. Specifically, the word produced at time t is determined by the closeness between ct and vw. In other words, w is produced with the following probability:

$$\Pr[\text{the word } w \text{ is produced at time } t \mid c_t] \propto \exp(\langle c_t, v_w \rangle)$$

Such a generative model provides a basis for calculating the probability of occurrence of a word in the manner of the CBOW model. As we saw earlier, the CBOW model calculates the probability of occurrence of a word by the closeness of its vector to the average of the vectors of the k context words w1, . . ., wk. Such averaging naturally follows from the Bayesian inference process of estimating the current c from the words produced by the random walk of c over k steps. Interpreted as a model of semantic cognition, the agent estimates the potential meaning (discourse vector) of a word from the distribution of surrounding words, on the assumption that words appearing in the surroundings are likely to share a potential meaning with the target word.
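A minimal sketch of this random-walk generative process (in Python with NumPy; the dimensions, step size, and random vectors are toy values invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
d, V = 5, 50                                # toy dimensions
word_vectors = rng.normal(size=(V, d))      # latent word vectors v_w

def emit_word(c):
    # Pr[w | c] is proportional to exp(<c, v_w>).
    logits = word_vectors @ c
    p = np.exp(logits - logits.max())
    return rng.choice(V, p=p / p.sum())

c = rng.normal(size=d)                      # initial discourse vector
emitted = []
for _ in range(8):
    emitted.append(emit_word(c))
    c = c + 0.1 * rng.normal(size=d)        # slow random walk on R^d

# Because the walk is slow, the current discourse vector can be estimated
# (up to scaling) by averaging the vectors of the emitted context words.
c_hat = word_vectors[emitted].mean(axis=0)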
Arora's model has further advantages over word2vec's CBOW model: an observed word can be viewed as a compound construct consisting of multiple discourse vectors, or latent meanings, analogous to the topics in a topic model. This allows the model to address the problem of word polysemy, which is not possible with the conventional embedding model. It also opens up sociological applicability.
Arseniev-Koehler et al. (2022) proposed DATM based on this model and applied it
to sociological text analysis.
Generative modeling of embedding models can be seen as the integration of topic
and word-embedding models (cf. Arseniev-Koehler et al., 2022), which is a prom-
ising approach. Generative models provide a model of semantic cognition, that is,
how humans perceive meaning and generate words. On the other hand, semantic space models, although more limited than structured representations, have advantages over
topic models, such as their ability to extract semantic dimensions. Furthermore, the
weaknesses of the conventional embedding model, such as the handling of word
polysemy, can be overcome to some extent by generative modeling of the
embedding model. However, whether the semantic space model has reached the
same level of expressiveness as the structured representation model is an open
question.
Arora’s generative model assumes a very simple process of inferring the meaning
of a target word from surrounding words based on the assumption of a random walk
of the potential meanings of words. Further integration of the topic model and
embedding model should be pursued by building a generative model that closely
approximates the human meaning-making mechanism.

5.5 Language Model and Explanation of Actions

The models presented thus far have focused primarily on how people perceive the
meaning of objects and other people (texts about them). However, as Weber (1946)
points out, the interpretation of meaning in sociology is ultimately performed to
explain human action.
While some of our cultural and semantic constructions and their interpretations are directly related to actions and explain them, others are not. People's retrospective, ex-post "explanations" of their actions are at least not directly linked to those actions, but rather justify them after the fact (Swidler, 1986; Vaisey, 2009). In contrast, values and
mental images motivate people to act. From the point of view of analyzing meaning-
making mechanisms to explain social phenomena as an accumulation of actions, it is
the latter kind of motivational meaning-making that we want to extract.
Therefore, the question of whether the semantic structure and semantic space obtained by the topic model and embedding model are actually related to people's actions, and if so, how they are related, lies at the final stage of elucidating the meaning-making mechanism. In other words, it is necessary to examine the perfor-
mance of the meaning-making model by focusing on the extent to which the
meaning of an object identified by the meaning-making model can explain subse-
quent human behavior. If the meaning of the object identified by the model can
explain human behavior to some extent, it can be said that the model’s estimation of
“meaning” has some validity.
Here, we focus on a specific class of meaning-making models, the word2vec model, and examine to what extent the meanings that the word2vec model specifies can explain subsequent behaviors. In other words, we examine whether and to
what extent it is possible to predict people’s behavior using the word vectors
obtained from word2vec as explanatory variables.
Here, we refer to the method in which word vectors are used as explanatory
variables in regression analysis to predict human judgments (Bhatia, 2019; Hollis
et al., 2017; Richie et al., 2019). Consider that we would like to use words as
explanatory variables and predict people’s responses to the words (a specific exam-
ple follows later). In this case, we can use word2vec to represent word meanings
using 300-dimensional vectors. This means that each explanatory variable (i.e., a
word) has 300 semantic dimensions, as if we had asked 300 survey questions regarding the word and quantified the word's meaning with 300 numeric values.
Because each explanatory variable is now represented in 300 numeric values, a
regression model in this method has 300 corresponding coefficients that determine
the weights for each of the 300 semantic dimensions. Owing to the many explana-
tory variables, regularization methods such as ridge regression or model comparison
based on AIC are often used to prevent overfitting. This method shows that judg-
ments such as risk perception, gender roles, and health behaviors can be predicted
with high accuracy (Richie et al., 2019).
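A minimal sketch of this regression approach (in Python with scikit-learn; the word vectors and judgment scores below are randomly generated stand-ins, since the real inputs would come from a pre-trained embedding and a survey):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(130, 300))   # one 300-dimensional word vector per stimulus word
y = rng.normal(size=130)          # mean human judgment for each stimulus word

model = Ridge(alpha=1.0)          # L2 regularization guards against overfitting
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean())              # out-of-sample predictive accuracy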
In a more recent study, Ueshima and Takikawa (2021) used word vectors to predict people's vaccine allocation judgments regarding COVID-19. Participants rated how much priority vaccination should be given to each of more than 130 occupations. The authors used a pre-trained Japanese word2vec resource (Manabe et al.,
2019) to obtain a 300-dimensional word vector for each occupation. They reported
that regression analysis using word vectors as explanatory variables exhibited a high
out-of-sample predictive accuracy for participants’ vaccination priority judgments.
To demonstrate the effectiveness of this approach, they compared the word-vector
regression model with a benchmark regression model. The benchmark regression
contained relevant explanatory variables such as the social importance of each
occupation to quantify the occupations. The results of the model comparison showed
that the word-vector model predicted vaccination priority judgments better than the
benchmark model did. It is notable that the explanatory variables of the benchmark
model—social importance, personal importance, and familiarity with each
occupation—were obtained from each participant, while the word vector used in
this study was not measured for this specific study, demonstrating the usefulness of
the word-vector approach. Overall, the results of this study suggest that word vectors
can quantify the meanings that people have for each occupation.
In regression using word vectors, prediction is made by learning the regression
coefficients or weights for each dimension of the word vector from the data.
Intuitively, this can be interpreted as modeling how much each of the hundreds of
semantic dimensions is weighted when people judge specific domains such as
vaccination priority, gender role, or risk perception.
Using weights (regression coefficients) for each semantic dimension makes it
possible to interpret the criteria used to make judgments (Bhatia, 2019). In Ueshima
and Takikawa (2021), participants answered that they would prioritize vaccination
for occupations such as nurses. Accordingly, the 300-dimensional weights of a
regression model trained by the participants’ vaccination judgments had large dot
products with the word vector of “nurse.” This is because larger dot products
indicate a higher prioritization of vaccination in this regression model. Importantly,
it is possible to calculate the dot products between the regression weights and word
vectors other than occupations. By exploring words that have larger or smaller dot
products with the obtained 300-dimensional weights, it is possible to interpret the
criteria associated with increasing or decreasing people’s judgments of vaccination
priority. In the case of the vaccination priority judgments, words associated with medical institutions, such as "hospital," and with public service, such as "local government," produced larger dot products with the weights compared to other
common words, suggesting that meanings of these words were related to criteria for
rating vaccination priority higher. Such exploratory analyses of judgment criteria
help to understand the psychological mechanisms underlying judgments.
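Continuing the regression sketch above, this exploratory step can be expressed as computing dot products between the learned weight vector and the vectors of arbitrary words (the exploration words and their vectors below are hypothetical placeholders):

model.fit(X, y)
weights = model.coef_                      # one weight per semantic dimension

# Hypothetical pre-trained vectors for words outside the training set.
exploration = {"hospital": rng.normal(size=300),
               "local government": rng.normal(size=300)}
for word, vec in exploration.items():
    # A larger dot product indicates that the word's meaning is associated
    # with higher predicted judgments (here, higher vaccination priority).
    print(word, weights @ vec)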
Moreover, it was possible to predict vaccination priority ratings for occupations
that were not included in the study. For example, based on these exploratory
analyses, we can infer that an occupation such as a city hall worker would be rated
highly. Thus, interpreting the obtained weights can lead to rigorous confirmatory
research on new occupations. In summary, using word vectors as predictors or
explanatory variables in multiple regression analysis is a promising method for
predicting and interpreting human behavior.
The fact that the word vectors obtained from word2vec are helpful in predicting human behavior indicates that they capture not only the semantic relationships between words in the linguistic corpus but also, to some extent, human knowledge about the world (Caliskan et al., 2017; Günther et al., 2019). To further develop
models of meaning for predicting human behavior, future research should consider
that the meanings of words are constructed not only by linguistic information but
also by the perceptual and motor systems of humans (Bruni et al., 2014; Glenberg &
Robertson, 2000; Lake & Murphy, 2023). Using multimodal data is a promising
direction for developing better models of meaning to predict human behavior.
Another important direction for future research is to model the heterogeneity of
meanings and behaviors among people. At present, corpora used to obtain word
vectors often consist of linguistic resources generated by many people rather than by a particular individual. Therefore, the learned vector representations of words represent the average meaning or knowledge representation for the people who generated the corpora. However, the meanings of words should differ among
individuals depending on the nature of the words (Wang & Bi, 2021). For example,
small children may associate an occupation such as doctor with fear, while older people do not. Such heterogeneity of word meanings affects individuals' behavior
differently. Thus, obtaining word vectors that capture the heterogeneity of meanings
is a necessary step toward modeling individual behaviors with higher accuracy.

5.6 Conclusion

In this chapter, we discuss the possibility of applying and extending models devel-
oped in computational linguistics to construct a sociological model of meaning-
making. The starting point for constructing a sociological theory of meaning-making
is to view it as a computational problem, that is, to capture valid information from the
environment with uncertainty, predicting next events and thus achieving one’s own
goals. From this formulation, we can draw three points of meaning production: the
predictive function of meaning, relationality of meaning, and Bayesian learning.
Both topic models and word-embedding models can be interpreted as theoretical
models of meaning-making, but there are differences between them. The topic model
is a hierarchical generative model of language that is highly compatible with cultural
sociology and particularly suited for capturing word polysemy. In contrast, word-
embedding models allow for spatial representation and are suitable for capturing the
cultural dimensions of events and practices. An integrated model of the topic and
word-embedding models is required. The last element that completes the theory of
sociological meaning-making is the link between interpretation and action. To this end, we introduced a regularized regression model that examines the relationship
between semantic representation and action. Future directions include the incorpo-
ration of not only linguistic information but also nonlinguistic information, infor-
mation about the physical environment and the body, and heterogeneity of meaning
interpretation according to the attributes of the actor and the socialization process,
enabling a more precise prediction of subsequent actions.

References

Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98(3), 409.
Arora, S., Li, Y., Liang, Y., Ma, T., & Risteski, A. (2016). A latent variable model approach to
PMI-based word embeddings. Transactions of the Association for Computational Linguistics, 4,
385–399.
Arora, S., Li, Y., Liang, Y., Ma, T., & Risteski, A. (2018). Linear algebraic structure of word senses, with applications to polysemy. Transactions of the Association for Computational Linguistics, 6, 483–495.
Arseniev-Koehler, A., & Foster, J. G. (2022). Machine learning as a model for cultural learning:
Teaching an algorithm what it means to be fat. Sociological Methods & Research, 51(4),
1484–1539.
Arseniev-Koehler, A., Cochran, S. D., Mays, V. M., Chang, K. W., & Foster, J. G. (2022).
Integrating topic modeling and word embedding to characterize violent deaths. Proceedings
of the National Academy of Sciences, 119(10), e2108801119.
Bakhtin, M. M. (1982 [1934–1941]). The dialogic imagination: Four essays (M. Holquist, Trans.; C. Emerson & M. Holquist, Eds.). University of Texas Press.
Becker, G. S. (1976). The economic approach to human behavior. University of Chicago Press.
Berger, P. L., & Luckmann, T. (1967). The social construction of reality: A treatise in the sociology
of knowledge. Anchor Books.
Bhatia, S. (2019). Predicting risk perception: New insights from data science. Management Science,
65(8), 3800–3823. https://doi.org/10.1287/mnsc.2018.3121
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine
Learning Research, 3, 993–1022.
Bourdieu, P. (1984). Distinction: A social critique of the judgement of taste. Harvard University
Press.
Bourdieu, P. (1989). The state nobility: Elite schools in the field of power. Stanford University
Press.
Boutyline, A., & Soter, L. K. (2021). Cultural schemas: What they are, how to find them, and what
to do once you’ve caught one. American Sociological Review, 86(4), 728–758.
Bruni, E., Tran, K. N., & Baroni, M. (2014). Multimodal distributional semantics. Journal of
Artificial Intelligence Research, 49, 1–47. https://doi.org/10.1613/jair.4135
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from
language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.
1126/science.aal4230
Chen, D., Peterson, J. C., & Griffiths, T. L. (2017). Evaluating vector-space models of analogy.
ArXiv, 1705(04416), 1–6.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford Univer-
sity Press.
Coleman, J. S. (1990). Foundations of social theory. Harvard University Press.
Dilthey, W. (1910). Der Aufbau der geschichtlichen Welt in den Geisteswissenschaften. Verlag der
Königlichen Akademie der Wissenschaften, in Commission bei Georg Reimer.
DiMaggio, P. (1997). Culture and cognition. Annual Review of Sociology, 23, 263–287.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of US government arts
funding. Poetics, 41(6), 570–606.
Durkheim, E. (1915). The elementary forms of the religious life: A study in religious sociology.
Macmillan.
Evans, J. A., & Aceves, P. (2016). Machine translation: Mining text for social theory. Annual
Review of Sociology, 42, 21–50.
Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. In Studies in linguistic analysis
(pp. 1–32). Basil Blackwell.
Fligstein, N., Stuart Brundage, J., & Schultz, M. (2017). Seeing like the Fed: Culture, cognition, and
framing in the failure to anticipate the financial crisis of 2008. American Sociological Review,
82(5), 879–909.
Foster, J. G. (2018). Culture and computation: Steps to a probably approximately correct theory of
culture. Poetics, 68, 144–154.
Garfinkel, H. (1967). Studies in ethnomethodology. Polity Press.
Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of
high-dimensional and embodied theories of meaning. Journal of Memory and Language, 43(3),
379–401. https://doi.org/10.1006/jmla.2000.2714
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation.
Psychological Review, 114(2), 211.
Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content
analysis methods for political texts. Political Analysis, 21(3), 267–297.
Günther, F., Rinaldi, L., & Marelli, M. (2019). Vector-space models of semantic representation
from a cognitive perspective: A discussion of common misconceptions. Perspectives on Psy-
chological Science, 14(6), 1006–1033. https://doi.org/10.1177/1745691619861372
Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.
Hohwy, J. (2013). The predictive mind. Oxford University Press.
Hollis, G., Westbury, C., & Lefsrud, L. (2017). Extrapolating human judgments from skip-gram
vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70(8),
1603–1619. https://doi.org/10.1080/17470218.2016.1195417
Jones, J. J., Amin, M. R., Kim, J., & Skiena, S. (2019). Stereotypical gender associations in
language have decreased over time. Sociological Science, 7, 1–35.
Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the
meanings of class through word embeddings. American Sociological Review, 84(5), 905–949.
Lake, B. M., & Murphy, G. L. (2023). Word meaning in minds and machines. Psychological
Review, 130(2), 401–431. https://doi.org/10.1037/rev0000297
Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenge to
Western thought. Basic Books.
Lamont, M. (2000). Meaning-making in cultural sociology: Broadening our agenda. Contemporary
Sociology, 29(4), 602–607.
Lizardo, O. (2004). The cognitive origins of Bourdieu's habitus. Journal for the Theory of Social Behaviour, 34(4), 375–401.
Lizardo, O. (2019). Pierre Bourdieu as cognitive sociologist. In W. Brekhus & G. Ignatow (Eds.),
The Oxford Handbook of Cognitive Sociology. Oxford University Press.
Luhmann, N. (1995). Social systems. Stanford University Press.
Manabe, H., Oka, T., Umikawa, Y., Takaoka, K., Uchida, Y., & Asahara, M. (2019). Japanese word
embedding based on multi-granular tokenization results (in Japanese). In Proceedings of the
twenty-fifth annual meeting of the Association for Natural Language Processing.
Marr, D. (1982). Vision: A computational investigation into the human representation and
processing of visual information. MIT Press.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations
of words and phrases and their compositionality. Advances in Neural Information Processing
Systems, 26, 3111–3119.
Mohr, J. (1994). Soldiers, mothers, tramps and others: discourse roles in the 1907 New York
Charity Directory. Poetics, 22, 327–358.
Mohr, J. W. (1998). Measuring meaning structures. Annual Review of Sociology, 24(1), 345–370.
Mohr, J. W., & Duquenne, V. (1997). The duality of culture and structure: poverty relief in
New York City, 1888–1917. Theory and Society, 26, 305–356.
Mohr, J. W., Bail, C. A., Frye, M., Lena, J. C., Lizardo, O., McDonnell, T. E., Mische, A., Tavory,
I., & Wherry, F. F. (2020). Measuring culture. Columbia University Press.
Nelson, L. K. (2020). Computational grounded theory: A methodological framework. Sociological
Methods & Research, 49(1), 3–42.
Opp, K. D. (1999). Contending conceptions of the theory of rational action. Journal of Theoretical
Politics, 11(2), 171–202.
Richie, R., Zou, W., & Bhatia, S. (2019). Predicting high-level human judgment across diverse
behavioral domains. Collabra: Psychology, 5(1), 50. https://doi.org/10.1525/collabra.282
Saussure, F. (1983). Course in general linguistics. Open Court Press.
Schutz, A., & Luckmann, T. (1973). The structures of the life-world (Vol. 1). Northwestern
University Press.
Simmel, G. (1922). Die Probleme der Geschichtsphilosophie: eine erkenntnistheoretische Studie.
Duncker & Humblot.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In Handbook of latent semantic
analysis (pp. 439–460). Psychology Press.
Swidler, A. (1986). Culture in action: Symbols and strategies. American Sociological Review, 51,
273–286.
Takikawa, H. (2019). Topic dynamics of post-war Japanese sociology: Topic analysis on Japanese
Sociological Review corpus by structural topic model (Japanese). Sociological Theory and
Methods, 34(2), 238–261.
Ueshima, A., & Takikawa, H. (2021, December). Analyzing vaccination priority judgments for 132 occupations using word vector models. In IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (pp. 76–82).
Vaisey, S. (2009). Motivation and justification: A dual-process model of culture in action. American Journal of Sociology, 114(6), 1675–1715.
Wang, X., & Bi, Y. (2021). Idiosyncratic tower of babel: Individual differences in word-meaning
representation increase as word abstractness increases. Psychological Science, 32(10),
1617–1635. https://doi.org/10.1177/09567976211003877
Watts, D. J. (2014). Common sense and sociological explanations. American Journal of Sociology,
120(2), 313–351.
Weber, M. (1946). From Max Weber: Essays in sociology. Facsimile Publisher.
White, H. C. (1963). An anatomy of kinship: Mathematical models for structures of cumulated
roles. Prentice-Hall.
White, H. C. (1992). Identity and control: A structural theory of social action. Princeton University
Press.
White, H. C. (1995). Network switchings and Bayesian forks: Reconstructing the social and
behavioral sciences. Social Research, 64, 1035–1063.
White, H. C. (2008a). Identity and control: How social formations emerge (2nd ed.). Princeton
University Press.
White, H. C. (2008b). Notes on the constituents of social structure. Soc. Rel. 10-Spring’65.
Sociologica, 2(1).
White, H. C., Boorman, S. A., & Breiger, R. L. (1976). Social structure from multiple
networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81(4),
730–780.
Yin, J., & Wang, J. (2014). A Dirichlet multinomial mixture model-based approach for short text
clustering. In Proceedings of the 20th ACM SIGKDD international conference on knowledge
discovery and data mining (pp. 233–242).
Chapter 6
Sociological Meaning of Contagion

Yoshimichi Sato

6.1 Contagion as a Main Theme in Sociology

Contagion or diffusion has been a main theme in sociology (the two terms are used interchangeably in this chapter). Gabriel Tarde, a founding father of the study of diffusion, proposed the laws of imitation and made diffusion a critical concept in the study of society, that is, sociology (Tarde, 1890).
Following the tradition of the study of diffusion laid out by Tarde, Everett Rogers
studied classic examples of diffusion in his seminal work (Rogers, 2003). This book,
whose first edition was published in 1962, deals with various cases of diffusion, successful and failed, from a campaign to promote water boiling in a Peruvian village to the diffusion of hybrid corn in Iowa and the STOP AIDS program in San Francisco.
Computational social science has rapidly and radically advanced the study of
contagion. This is partly because computational social science finds it easy to trace
the contagion process by utilizing two characteristics of big data, that is, “Big” and
“Always-on” (Salganik, 2018). “Big” literally means that the size of the data is large,
and “Always-on” means that data is continually collected. Data used by conven-
tional methods in the study of contagion misses these characteristics.
Wu et al. (2020), for example, apply big data methods to nowcasting and forecasting COVID-19. Their article has three purposes. The first one is to infer the basic
reproductive number of COVID-19, R0, and the outbreak size in Wuhan, China,
from December 1, 2019, to January 25, 2020. The second one is to estimate the
number of cases exported from Wuhan to other cities in mainland China. The third
one is to forecast the spread of COVID-19 within and outside mainland China. For these purposes, they build a susceptible-exposed-infectious-recovered (SEIR) model to simulate the pandemic of COVID-19 in Wuhan. For the estimation of the model,
they need mobility data of people moving from and into Wuhan domestically as well
as internationally. They collect data from three sources: “(1) the monthly number of
global flight bookings to Wuhan for January and February, 2019, obtained from the
Official Aviation Guide (OAG); (2) the daily number of domestic passengers by
means of transportation recorded by the location-based services of the Tencent
(Shenzhen, China) database from Wuhan to more than 300 prefecture-level cities
in mainland China from January 6 to March 7, 2019; and (3) the domestic passenger
volumes from and to Wuhan during Chunyun 2020 (Spring Festival travel
season. . .) estimated by Wuhan Municipal Transportation Management Bureau
and press-released in December, 2019” (Wu et al., 2020, pp. 690–691).
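For orientation, the core of such a model is a small set of compartmental flow equations. Below is a minimal discrete-time SEIR sketch (in Python; all parameter values are invented for illustration and are not those estimated by Wu et al. (2020), whose model additionally incorporates the mobility-driven import and export of cases):

# A minimal discrete-time SEIR sketch; parameters are illustrative only.
beta, sigma, gamma = 0.6, 1 / 5.2, 1 / 2.9  # transmission, incubation, recovery rates
N = 11_000_000                              # population size (roughly Wuhan-scale)
S, E, I, R = N - 1.0, 0.0, 1.0, 0.0         # one initial infectious case

for day in range(120):
    new_exposed = beta * S * I / N          # susceptible -> exposed
    new_infectious = sigma * E              # exposed -> infectious
    new_recovered = gamma * I               # infectious -> recovered
    S -= new_exposed
    E += new_exposed - new_infectious
    I += new_infectious - new_recovered
    R += new_recovered

print(round(I))  # infectious individuals after 120 days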
Their research is an excellent example of big data analysis applied to the study of a pandemic, that is, the diffusion of the coronavirus. A huge number of articles on the diffusion of the coronavirus using big data have been published since the outbreak. However, the diffusion of new ideas, values, and norms in society, as studied by Rogers, differs from that of a virus such as the coronavirus. I will explore the difference in the next section.

6.2 Complex Contagion

Diffusions of new cultural items such as ideas, values, and norms are different from those of viruses. Centola and Macy (2007) call diffusions of new cultural items
“complex contagions” and argue that an individual needs to contact more than one
source of activation for a complex contagion to occur. In contrast, a simple conta-
gion, a contagion of a virus, for example, occurs if an individual contacts a person
having a new item such as a virus. This type of contagion is based on a biological
mechanism, and the individual does not need to interpret the meaning of the virus to
be infected by it. In contrast, a diffusion of a cultural item is based on a sociological
mechanism, and the individual interprets its meaning before he or she decides
whether to accept it or not.
Take a social movement, for example. Participation in a social movement, say a demonstration, incurs risks such as being arrested by police or beaten by opponents of the movement. Thus, exposure to a single source of activation is not
enough for an individual to participate in the risky movement. Rather, he or she
needs more than one source of activation to participate in it, as Centola and Macy
(2007) argue. Why does he or she need more than one source of activation? Centola
and Macy (2007) propose four reasons: strategic complementarity, credibility,
legitimacy, and emotional contagion. I will examine them in detail in the next
section. What is important about them is that, in a sense, Centola and Macy (2007)
go back to Rogers’ (2003) study of diffusion because he also recognizes the
importance of sociological and psychological factors for a diffusion to occur.
Centola and Macy (2007) conduct computer simulations to support their argu-
ment. They start with the small world model proposed by Watts and Strogatz (1998).
Watts and Strogatz (1998) assume a ring lattice in which an individual is connected
to four nearest neighbors. Then, they randomly cut ties and connect them to
randomly chosen individuals. This rewiring connects distant individuals and creates
a small world. The originality of their study is that they demonstrate that a simple
procedure of cutting and rewiring ties creates a small world, a phenomenon made famous by Milgram's small-world experiment (Milgram, 1967). Milgram conducted
innovative social-psychological experiments. He and his collaborators randomly
chose a group of people in Wichita, Kansas, and in Omaha, Nebraska, and asked
people who agreed to participate in the experiments, who were called a starting
person, to send a folder via their acquaintances to a person living or working near
Harvard University, who was called a target person. A result of the experiments
shows that the median number of intermediate acquaintances for the folder to be
delivered from a starting person to the target person is five. The result empirically
and scientifically endorses the cliché used in daily life, “What a small world.” A
major contribution of Watts and Strogatz (1998) is that they succeeded in creating a
small world by a simple procedure of randomly cutting and rewiring ties.
Centola and Macy (2007) modify the model of Watts and Strogatz (1998) by
changing only one assumption. They make activation thresholds for contagion
higher than those assumed in previous studies. This seemingly minor change in
the assumption leads to the importance of wide bridges for complex contagion to
propagate. Simply speaking, the width of a bridge between two persons is the number of ties closely connecting them. [See Centola and Macy (2007,
pp. 713–714) for a mathematically strict definition of the width of a bridge.]
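The contrast between simple and complex contagion can be made concrete with a threshold rule on a small-world graph. The following sketch (in Python with NetworkX; the network size, rewiring probability, seed set, and threshold are invented for illustration) activates a node only once at least two of its neighbors are active, the defining feature of a complex contagion:

import networkx as nx

# Watts-Strogatz small world: ring lattice with k neighbors, rewired with probability p.
G = nx.watts_strogatz_graph(n=500, k=4, p=0.1, seed=0)

threshold = 2                   # complex contagion: need two activated neighbors
active = {0, 1, 2, 3}           # seed a small cluster of adjacent nodes

changed = True
while changed:
    changed = False
    for node in G.nodes:
        if node in active:
            continue
        if sum(1 for nb in G.neighbors(node) if nb in active) >= threshold:
            active.add(node)
            changed = True

print(len(active), "of", G.number_of_nodes(), "nodes activated")

With threshold = 1 (a simple contagion), randomly rewired ties speed diffusion; with threshold = 2, the rewired long ties are too narrow to transmit activation, which is the point of Centola and Macy's argument.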
Centola (2018) advanced the study by Centola and Macy (2007) by conducting an
online experiment. As he points out, it is almost impossible to collect the whole
network data of a society, even if its size is small, in order to check the empirical
validity of the theory of complex contagion. To solve this problem, Centola (2018)
created a society online, which has two different social networks: a clustered
network and a random network. The society he created was an online health
community called the Healthy Lifestyle Network. When a participant in the exper-
iment arrived at a web page of the Healthy Lifestyle Network, he or she was given overview
information on it. Then, if he or she agreed to join the network and signed up for it,
he or she was randomly assigned to one of the two networks.
In either network, a participant knew that he or she could interact with only a set
of neighbors, who were called “health buddies” in the experiment. In other words, a
participant did not know the whole structure of the network he or she was allocated to. In the clustered network, a participant shared overlapping contacts with other health buddies, so he or she had wide bridges with them. In the random network, a participant did not share such wide bridges with other health buddies.
Contagion began with the random choice of a participant. The participant sent a
message to his or her neighbors to encourage them to join a health forum website.
The neighbors decided whether to join the forum or not. If a neighbor joined the
forum, invitation messages were automatically sent from him or her to his or her
health buddies to invite them to join the forum.
Joining the forum was not easy, however. A participant could not join the forum
only by clicking a “Join” button. Rather, he or she had to fill in answers to questions
on a registration form that was long enough for him or her to scroll down to
complete. Centola (2018) intentionally designed this form to make the contagion
in the experiment more difficult than a contagion of a virus. In other words, this task
of joining the forum is an appropriate condition to test the empirical validity of the
theory of complex contagions proposed by Centola and Macy (2007).
Results of the experiment clearly showed that diffusion in the clustered network
evolved faster than in the random network and that the percentage of adopters
was higher in the clustered network. In addition,
carefully observing the process of contagion in the two networks, Centola (2018)
reported that invitation messages circulated within the same neighborhood in the clus-
tered network, which means that a participant was exposed to the invitation mes-
sages via more than one neighbor. In the random network, in contrast, the messages
quickly diffused through the network, but, because of the lack of redundancy, the diffusion
did not evolve as fast as in the clustered network, and the percentage of adopters was
lower. These results empirically support the theory
of complex contagions.

6.3 Role of Meaning and Interpretation in the Contagion Process

Although it has advanced the study of contagion, I would argue that the theory of
complex contagions does not fully explain a contagion process among individuals,
because it does not incorporate meaning and interpretation in the contagion process.
Centola and Macy (2007) and Centola (2018) proposed four mechanisms to explain
why complex contagions need multiple sources to occur: strategic complementarity,
credibility, legitimacy, and emotional contagion. Strategic complementarity means
that for an individual to participate in a risky, costly behavior such as
collective action, he or she needs to know that other people in his or her network have
already participated in it. Credibility means that for an individual to adopt a new
item, he or she needs more than one source to believe that the item is credible.
Legitimacy seems related to strategic complementarity. If some people who are
strongly connected to an individual have participated in a risky, costly behavior such
as a demonstration, bystanders become more likely to accept it as legitimate, which
in turn encourages the individual to participate in it. Emotional contagion means that
emotions are exchanged and amplified in collective action, and such emotional
contagion encourages an individual whose close friends are participating to
join as well.
These mechanisms are sociologically plausible, but they miss interpretation and
meaning in the diffusion process (Goldberg & Stein, 2018). Because the theory of
associative diffusion by Goldberg and Stein was explained in Chap. 2, I revisit the
failed case of diffusion of a boiling water practice in a Peruvian village (Rogers,
2003) to show the importance of interpretation and meaning in diffusion studies.
Boiling water for drinking was crucial for the health of residents in a peasant
village because the water they drank was contaminated. Thus, Nelida, the local
health worker representing the Peruvian public health agency in the village,
conducted a two-year campaign to persuade villagers to boil water. However, the
campaign failed. Only 11 families out of 200 families in the village began to boil
water.
Why did the campaign fail even though boiling water was beneficial to the health
of the villagers and not risky to them? Rogers (2003) argues that the villagers
perceived boiling water as culturally inappropriate. They categorized foods, bever-
ages, and medicines into “hot” and “cold” types. The categorization was not related
to their actual temperatures. Rather, it was socially constructed among villagers who
believed in the legitimacy of the categorization. Furthermore, the categorization was
socially connected to illness. In general, sick persons were supposed to avoid extremely
hot or cold types, and raw water was categorized as a very cold type. Thus, only sick
persons drank boiled water, because villagers thought that boiling kept the water from
being extremely cold.
Healthy persons, in contrast, did not drink boiled water, because they thought that
they did not need to drink it.
This failure case shows the importance of local culture and norms that dominate
people’s schemes of interpretation and meaning. If the villagers had not culturally
linked boiled water to illness, they would have accepted the custom of boiling water.
Although the four mechanisms for complex contagions to occur are convincing, the
theory of complex contagions does not seem to properly deal with interpretation and
meaning in the process of contagion. In other words, if it incorporated interpretation and
meaning in its logic, the theory of complex contagions would enhance its explanatory
power.

6.4 Big Data Analysis of Diffusions, Interpretation, and Meaning

Before examining how to incorporate interpretation and meaning in big data analysis
in detail, let me quickly review their history in sociology to show their importance.
Mead (1934) was one of the founding fathers who introduced interpretation and
meaning into sociology. To summarize his profound theory in a very simple way, the self
consists of the “I” and the “Me.” The “Me” is the set of others’ expectations that the “I”
accepts, and the “I” reacts to those expectations. Multiple selves interact smoothly with
each other if the reactions do not contradict the expectations. I do not think, however,
that the “Me” is the expectations of others exactly as they are. Rather, the “I” interprets
the meaning of the expectations and reacts to the interpreted expectations. Here we see
the interaction between interpretations and reactions.
Berger and Luckmann (1966) took Mead’s theory a step further in sociology.
They proposed a theory that explains how reality is constructed by interactions of
actors. Reality, or social reality to be exact, does not exist without actors. Actors
interact with other actors, add meanings to their actions, and interpret them. If
actions and interpretations fit well together, reality emerges. In other words, reality
is socially constructed by actors involved. Then, actors interpret the reality as
objective social order and behave in accordance with it. Here again, we observe
the interaction between interpretations and actions in the creation of social reality, or
social order.
The theory of social construction of reality by Berger and Luckmann (1966)
influenced studies of social movements, mobilization processes in particular, which
are closely related to the theory of complex contagion by Centola and Macy (2007).
Resource mobilization theory (e.g., McCarthy & Zald, 1977) used to be a main
paradigm in the study of social movements. The theory argues that resources such as
social movement organizations and human and financial resources are necessary for
social movements to emerge and succeed. However, it was criticized because it did
not fully explain how people were mobilized. To overcome this shortcoming of
the theory, D. A. Snow, the main figure in the development of a new theory called
frame theory, and his colleagues focused on how a social movement organization
aligns its frame with that of the people the organization wants to mobilize (Snow et al.,
1986). It is often the case that the frame of a social movement organization is
different from that of its target people at the beginning of a social movement. In
this case, even if the organization has plentiful resources for mobilization, the target
people do not participate in the movement. This is because the people do not
understand the meaning, and therefore the significance, of the movement due to
the difference between their frame and that of the organization. Thus, the organization
tries to adjust its frame so that the target people will interpret the movement as
significant and related to their own interests. Here again, it becomes obvious that the
people’s interpretation of the adjusted frame and the movement is the key for the
organization to succeed in mobilizing its target people.
So far, I have argued for the importance of interpretation and meaning in sociology in
general and in the study of social movements in particular. This is also the case when
we study the process of diffusion, as we observed in the failure of the boiling water
campaign in a Peruvian village. Then, how can we incorporate interpretation and
meaning in big data analysis of diffusion? This is a difficult task because most
big data concerns people’s behavior, and, therefore, big data analysis finds it difficult
to deal with interpretation and meaning. Of course, people express their opinions on
Twitter and Facebook, but big data analysis alone does not tell us how people interpret
those opinions and add meaning to them.
How can we solve this problem? One possible solution would be applying topic
models to text data from platforms such as Twitter and Facebook (Bail, 2016). Chapter 2 cites
DiMaggio et al. (2013), who applied a topic model to newspaper articles to explain
the decline in government support for artists between the mid-1980s and mid-1990s.
Here I examine Bail’s (2016) work in detail because he and I share the same interest
in interpretation and meaning in big data analysis.
In general, people do not receive discourses expressed by other people as they are.
They interpret the discourses and add meaning to them using their cognitive schemas
or frames. If the discourses and their cognitive frames are close to each other, people
tend to accept them. If not, they tend to reject them.
Based on this theoretical idea, Bail (2016) studied how organ donation advocacy
organizations align their frames with those of their target population to induce people
in the target population to engage with them. He proposed a theory of cultural
carrying capacity. Organ donation advocacy organizations face a dilemma when
they use social media to attract more people. On the one hand, they need to cover
various topics in their messages so that more people would resonate with their frame.
To cite Bail’s example, an organization that produces messages discussing only
sports could attract only basketball fans, but one that spreads messages about sports,
religion, or science could induce not only sports fans but also religious people or
scientists to endorse its activities.
On the other hand, if an organization produces messages that cover too many
diverse topics, such diversification might limit its capacity to mobilize people. This
is because diversification creates disconnected audiences without collective identity,
which is necessary for mobilization.
Thus, theoretically, diversification of topics in messages has an inverted
U-shaped effect on the number of mobilized people. If the level of diversification
is very low, as in the case of messages only about sports, the number of
mobilized people should be small. Meanwhile, if the level of diversification is
very high, the target population becomes fragmented, and, therefore, the number of
mobilized people should be small, too. Thus, the number of mobilized people
should be largest when the level of diversification is in the middle.
To check the empirical validity of this theoretical reasoning, Bail created an
innovative method. He created a Facebook application. If an organ donation advo-
cacy organization participates in the study and installs the application, it is provided
with useful information on and recommendations about its online outreach activities.
In return, the application collects all publicly available text from the organization’s
Facebook fan page and Insights data available only to the owner of the fan page, that
is, the organization. Then the application administered a survey to the representative
of the organization to collect information on the organization and its outreach tactics.
Forty-two organizations participated in the study and produced 7252 messages
between October 1, 2011 and October 6, 2012. These messages were analyzed by
structural topic modeling. Topic modeling extracts latent topics from the messages
and calculates scores that show how strongly a message is associated with each topic.
The scores are called membership scores. Structural topic modeling incorporates meta-
data so that it can capture temporal change in the meaning of a word or a group of
words vis-à-vis a topic.
Eventually, 39 topics were identified. Then Bail (2016) created an index that
shows the diversity of topics using the matrix of membership scores for each post in
the topics. He calls the index the coefficient of discursive variation. Mathematically,
the coefficient for each organization $i$ at time $t$ ($C_{it}$) is defined as follows:
$$C_{it} = \frac{\sigma_{t-7}}{\mu_{t-7}}$$

“where σ is the standard deviation of the membership matrix of organization i’s posts
in the 39 topic categories during the previous week and μ is the mean score of the
posts across all topic categories during the same time period” (Bail, 2016, p. 284).
The coefficient of discursive variation is analogous to the coefficient of variation. If
organization $i$ uses diverse topics in its posts, $C_{it}$ becomes large. Thus, the
coefficient of discursive variation can be used to check the empirical validity of
the abovementioned theoretical argument. If the coefficient is very small or very
large, the number of mobilized people (the number of engaged Facebook users by
day in this case) should be small. If the coefficient is in the middle, the number of
mobilized people should be large.
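To make the construction concrete, the following Python sketch computes a weekly coefficient of discursive variation from a simulated posts-by-topics membership matrix. The data, the daily loop, and all variable names are illustrative assumptions; this is not Bail’s (2016) actual pipeline, only the formula applied to toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per Facebook post, 39 topic-membership scores
# per row (as produced by a structural topic model). All values are simulated.
rng = np.random.default_rng(0)
n_posts, n_topics = 120, 39
posts = pd.DataFrame(
    rng.dirichlet(np.ones(n_topics), size=n_posts),
    columns=[f"topic_{k}" for k in range(n_topics)],
)
posts["date"] = pd.to_datetime("2011-10-01") + pd.to_timedelta(
    rng.integers(0, 56, size=n_posts), unit="D"
)
topic_cols = [c for c in posts.columns if c.startswith("topic_")]

def discursive_variation(scores: np.ndarray) -> float:
    """C_it: standard deviation over mean of the membership scores of
    the posts an organization produced during the previous week."""
    return scores.std() / scores.mean()

# For each day t, pool the topic scores of the previous week's posts.
records = []
for t in pd.date_range(posts["date"].min() + pd.Timedelta("7D"), posts["date"].max()):
    week = posts[(posts["date"] >= t - pd.Timedelta("7D")) & (posts["date"] < t)]
    if len(week):
        records.append({"date": t,
                        "C_it": discursive_variation(week[topic_cols].to_numpy())})

c_series = pd.DataFrame(records)
print(c_series.head())
```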
To show that his theoretical argument is empirically valid, Bail (2016) conducted
a sophisticated type of regression modeling with the number of engaged Facebook
users by day as the dependent variable. The key independent variable is the coeffi-
cient of discursive variation. He also considered other theories proposing factors that
might affect the number of engaged Facebook users by day and included variables
derived from the theories in the models as control variables. Thus, if the coefficient
of discursive variation has an inverted U-shaped effect on the number of engaged
Facebook users by day after controlling for the variables derived from other theories,
his theoretical argument would be empirically supported.
A simple graph in Fig. 6.1 shows that the theoretical argument seems empirically
valid. To confirm the robustness of the graph, Bail conducted regression analysis,
and its results support the theoretical argument. The regression coefficient for the
coefficient of discursive variation is positive, and that for its squared term
is negative after controlling for other variables. This means that the
coefficient of discursive variation has an inverted U-shaped effect on the number
of engaged Facebook users by day, as in Fig. 6.1.
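The inverted U-shaped prediction itself can be checked with an ordinary quadratic specification. The sketch below regresses a simulated engagement outcome on the coefficient of discursive variation and its square; Bail’s (2016) actual models include control variables derived from rival theories, so this shows only the bare functional form on toy data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a built-in inverted-U relationship plus noise,
# only to illustrate the functional form of the test.
rng = np.random.default_rng(1)
c = rng.uniform(0.5, 2.5, size=300)                 # coefficient of discursive variation
engaged = -40 * (c - 1.5) ** 2 + 100 + rng.normal(0, 5, size=300)
df = pd.DataFrame({"C_it": c, "engaged_users": engaged})

model = smf.ols("engaged_users ~ C_it + I(C_it ** 2)", data=df).fit()
print(model.params)  # inverted U: positive coefficient on C_it, negative on its square
```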
The significance of Bail’s (2016) study is that it highlights the importance of
meaning and interpretation in mobilization. If an organ donation advocacy organi-
zation posts messages focusing on too few topics, its frame does not match the frames of
the audience, so it cannot attract attention from a wide audience. If it posts
messages about too many diverse topics, the audience interprets its frame as
fragmented and contradictory, so it cannot appeal to a wide audience, either.
Only if it posts messages covering an adequate range of topics can the audience
clearly interpret its frame as important to them and become mobilized.
In addition, his study explains why more than one source is necessary for
complex contagion to propagate. As mentioned above, Centola and Macy (2007)
and Centola (2018) proposed four mechanisms for complex contagion to occur:
strategic complementarity, credibility, legitimacy, and emotional contagion. From
the viewpoint of frame analysis in mobilization, a mobilizing organization (an organ
donation advocacy organization in Bail’s study) aligns its frame with frames of its
target population through the four mechanisms. If the four mechanisms work well,
the organization finds it easier to make its frame resonate with frames of the target
population.

Fig. 6.1 Relationship between the coefficient of discursive variation and the number of engaged
Facebook users by day. Gray zone represents standard errors (95% confidence interval).
(Source: Bail 2016, p. 286, Fig. 1)

Based on Bail’s argument, if a target person receives more than one
message with different topics from different sources, his or her frame is more likely to
resonate with that of the organization. Here we can clearly understand why complex
contagion needs more than one source to propagate.
The theory of cultural carrying capacity also helps us clearly understand why the
campaign for boiling water in a Peruvian village failed, which was discussed in Sect.
6.3. Nelida, who was in charge of the campaign, failed to persuade the villagers to
boil water. This is because she did not make the frame of the Peruvian public
health agency resonate with that of the villagers. If she had used topics in the persuasion
process that overlapped with topics the villagers were interested in, she could have
succeeded in persuading them to boil water.

6.5 Conclusion

The study of complex contagion by Centola and Macy (2007) and Centola (2018) is
a seminal work showing that a contagion of a cultural item is substantively different
from that of a virus, but it does not completely explore how meaning and interpre-
tation function in the process of complex contagion. Conversely, Bail (2016) does
not talk about complex contagion, but he proposes a thought-provoking theory, the
theory of cultural carrying capacity, emphasizing the importance of meaning, inter-
pretation, and frame resonance when a movement organization tries to mobilize its
target population. He checked the theory’s empirical validity by collecting and analyzing
big data in the form of Facebook posts.
Combining studies by Centola and Macy (2007), Centola (2018), and Bail (2016)
gives us a deeper comprehension of the mechanism of complex contagion. However,
this is just an example showing that focusing on meaning and interpretation enriches
studies using big data and makes them more significant in sociology. Furthermore,
this research strategy would contribute to solving research questions that sociologists
have attacked without big data and failed to answer.

References

Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and
social media engagement. Social Science & Medicine, 165, 280–288.
Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology
of knowledge. Anchor Books.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton Univer-
sity Press.
Centola, D., & Macy, M. W. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113(3), 702–734.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of U.S. government arts
funding. Poetics, 41(6), 570–606.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
McCarthy, J. D., & Zald, M. N. (1977). Resource mobilization and social movements: A partial
theory. American Journal of Sociology, 82(6), 1212–1241.
Mead, G. H. (1934). Mind, self, and society: From the standpoint of a social behaviorist. University
of Chicago Press.
Milgram, S. (1967). The small-world problem. Psychology Today, 1(1), 61–67.
Rogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Snow, D. A., Rochford, E. B., Jr., Worden, S. K., & Benford, R. D. (1986). Frame alignment
processes, micromobilization, and movement participation. American Sociological Review,
51(4), 464–481.
Tarde, G. (1890). Les lois de l’imitation: Etude sociologique. Félix Alcan.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393,
440–442.
Wu, J. T., Leung, K., & Leung, G. M. (2020). Nowcasting and forecasting the potential domestic
and international spread of the 2019-nCoV outbreak originating in Wuhan, China: A modelling
study. Lancet, 395, 689–697.
Chapter 7
Polarization of Opinion

Zeyu Lyu, Kikuko Nagayoshi, and Hiroki Takikawa

7.1 Background

In recent years, there has been growing concern that opinion polarization and
fragmentation are becoming increasingly pronounced over various issues. Specifi-
cally, previous research centered on the United States has shown that opinions on
several crucial issues have become increasingly divided and polarized since the
1960s (Hetherington, 2001; McCarty et al., 2006). Significant social and political
changes, including the civil rights movement, anti-war protests, and the feminist
movement, have prompted individuals and organizations to take more pronounced
positions on a range of topics, leading to a general trend toward opinion polarization.
Furthermore, there is broad scholarly consensus that the degree of opinion polarization
has continuously increased since then and that the ramifications of polarization may
hold greater significance in today’s social context.
The rise of opinion polarization has aroused extensive interest because its nega-
tive consequences can pose a disruptive threat to democratic societies. When indi-
viduals or groups are strongly polarized, they typically become less willing to
compromise with others and more prone to conflict and even violence. This tendency
represents a severe societal risk by undermining the capacity to respond to pressing

Z. Lyu (✉)
Graduate School of Arts and Letters, Tohoku University, Sendai, Japan
e-mail: [email protected]
K. Nagayoshi
Institute of Social Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
e-mail: [email protected]
H. Takikawa
Graduate School of Humanities and Sociology, The University of Tokyo, Bunkyo-ku, Tokyo,
Japan
e-mail: [email protected]

challenges such as inequality, pandemics, and violence. In light of these detrimental
effects, numerous endeavors have been embarked upon to enhance the understand-
ing of opinion polarization. However, systemic changes within society, such as the
proliferation of information sources, increasing diversity of values, and the escalat-
ing significance of online discourse, are transforming conventional formations and
dynamics of opinion, presenting new challenges to the study of opinion polarization.
Faced with these challenges, the growth of computational power, the increasing avail-
ability of data science toolkits, and the expanding magnitude and variety of
data have opened up new avenues for research on opinion polarization.
This chapter discusses the challenges and opportunities of incorporating novel
data and advanced computational methods in the study of opinion polarization.
While such interdisciplinary research is valuable for enriching and expanding the
scope of investigations, it also came up with challenges in how to grasp fitting
theories, adopt appropriate methodologies, and integrate schemas from multiple
disciplines. In order to better understand the cutting-edge and innovative works
occurring in this field, this chapter will (1) clarify the concept of opinion polarization
and distinguish its different variants for theoretical clarity, (2) provide an overview
of primary novel data and methods closely related to the investigation of opinion
polarization, (3) review several representative research to demonstrate how novel
data and advanced method can be applied to address both theoretical and practical
questions in research of opinion polarization, and (4) discuss how to integrate
theoretical concepts and empirical findings to establish an iterative research frame-
work of opinion polarization.

7.1.1 The Concept of Opinion Polarization

Opinion polarization is a multifaceted concept, encompassing numerous variants.
Clearly defining its concept is crucial for determining data collection methods,
choosing appropriate analytical approaches, and accurately interpreting results.
First, opinion polarization is commonly conceptualized as increases in antago-
nistic and extreme preferences over public policy, ideological orientations, partisan
attachments, and cultural norms. This form of polarization is typically
operationalized as a bimodal distribution of opinions. For example, considering
ideology as a general set of beliefs and ideas about politics and governance, it is
typical to locate individuals’ ideology on a continuous spectrum from a liberal
position to a conservative position. Ideology is considered polarized when there
are fewer centrists and more sharply conservative or liberal individuals in the population.
Likewise, observing opinions on the spectrum shifting toward opposite and more extreme
positions indicates a form of polarization related to a specific opinion.
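As a concrete illustration of this distributional view, the following sketch contrasts a simulated unimodal opinion distribution with a bimodal one using two common summary statistics, the variance and Sarle’s bimodality coefficient. Both the data and the choice of statistics are illustrative assumptions rather than a single canonical measure of polarization.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
# Hypothetical 7-point ideology scores: one unimodal, one bimodal population.
moderate = np.clip(np.round(rng.normal(4, 1.0, 1000)), 1, 7)
polarized = np.concatenate([
    np.clip(np.round(rng.normal(2, 0.8, 500)), 1, 7),
    np.clip(np.round(rng.normal(6, 0.8, 500)), 1, 7),
])

def polarization_measures(x: np.ndarray) -> tuple[float, float]:
    """Return (variance, Sarle's bimodality coefficient).
    Coefficient values above ~0.555 suggest bimodality."""
    n = len(x)
    bc = (skew(x) ** 2 + 1) / (
        kurtosis(x) + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return float(x.var()), float(bc)

print(polarization_measures(moderate))   # lower variance, lower coefficient
print(polarization_measures(polarized))  # higher variance, higher coefficient
```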
Second, the concept “affective polarization” is introduced to describe polarization
through the disparities in the warmth that people feel toward out-groups versus
in-groups (Garrett et al., 2019; Iyengar et al., 2012, 2019; Rogowski & Sutherland,
2016). Affective polarization stems from social group identity theory, which
7 Polarization of Opinion 103

suggests that group membership can trigger more positive emotional reactions
toward the in-group than the out-group and a greater willingness to cooperate with
members of the in-group (Iyengar et al., 2012; Tajfel, 1982). For instance, with
regard to politics, as a social identity, partisanship contributes to bipolarity and the
favoring of people with similar political views while strongly biased against those
with opposing ones. Thus, increasing affective polarization can be characterized by
negative feelings toward opposing political parties or their supporters and positive
feelings toward one’s preferred political party.

7.2 The Mechanism of Opinion Polarization

Theoretical thinking also involves generating hypotheses that should be empiri-
cally examined. In the discussion of opinion polarization, two mechanisms have
broadly been proposed to explain its emergence and growth.
On the one hand, opinion polarization is assumed to be associated with the
homophilic feature of interactions and connections among individuals. More specif-
ically, in order to reduce cognitive dissonance, individuals tend to seek out infor-
mation that confirms their existing beliefs while ignoring or dismissing information that
contradicts them (Stanley et al., 2020). As a result, individuals increasingly
interact with those who share their views and are exposed to homophilic information,
while distancing themselves from interactions with those who hold divergent opin-
ions and avoiding exposure to opposing viewpoints. Since homophilic
interactions are deemed to reinforce existing opinions, this often leads individuals
to take more extreme positions (Sunstein, 2002).
On the other hand, group-justifying biases can also contribute to opinion polar-
ization. An individual’s group identity constitutes the psychological attachment to a
particular group, which is considered one of the most crucial factors affecting
opinions. Once individuals identify with a group, self-categorization can induce
them to adopt various forms of motivated reasoning, including in-group
favoritism and out-group derogation (Tajfel, 1982), as a way to maintain group
distinctiveness and enhance their status as good group members. In these terms, the
in-group/out-group distinction can exacerbate the extent of polarization.

7.2.1 Computational Social Science and Opinion Polarization

The challenge of disentangling the underlying mechanism of opinion polarization
and its effects has grown with the complexity of social changes, including the
increasing diversity of identities, inequality, and the new information environment.
Specifically, although considerable efforts have been devoted to investigating the
extent, dynamics, causes, and consequences of opinion polarization, faced with a
shifting society composed of new technologies and issues that enable new identities,
forms, and practices, an updated understanding of opinion polarization is needed. Up
until recently, most quantitative social science research has been limited to the
conventional statistical analysis of survey data. However, survey-based research
relies on the sampling of observations, which often encounters limitations in terms
of representativeness, scale, and granularity. The limitations of traditional methods
have become rather apparent in the investigation of opinion polarization, as the
mechanisms and processes of opinion polarization involve a seemingly endless number
of information flows, expressed opinions, and interrelated behaviors among different
actors across long time spans, which are difficult to address with surveys taken
in small groups.
Fortunately, the development of toolsets and computational capacities offers
significant potential for better investigating and understanding human behaviors and
opinions. In recent years, an unprecedented amount of digital data alongside a
variety of computational methods have illuminated new pathways to overcome the
limitations of previous social science research, thereby giving birth to a novel field known
as computational social science (Edelmann et al., 2020; Hofman, 2021; Lazer et al.,
2009). Broadly speaking, the primary features of the computational social science
research paradigm concern collecting, manipulating, and managing big
data and utilizing advanced computational methods to improve the understanding of
human behaviors and social processes. These features raise strong expectations
with regard to the promise they hold for the study of opinion polarization.
First, big data typically encompasses comprehensive and multivariate informa-
tion about the behaviors of large populations. The primary strength of big data lies in
its remarkable scalability and detailed granularity. This allows researchers to mon-
itor human behaviors with high frequency and on a large scale. Such capabilities are
invaluable for creating innovative measures of public opinion and for closely
examining interrelated human behaviors and social phenomena. From this perspec-
tive, the characteristics of big data provide researchers with the tools to explore a
broad spectrum of opinions and behavioral dynamics on a substantial scale, which
could be instrumental in revealing new aspects and fundamental mechanisms driving
the phenomenon of opinion polarization.
Also, computational social science has the potential to revolutionize the research
on opinion polarization through the introduction of advanced computational
methods and new techniques. A wide range of computational methods, such as
network analysis, natural language processing (NLP), and machine learning have
been applied to investigate the status and mechanism of opinion polarization. These
approaches provide detailed insights into the formation, dissemination, and evolu-
tion of opinions in various contexts, reaching a level of scale and depth unattainable
with conventional methods. Beyond that, the experimental approach, traditionally
used to explore the causality of opinion polarization, has undergone significant
evolution due to recent technological advancements. Specifically, through
conducting experiments with thousands of participants in online discourse,
digital experiments allow the observation of sizeable numbers of heterogeneous individuals
in natural settings, which affords great promise for overcoming the limitations
inherent in traditional laboratory experiment design. During the experiment, by
manipulating interventions and conditions in theoretically informed ways, it can
be determined which mechanism produces specific outcomes, answering previously
hard-to-tackle questions about the causality of opinion polarization. In general,
social scientists can strategically adopt computational tools to unpack the underlying
mechanism of opinion polarization.
In summary, it is reasonable to argue that the dual growth in the availability of big
data and the power of computational methods is becoming increasingly relevant to
the investigation of opinion polarization. Importantly, inconsistent conclusions
concerning the mechanism of opinion polarization indicate its complexity and
heterogeneity; thus, any single mechanism might be inconclusive in explaining
opinion polarization. Rather, adequately understanding opinion polarization
may require more consideration of the variations and randomness of its underlying
mechanisms. A proper explanation of opinion polarization is achieved by specifying
the actors and contexts and then explicitly demonstrating how these conditions combine
and interact to produce the occurrence of, and changes in, opinion polarization. From
this perspective, the massive volume of data combined with advanced computational
tools provides unprecedented opportunities to consolidate, develop, and extend
theories of opinion polarization. This includes accessing the dynamics of opinion
and its associations with fine-grained behaviors in real social environments to
generate potential research insights, conducting consistency checks to confirm that a
theoretical mechanism indeed explains opinion polarization as hypothesized,
and uncovering new perspectives on and questions about opinion polarization to guide
further research. In this way, we can refine the concepts and measurements of
opinion polarization iteratively through these sequential and inductive processes.

7.3 New Methodology and Data for Opinion Polarization Research

7.3.1 Network Analysis

Generally speaking, network analysis provides useful concepts, notions, and applied
tools to describe and understand how actors are connected. In a network, nodes
typically represent actors or institutions, whereas edges represent connections
between such entities. Over the recent decade, increasing attention has been
devoted to applications of network analysis methods in social science. Much of
this interest stems from the flexibility of social networks in defining and modeling various
relationships among social entities. Social networks consisting of actors and social
relations are ubiquitous in society: people connected by a common
interest, citizens connected by a commonly supported party, or social media users
connected by interactions on a platform. Importantly, the structure and dynam-
ics of these connections have the potential to yield meaningful insights into a variety
of social science problems. Thus, network analysis has become an established field
within the social sciences and an indispensable resource for understanding human
behaviors and opinions (Borgatti et al., 2009; Watts, 2004).
Previously, since collecting relational data through direct contact was time-
consuming and difficult, social network analysis was typically restricted to small
bounded groups. Thanks to the development of information technology, recent years
have witnessed an explosion in the availability of networked data. In particular, the
rapid increase in the use of social media has generated time-stamped digital records
of social interactions. These digital records have reinvigorated social network
analysis by enabling analyses of relations among social entities at unprecedented
scale and in real time, which has also opened up new opportunities to investigate
opinion polarization on social media.
First, network analysis can provide insight into the nature and dynamics of
opinion polarization by serving as a method to detect individuals’ opinions. As
indicated above, opinion polarization involves the degree to which people hold
competing attitudes on specific issues. Here, one of the crucial questions is
how to quantify individuals’ opinions. From the perspective of network analysis,
the basic idea is to assume that individuals are embedded in social relations and
interactions with measurable patterns. Specifically, one of the most
important characteristics of social networks is the homophily principle, which
implies that people’s social networks are homogeneous with regard to
sociodemographic characteristics, behaviors, and opinions (McPherson et al.,
2001). Accordingly, it is reasonable to assume that people who have similar opinion
leanings are more likely to share a homophilous social network; hence, people’s
social network structure is assumed to be associated with their opinions. Based on
this theoretical assumption, the availability of network data among social media
users has elicited many efforts to estimate opinions with network-based measure-
ments. Specifically, interactions on social media can be naturally described as a
social network in which individuals with shared interests tend to form groups and
individuals within the same community likely share similar opinions. For example,
many social media platforms, including Twitter and Facebook, allow users to freely
choose whom to follow. These interactions have the potential to serve as a
source of opinion detection. Barberá (2015) introduced a systematic framework for
estimating individuals’ ideology based on their following relationships with politicians.
According to the homophily principle, it is reasonable to assume that users following
the same politicians tend to share similar political opinions, as the follow network
reflects their political preferences. Computationally, the following relationships can be
aggregated into an adjacency matrix that records the follow ties between
politicians and ordinary users. Then, dimension reduction algorithms, such as
singular value decomposition, can be applied to map these following relationships
into a low-dimensional ideological space. In these terms, each node in the follow
network can be assigned an estimated ideology based on its network
position. Beyond the following relationship, other interaction behaviors and connections
on social media, such as “like,” “retweet,” and “reply,” have also been
shown to be associated with individuals’ latent ideology (Bond & Messing, 2015;
Wong et al., 2016). Furthermore, recent studies attempt to employ more sophisti-
cated methods that can integrate different types of information attached to nodes and
edges in the network to predict ideology. Graph neural networks (GNNs) have
a powerful capability to handle abundant information, with edges among multiple types
of nodes and attributes associated with each node (Zhang et al., 2019), making them
suitable for capturing nuanced patterns and relationships. For example, utilizing
GNNs, multiple relations on Twitter, including follows, retweets, likes, and mentions,
can be aggregated as input for a deep learning model of ideology detection (Xiao
et al., 2020).
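The following sketch illustrates the core logic of such network-based scaling on simulated data: a binary user-politician follow matrix whose low-rank structure encodes latent ideal points, recovered here with a plain SVD. This is a toy version of the idea behind Barberá (2015), not his actual Bayesian estimation procedure, and all data and parameters are assumptions.

```python
import numpy as np

# Hypothetical follow network: rows = ordinary users, columns = politicians;
# A[i, j] = 1 if user i follows politician j. Homophily implies that the
# low-rank structure of A encodes latent ideology. Everything is simulated.
rng = np.random.default_rng(0)
n_users, n_pols = 1000, 50
user_ideal = rng.uniform(-2, 2, n_users)   # true user ideology (unknown in practice)
pol_ideal = rng.uniform(-2, 2, n_pols)     # true politician ideology
# Following is more likely the closer two ideal points are (toy choice model).
prob = 1.0 / (1.0 + np.exp(np.abs(user_ideal[:, None] - pol_ideal[None, :]) - 1.0))
A = (rng.random((n_users, n_pols)) < prob).astype(float)

# Dimension reduction: SVD of the column-centered follow matrix.
U, s, Vt = np.linalg.svd(A - A.mean(axis=0), full_matrices=False)

# One of the leading dimensions should recover ideology (sign and scale are
# not identified; applications anchor the axis with known politicians).
for k in range(2):
    r = np.corrcoef(U[:, k], user_ideal)[0, 1]
    print(f"dimension {k}: |corr with true ideology| = {abs(r):.2f}")
```

In applications, the recovered dimension is anchored by politicians whose positions are already known, since the sign and scale of singular vectors are arbitrary.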
Second, another stream of research has employed network analysis to investigate
opinion polarization from the perspective of homophilous interactions. Selective
exposure and the echo chamber are widely used to describe a situation in
which people are exposed only to information and ideas that reinforce their existing
opinions, thereby creating a self-reinforcing cycle
(Garrett, 2009; Prior, 2013; Stroud, 2010; Wojcieszak, 2010). Notably, they are
assumed to diminish mutual understanding, narrow the diversity of viewpoints to which
people are exposed, and ultimately lead to a situation where people have less common
ground and feel animosity toward those who hold opposing views, that is, opinion
polarization. From this perspective, how individuals interact with others, and how
individuals are exposed to information flows, could provide important insights into the
underlying mechanism of opinion polarization. A growing body of research has
suggested a homophily tendency in connections and interactions on social media. Beyond
that, the availability of social network data enables researchers to investigate con-
nections and interactions from more diverse and nuanced perspectives. For example,
Conover et al. (2011) employed clustering algorithms to investigate the political
communication network on Twitter, demonstrating that the retweet network exhibits
two highly segregated communities of users, while the mention network is much more
politically heterogeneous. Barberá et al. (2015) suggest that in political discussion,
information is exchanged primarily among individuals with similar ideological
preferences, yet this homophily tendency is much weaker in discussions related to
other issues. Bakshy et al. (2015) examined how millions of Facebook users interact
with socially shared news. Their analysis suggests that the homophily tendency is more
pronounced for shared links to hard content such as national news, politics, or
world affairs. These studies indicate that the “echo chamber” narrative might
be overstated, as such a tendency appears to be more pronounced in specific
contexts.
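One simple way to quantify such segregation in an interaction network is an E-I-style index comparing cross-group with within-group ties. The sketch below builds a simulated retweet network with a built-in homophily bias; the network, labels, and probabilities are illustrative assumptions, not estimates from real Twitter data.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
# Hypothetical retweet network: 200 users, each labeled "left" or "right".
labels = {i: ("left" if i < 100 else "right") for i in range(200)}
G = nx.Graph()
G.add_nodes_from(labels)
for _ in range(1500):
    u, v = rng.integers(0, 200, size=2)
    same = labels[u] == labels[v]
    # Within-group retweets are made far more likely (homophily assumption).
    if u != v and rng.random() < (0.9 if same else 0.1):
        G.add_edge(u, v)

# E-I index: (external - internal ties) / all ties; -1 means fully segregated.
external = sum(1 for u, v in G.edges if labels[u] != labels[v])
internal = G.number_of_edges() - external
ei_index = (external - internal) / G.number_of_edges()
print(f"E-I index: {ei_index:.2f}")  # close to -1 => echo-chamber-like structure
```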
Beyond that, network analysis can also inform the investigation of affective
polarization. Affective polarization refers to the gap in feelings between in- and
out-group members. Many social media platforms allow users to express their
attitudes toward posts or other users through functions such as “like” and
emotional reactions. The sentiments and opinions expressed in interactions inspire
the investigation of affective polarization based on network analysis. For instance,
Rathje et al. (2021) empirically showed that content related to out-groups is
more likely to elicit negative emotional reactions such as “angry,” while content
related to in-groups is more likely to elicit positive emotional reactions such as
“love.” Brady et al. (2017) find that messages revealing negative emotions toward
rival political candidates are more likely to be spread within liberal or conservative
in-group social networks. Marchal (2022) suggests that negative sentiment is sig-
nificantly more salient in interactions between crosscutting users than between like-
minded users.
To summarize, the combination of network analysis methods and big data allows
us to formalize diverse patterns of social networks and investigate their characteris-
tics from more comprehensive and nuanced perspectives. In particular, it enhances
our understanding of how interaction patterns, related issues, and the actors involved
contribute to the degree of opinion polarization. These implications are crucial for
deepening our understanding of the states, mechanisms, and consequences of
opinion polarization.

7.3.2 Natural Language Processing

Texts and words are integral parts of society. People typically use texts and words to
express themselves, make proposals, and communicate with each other. These texts and
words can serve as crucial sources that reflect individuals’ beliefs, attitudes, and opinions.
In particular, the development of digitalization is generating an unprecedented
volume of textual data that can be used to investigate opinions. Also, many efforts
have been devoted to making textual data easier to acquire. For instance, many
governments have established digital archives of policy documents, congres-
sional records, and reports, and social media platforms such as Twitter provide applica-
tion programming interfaces (APIs) that allow researchers to access and make use of
users’ generated textual data.
Despite its great potential, text analysis has always been a difficult task because
human language is complex and nuanced. In particular, the increasing availability of these
new data sources has created demand for advanced techniques to handle the
scale and complexity of text. Investigating opinion based on textual data
requires a method describing how opinions can be measured and quantified.
Traditionally, content analysis of opinions revealed in texts has been based on
hand-coding, which involves a series of processes such as developing a
coding schema and training coders. These processes are usually time-consuming;
in particular, as the scale of textual data grows, it becomes harder to process large
amounts of text by hand-coding. Fortunately, the advent of computational
methods and ever-increasing computing power has substantially paved the way for
further research in this direction. Compared to traditional approaches, the
development of NLP techniques now provides a broad spectrum of advanced tools
for analyzing large-scale textual data more efficiently (Wilkerson & Casas, 2017).
Motivated by the advent of computational techniques and the increasing availabil-
ity of textual data, there has been growing interest in capturing the states and
dynamics of opinion polarization through the automatic detection of opinions from
large textual data.
Automatic text analysis methods require a model describing the pattern and
structure of texts in a computational way. Typically, the model is designed to
transform unstructured textual data into structured data (i.e., numerical
representations) that can be further analyzed by various computational methods,
a step known as “feature extraction” in NLP.
The most common strategy for feature extraction is the bag-of-words
(BoW) model, which describes the occurrence of words within a document while
disregarding grammatical details and word order. For the detection of
opinions, tokens are usually matched against a list of words that have been previously
annotated as opinion-related terms. For example, Laver and Garry (2000) developed
a dictionary defining how a series of words relate to specific content categories
and political parties. The content of a text can be automatically identified by matching
words to their respective categories in the dictionary. Laver et al. (2003) developed the
Wordscores method, which uses reference texts with annotated political place-
ments in place of a dictionary. More specifically, Wordscores assumes that the
revealed political opinion of a new text can be derived from its similarity to
the reference texts based on word frequencies. Therefore, word frequency infor-
mation from “reference” texts with annotated ideological positions can be used to
make predictions for new texts whose positions are unknown. The BoW
model, while straightforward and manageable, has several inherent limitations. First,
it views words as individual entities with distinct meanings, often overlooking key
aspects like grammatical structure and word order in texts. Second, encompassing all
words that occur in a pre-encoded dictionary can be challenging, leading to the
exclusion of significant details in text analysis and possibly resulting in bias.
Additionally, extensive human intervention is often required to choose relevant
words or reference texts for assigning meanings to each text. In particular, a
coding scheme is typically compatible only with specific texts, which limits the
method’s generalizability. Consequently, the development and maintenance of dic-
tionaries tend to be time-consuming and labor-intensive.
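The following compressed sketch illustrates the Wordscores logic under strong simplifying assumptions: two toy reference texts with known positions (-1 and +1), word scores derived from relative frequencies, and new texts scored by the average score of their words. Real applications use much longer reference texts and additional transformations.

```python
from collections import Counter

# Two toy reference texts with annotated positions (-1 = liberal, +1 = conservative).
references = {
    -1.0: "expand public healthcare protect welfare welfare rights",
    +1.0: "cut taxes shrink government protect markets taxes",
}

# Step 1: score each word by the expected position of the texts that use it.
word_counts = {pos: Counter(text.split()) for pos, text in references.items()}
vocab = set().union(*word_counts.values())
word_scores = {}
for w in vocab:
    freqs = {pos: word_counts[pos][w] / sum(word_counts[pos].values())
             for pos in references}
    total = sum(freqs.values())
    word_scores[w] = sum(pos * f / total for pos, f in freqs.items())

# Step 2: score a new text as the frequency-weighted mean of its word scores.
def score(text: str) -> float:
    words = [w for w in text.split() if w in word_scores]
    return sum(word_scores[w] for w in words) / len(words) if words else 0.0

print(score("protect welfare and expand healthcare"))  # negative => liberal side
```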
More recently, social scientists have adopted more sophisticated methods to
improve the efficacy and accuracy of text-based opinion scaling. Specifically, to overcome
the limitations inherent in the BoW model, word embedding models have been applied
to estimate opinions revealed in texts. Broadly speaking, word embedding
attempts to encode each word as a dense low-dimensional vector, where proximity
between two vectors indicates greater semantic similarity between the associated
words. To achieve this aim, word embedding models assume that the semantic
meaning of a word can be inferred from the words that appear frequently in a
small context window around the focal word. There are various approaches and
architectures for word embedding, such as Word2Vec (Mikolov et al., 2013) and
GloVe (Pennington et al., 2014), which define the context and the semantic meaning of
words in different ways, while commonly capturing the context, relations, and seman-
tic similarity of words in texts. Therefore, using word embeddings to represent
text as vectors can preserve more of the information in the raw text for opinion detection.
Moreover, word embeddings require less human intervention since they can be
efficiently trained on large, preexisting, and unannotated corpora. Given
this great advantage, word embedding models hold immense promise for text-
as-data research, and several studies have leveraged them as an alternative way to capture
the opinions of individuals and organizations. Rheault and Cochrane (2020)
employed parliamentary corpora, augmented with input variables reflecting
party affiliations, to train “party embeddings.” Since the word embedding model is
powerful at capturing the patterns and characteristics of text to build better feature
representations, party embeddings can reflect similarities and differences in ideolog-
ical positions and policy preferences among parties. Notably, word embeddings can
be automatically trained and easily adapted to new tasks or domains based on new
data. For example, parliamentary corpora of different countries enable the compar-
ison of opinion polarization across countries, and historical textual data enable the
investigation of opinion polarization over time. Furthermore, word embedding
models can be flexibly and efficiently applied to various types of corpora. For
example, it is reasonable to assume that published posts and profile descriptions
reflect the opinions of a social media user; thus these textual data can be
aggregated at the individual level to produce word embedding representations
reflecting individuals’ opinions (Jiang et al., 2022; Preotiuc-Pietro et al., 2017).

7.4 Digital Experiment

The experiment is one of the most important methodologies in social science, widely
applied to test theoretical hypotheses and establish causality. The main advan-
tage of experiments is the possibility of tightly controlling the experimental
setup to systematically estimate the effect of a stimulus or condition. However,
conventional experiments were typically conducted in offline laboratories or as
surveys among relatively small populations. In particular, given the highly
controlled settings and limited diversity of participants, it is hard to estab-
lish the reliability and generality of experimental insights in real-life situations
outside the laboratory and beyond specific populations.
In recent years, with the availability of digital platforms and tools, many
researchers have begun utilizing the Internet as a novel intermediary to recruit
participants and conduct experiments. Digital experiments offer several
advantages that not only help scholars conduct more scalable, flexible, and
efficient experiments but also provide useful tools that inspire new experimental
strategies.
First, compared to the traditional offline experiment, the digital experiment over-
comes the limitations of space and time, which can facilitate more sizable and
heterogeneous recruitment of participants and improve the external validity of exper-
imental research (Peng et al., 2019).
Second, digital techniques enable the collection of real-time, high-granularity,
fine-grained behavioral data on participants. These digital trace data can not only
provide additional information for measuring the intensity of the treatment effect and
temporal changes in behaviors but also allow checks of compliance with
treatments to enhance the validity of experiments (Guess et al., 2021; Stier et al.,
2020). Connected to this, digital experiments are more flexible for long-term designs,
allowing scholars to assess changes in individual attitudes and behaviors over time.
Third, digital experiments can be conducted in natural settings to achieve full
ecological validity and to avoid demand effects (Mosleh et al., 2022). Since behav-
ioral tracking tools can collect real-world data unobtrusively in naturalistic
environments, causal effects can be explicitly examined by observing actual
behaviors.
The power to detect causation has inspired research leveraging digital experiments
to investigate the underlying mechanism of opinion polarization. In practice, it
is beneficial to combine digital experiments with other available data,
such as survey data and behavioral data, to examine the causality of opinion
polarization. Typically, participants are incentivized to change their information expo-
sure in online discourse by changing their news feeds or social media following
patterns. In this way, the causal effects of information exposure can be examined in
natural settings. Also, other key variables, such as political attitudes, policy prefer-
ences, and demographic information, can be accurately estimated by surveys.
Moreover, digital trace data, such as participants’ social networks, generated con-
tent, and interactional behaviors in online discourse, can provide important
insights into how individuals’ opinions and interrelated behaviors change over
time. In these terms, the combination of digital experiments and other data sources
can not only shed light on the causality of political polarization but also provide
insight into the cumulative effects of interventions through the investigation of fine-
grained behavioral data.
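A stylized sketch of how such a combined design might be analyzed follows: random assignment to an exposure treatment, pre- and post-treatment attitude surveys, and a regression estimating the treatment effect on attitude extremity. All data are simulated, and the “backfire” effect is built in purely for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Stylized digital experiment: participants are randomly assigned to an
# exposure treatment (e.g., following out-partisan accounts), and attitudes
# are measured by surveys before and after. All data are simulated.
rng = np.random.default_rng(0)
n = 800
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "attitude_pre": rng.normal(0, 1, n),
})
# Simulated "backfire" effect: treatment slightly increases extremity.
df["attitude_post"] = (df["attitude_pre"]
                       + 0.3 * df["treated"] * np.sign(df["attitude_pre"])
                       + rng.normal(0, 0.5, n))
df["extremity_change"] = df["attitude_post"].abs() - df["attitude_pre"].abs()

# Difference-in-means via regression, adjusting for baseline extremity.
fit = smf.ols("extremity_change ~ treated + np.abs(attitude_pre)", data=df).fit()
print(fit.params["treated"])  # positive estimate => polarization increased
```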
Bail et al. (2018) incentivized a large group of social media users to follow bots
that retweeted messages by elected officials and opinion leaders with opposing
political views. Evaluating the impact of the treatment on participants’
opinions via surveys, they found that exposure to opposing political views
may not mitigate opinion divergence and can even generate backfire effects that
intensify political polarization. Similarly, Casas et al. (2023) focus on the effect
of exposure to dissimilar views on opinion polarization. In a longitudinal experi-
ment, participants were incentivized to read political articles expressing extreme opposing
views; the researchers then relied on participants’ survey self-reports and behavioral
browsing data to track changes in online exposure and attitudes over time. Guess
et al. (2021) incentivized participants to change their browser default settings and
social media following patterns to enhance the likelihood of encountering partisan
news. This research design incorporated a naturalistic nudge within an online panel
survey with linked digital trace data, providing significant insights into the long-term
consequences of heightened exposure to partisan information. Levy (2021) recruited
participants using Facebook ads, asking them to subscribe to conservative or liberal
news outlets on Facebook. Together with behavioral tracking data, such as sharing
posts and liking pages, this study demonstrates how news exposure and the algo-
rithms of social media platforms affect users’ behaviors and attitudes.
Moreover, digital experiments have enriched the treatment strategy for investi-
gating the mechanism of opinion polarization from diverse perspectives. For
example, Mosleh et al. (2021) implemented a field experiment leveraging the
platform features of Twitter. Twitter is infested with social bots that mimic human
behavior. The researchers created human-like, identical-looking bot accounts with varied
self-identified political partisanship. Then, they randomly assigned Twitter users to
be followed by the bots, aiming to estimate the causal effect of political exposure on
users by observing whether they tended to follow back accounts of like-minded partisanship.
Chen et al. (2021) conducted a digital experiment to examine the biases stemming
both from Twitter’s system design and from social interactions with other users. They
created bots that initially followed a popular news source with a specific partisan
bias, programmed them to imitate social media users, and released them into the wild.
After 5 months, the content generated and consumed by these bots with varying
initial biases was compared, demonstrating how existing political leanings foster
echo chambers through sustained exposure to partisan and biased information flows
on social media.

7.5 Discussion

This chapter is intended to offer a comprehensive overview of the fundamental
principles and components of opinion polarization, illustrating how the intersection
of novel data and advanced computational techniques can propel the investigation of
its states, mechanisms, and consequences.
First, new data and methods can overcome numerous long-standing obstacles once
considered insurmountable. There is little doubt that the abundant availability of
fine-grained, temporal, and fairly detailed information can provide a more compre-
hensive understanding of opinion polarization. More specifically, computational
social science approaches enable us to observe and describe human behaviors and
social processes in ways that are not possible with small data, thereby presenting a
novel opportunity to validate and generalize classical theories from a fresh
perspective.
Second, new data and methods can develop new theories and raise new questions
about opinion polarization. Indeed, the development of digital techniques itself has
dramatically reshaped the way that people retrieve information, develop opinions,
and communicate with others (Jungherr et al., 2020), which may ultimately create
new forms of opinion polarization in online discourse. The increasingly important
role of computer-mediated communication underscores the need for more work on
the mechanism and consequence of opinion polarization in new, noisy, and complex
information environments (González-Bailón & Lelkes, 2023). Recent works
employing novel methods and data have indicated that many established theories do not hold up to empirical scrutiny, underscoring the need to further evaluate the external validity of findings in existing studies. Therefore, future research should
iteratively examine the empirical findings and refine theories to validate, clarify,
and develop theoretical insights.
In summary, as a field that actively incorporates big data and computational
methods into social science research, research on opinion polarization has also
facilitated the formalization of a paradigm that integrates theories and empirical
findings. The social sciences are devoted to formalizing underlying mechanisms and providing interpretative explanations for human behaviors and social change.
Such a theory-driven paradigm of social science, however, has been criticized for
limited generalization and failing to offer solutions to real-world problems (Watts,
2017). The availability of big data and computational methods expedites the emer-
gence and development of data-driven social science research. Despite the great
potential of novel data and methods, it should be noted that exploratory detection is
still necessarily driven by suitable theories that are reflected in how data is collected,
what analysis method is adopted, and how a result is interpreted. Indeed, the review
of research on opinion polarization has highlighted how to establish an iterative and
inductive research framework through the integration of theories and empirical
findings. Theoretical clarity strongly influences what variants of opinion polarization
are investigated, what types of hypotheses should be examined, and what methods
are appropriate to investigate them. In turn, as indicated above, implications derived
from empirical findings can contribute to the theoretical grounding of opinion
polarization from various perspectives. In particular, integrating theories and empirical findings helps translate abstract debates into precise interventions that can counter rising opinion polarization and its negative impacts. Thus,
the most crucial implication is that we should leverage the momentum of methodological innovation and connect it with appropriate theoretical groundings, balancing theoretical and methodological development by iteratively examining data and refining theories.

References

Bail, C. A., et al. (2018). Exposure to opposing views on social media can increase political
polarization. Proceedings of the National Academy of Sciences, 115, 9216–9221.
Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and
opinion on Facebook. Science, 348(6239), 1130–1132. https://doi.org/10.1126/science.aaa1160
Barberá, P. (2015). Birds of the same feather tweet together: Bayesian ideal point estimation using
Twitter data. Political Analysis, 23(1), 76–91. https://doi.org/10.1093/pan/mpu011
Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting from left to right:
Is online political communication more than an echo chamber? Psychological Science, 26(10),
1531–1542. https://doi.org/10.1177/0956797615594620
Bond, R., & Messing, S. (2015). Quantifying social media’s political space: Estimating ideology
from publicly revealed preferences on Facebook. American Political Science Review, 109(1),
62–78. https://doi.org/10.1017/s0003055414000525
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social
sciences. Science, 323(5916), 892–895. https://doi.org/10.1126/science.1165821
Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Bavel, J. J. V. (2017). Emotion shapes the
diffusion of moralized content in social networks. Proceedings of the National Academy of
Sciences, 114(28), 7313–7318. https://doi.org/10.1073/pnas.1618923114
Casas, A., Menchen-Trevino, E., & Wojcieszak, M. (2023). Exposure to extremely partisan
news from the other political side shows scarce boomerang effects. Political Behavior, 45,
1491–1530.
Chen, W., Pacheco, D., Yang, K.-C., & Menczer, F. (2021). Neutral bots probe political bias on
social media. Nature Communications, 12, 5580.
Conover, M., Ratkiewicz, J., Francisco, M., Goncalves, B., Menczer, F., & Flammini, A. (2011).
Political polarization on Twitter. Proceedings of the International AAAI Conference on Web and
Social Media, 5, 89–96. https://doi.org/10.1609/icwsm.v5i1.14126
Edelmann, A., Wolff, T., Montagne, D., & Bail, C. A. (2020). Computational social science and
sociology. Annual Review of Sociology, 46(1), 61–81. https://doi.org/10.1146/annurev-soc-
121919-054621
Garrett, R. K. (2009). Echo chambers online? Politically motivated selective exposure among
Internet news users. Journal of Computer-Mediated Communication, 14(2), 265–285. https://
doi.org/10.1111/j.1083-6101.2009.01440.x
Garrett, R. K., Long, J. A., & Jeong, M. S. (2019). From partisan media to misperception: Affective
polarization as mediator. Journal of Communication, 69(5), 490–512. https://doi.org/10.1093/
joc/jqz028
González-Bailón, S., & Lelkes, Y. (2023). Do social media undermine social cohesion? A critical
review. Social Issues and Policy Review, 17, 155–180. https://doi.org/10.1111/sipr.12091
Guess, A. M., Barberá, P., Munzert, S., & Yang, J. (2021). The consequences of online partisan
media. Proceedings of the National Academy of Sciences, 118(14), e2013464118. https://doi.
org/10.1073/pnas.2013464118
Hetherington, M. J. (2001). Resurgent mass partisanship: The role of elite polarization. The
American Political Science Review, 95(3), 619.
Hofman, J. M. (2021). Integrating explanation and prediction in computational social science.
Nature, 595(7866), 181–188. https://doi.org/10.1038/s41586-021-03659-0
Iyengar, S., Sood, G., & Lelkes, Y. (2012). Affect, not ideology: A social identity perspective on
polarization. Public Opinion Quarterly, 76(3), 405–431. https://doi.org/10.1093/poq/nfs038
Iyengar, S., Lelkes, Y., Levendusky, M., Malhotra, N., & Westwood, S. J. (2019). The origins and
consequences of affective polarization in the United States. Annual Review of Political Science,
22(1), 129–146. https://doi.org/10.1146/annurev-polisci-051117-073034
Jiang, J., Ren, X., & Ferrara, E. (2022). Retweet-BERT: Political leaning detection using language
features and information diffusion on social networks. https://doi.org/10.48550/arxiv.2207.
08349
Jungherr, A., Rivero, G., & Gayo-Avello, D. (2020). Retooling politics: How digital media are
shaping democracy. Cambridge University Press. https://doi.org/10.1017/9781108297820
Laver, M., & Garry, J. (2000). Estimating policy positions from political texts. American Journal of
Political Science, 44(3), 619. https://doi.org/10.2307/2669268
Laver, M., Benoit, K., & Garry, J. (2003). Extracting policy positions from political texts using
words as data. American Political Science Review, 97(2), 311–331. https://doi.org/10.1017/
s0003055403000698
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A.-L., Brewer, D., Christakis, N., Contrac-
tor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., & Alstyne, M. V.
(2009). Computational social science. Science, 323(5915), 721–723. https://doi.org/10.1126/
science.1167742
Levy, R. (2021). Social media, news consumption, and polarization: Evidence from a field
experiment. American Economic Review, 111(3), 831–870. https://doi.org/10.1257/aer.
20191777
Marchal, N. (2022). Be nice or leave me alone: An intergroup perspective on affective polarization in online political discussions. Communication Research, 49, 376–398.
McCarty, N., Poole, K. T., & Rosenthal, H. (2006). Polarized America: The dance of ideology and
unequal riches. MIT Press.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social
networks. Annual Review of Sociology, 27(1), 415–444. https://doi.org/10.1146/annurev.soc.27.
1.415
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations
of words and phrases and their compositionality. Advances in Neural Information Processing
Systems, 26, 1081–1088.
Mosleh, M., Martel, C., Eckles, D., & Rand, D. G. (2021). Shared partisanship dramatically
increases social tie formation in a Twitter field experiment. Proceedings of the National
Academy of Sciences of the United States of America, 118(7), 9–11. https://doi.org/10.1073/
pnas.2022761118
Mosleh, M., Pennycook, G., & Rand, D. G. (2022). Field experiments on social media. Current
Directions in Psychological Science, 31(1), 69–75. https://doi.org/10.1177/
09637214211054761
Peng, T.-Q., Liang, H., & Zhu, J. J. H. (2019). Introducing computational social science for Asia-
Pacific communication research. Asian Journal of Communication, 29(3), 205–216. https://doi.
org/10.1080/01292986.2019.1602911
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation.
In Proceedings of the 2014 conference on empirical methods in natural language processing
(EMNLP) (pp. 1532–1543). https://doi.org/10.3115/v1/d14-1162.
Preotiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political
ideology prediction of Twitter users. In Proceedings of the 55th annual meeting of the
Association for Computational Linguistics (Volume 1: Long papers) (pp. 729–740). https://
doi.org/10.18653/v1/p17-1068
Prior, M. (2013). Media and political polarization. Annual Review of Political Science, 16(1),
101–127. https://doi.org/10.1146/annurev-polisci-100711-135242
Rathje, S., Bavel, J. J. V., & van der Linden, S. (2021). Out-group animosity drives engagement on
social media. Proceedings of the National Academy of Sciences, 118(26), e2024292118. https://
doi.org/10.1073/pnas.2024292118
Rheault, L., & Cochrane, C. (2020). Word embeddings for the analysis of ideological placement in
parliamentary corpora. Political Analysis, 28(1), 112–133. https://doi.org/10.1017/pan.2019.26
Rogowski, J. C., & Sutherland, J. L. (2016). How ideology fuels affective polarization. Political
Behavior, 38(2), 485–508. https://doi.org/10.1007/s11109-015-9323-7
Stanley, M. L., Henne, P., Yang, B. W., & Brigard, F. D. (2020). Resistance to position change,
motivated reasoning, and polarization. Political Behavior, 42(3), 891–913. https://doi.org/10.
1007/s11109-019-09526-z
Stier, S., Breuer, J., Siegers, P., & Thorson, K. (2020). Integrating survey data and digital trace data:
Key issues in developing an emerging field. Social Science Computer Review, 38(5), 503–516.
https://doi.org/10.1177/0894439319843669
Stroud, N. J. (2010). Polarization and partisan selective exposure. Journal of Communication,
60(3), 556–576. https://doi.org/10.1111/j.1460-2466.2010.01497.x
Sunstein, C. R. (2002). The law of group polarization. Journal of Political Philosophy, 10(2),
175–195. https://doi.org/10.1111/1467-9760.00148
Tajfel, H. (1982). Social psychology of intergroup relations. Annual Review of Psychology, 33(1),
1–39. https://doi.org/10.1146/annurev.ps.33.020182.000245
Watts, D. J. (2004). The “new” science of networks. Annual Review of Sociology, 30(1), 243–270.
https://doi.org/10.1146/annurev.soc.30.020404.104342
Watts, D. J. (2017). Should social science be more solution-oriented? Nature Human Behaviour,
1(1), 0015. https://doi.org/10.1038/s41562-016-0015
Wilkerson, J., & Casas, A. (2017). Large-scale computerized text analysis in political science:
Opportunities and challenges. Annual Review of Political Science, 20(1), 529–544. https://doi.
org/10.1146/annurev-polisci-052615-025542
Wojcieszak, M. (2010). ‘Don’t talk to me’: Effects of ideologically homogeneous online groups and
politically dissimilar offline ties on extremism. New Media & Society, 12(4), 637–655. https://
doi.org/10.1177/1461444809342775
Wong, F. M. F., Tan, C. W., Sen, S., & Chiang, M. (2016). Quantifying political leaning from
tweets, retweets, and retweeters. IEEE Transactions on Knowledge and Data Engineering,
28(8), 2158–2172. https://doi.org/10.1109/tkde.2016.2553667
Xiao, Z., Song, W., Xu, H., Ren, Z., & Sun, Y. (2020). TIMME: Twitter ideology-detection via
multi-task multi-relational embedding. In Proceedings of the 26th ACM SIGKDD international
conference on knowledge discovery & data mining (pp. 2258–2268). https://doi.org/10.1145/
3394486.3403275
Zhang, C., Song, D., Huang, C., Swami, A., & Chawla, N. V. (2019). Heterogeneous graph neural
network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge
discovery & Data Mining (pp. 793–803). https://doi.org/10.1145/3292500.3330961.
Chapter 8
Coda

Yoshimichi Sato and Hiroki Takikawa

8.1 Revisiting the Relationship Between Computational Social Science and Sociology

We explored in previous chapters how computational social science and sociology
should collaborate to advance both disciplines. One way, which was emphasized in
Chaps. 2 and 6, is to properly incorporate meaning and interpretation in computa-
tional social science. These two concepts have long been central themes in sociology, whereas computational social science mainly analyzes behavioral data such as mobility data collected via GPS sensors in smartphones. It is true that computational social science deals with text data such as X posts (tweets), but it rarely studies how people reading such posts interpret them and assign meaning to them.
Chapter 2 scrutinized previous literature in computational social science such as
Goldberg and Stein (2018), Boero et al. (2004a, b, 2008), Sato (2017), and
DiMaggio et al. (2013) to show how to incorporate meaning and interpretation in
the studies of agent-based modeling and digital data analysis. Chapter 6, focusing on
diffusion or contagion, carefully examined the theory of cultural carrying capacity
proposed by Bail (2016) to show that the theory is a milestone for including meaning
and interpretation in the analysis of big data and suggested that incorporating the
theory in the study of complex contagion proposed by Centola and Macy (2007) and
Centola (2018) would enrich the study.

Another way to promote collaboration between computational social science and
sociology is to apply techniques of the former to the latter, which was discussed
mainly in Chaps. 3 and 5. These chapters reinterpret the potential of computational social science so that it can contribute to the advancement of sociology. This reinterpretation is worth examining in detail, so we turn to it in the next sections.

8.2 Beyond the Deductive Approach

Chapter 3 argues that computational social science is free from the deductive
approach. This is an important point of scientific methodology. In conventional scientific practice, including sociology, scientists start with a theory, derive hypotheses from it, and set up an experiment or conduct a social survey to check their empirical
validity. If their empirical validity is confirmed, the theory is thought to survive the
empirical test. If not, scientists revise the theory or invent a new theory to make it
more empirically valid. Thus, scientific activities gradually advance scientific
knowledge (see Popper, 1959). This is a typical image of the deductive approach.
Some parts of computational social science can also follow the deductive approach.
Big data analysis, for example, is useful to check the empirical validity of hypotheses
derived from a theory. However, computational social science could go beyond it
and contribute to creating a new theory in sociology.
It is true that computational social science provided social scientists with social
telescopes (Golder & Macy, 2014) with much higher resolution than conventional
sociological research. However, from the viewpoint of the deductive approach, we
need a theory before observing the social universe. Without a theory, we could not know which area of the social universe we should observe. For example, it does not make sense to merely observe mobility patterns collected via smartphones after the lock-
down caused by COVID-19. This is because it is unclear why we need to observe the
patterns. However, if we have a theory of differential effects of the lockdown on
people of different class and ethnicity, we will combine data of such social charac-
teristics and mobility patterns and analyze how the lockdown increased or decreased
social inequality (see Chang et al., 2021). In a sense, a sociological study using
computational social science should start with a theory if it follows the deductive
approach.
However, computational social science could contribute to the advancement of
sociology without the deductive approach as Chap. 3 argues. Big data enables social
telescopes to cover much larger areas than conventional sociological data. Thus,
computational social science has the potential to find new theories by searching a wide range of the social universe.
Machine learning, which is focused on in Chap. 3, has this potential. For
example, unsupervised machine learning can extract the latent structure of observed
data, which could not be extracted by researchers analyzing the data. This means that
unsupervised machine learning has potential to find new patterns that might lead to a
8 Coda 119

new theory. Being free from the deductive approach, machine learning, and unsupervised machine learning in particular, has opened a door to a new way of exploring theories in sociology.
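To make this concrete, below is a minimal sketch, in Python, of how unsupervised learning can surface latent structure without theory-driven labels. The data, feature names, and parameter choices are all invented for illustration; this is not an analysis from the book.

```python
# A minimal sketch (illustrative only): clustering synthetic "digital
# trace" features to surface latent groups that the researcher did not
# specify in advance. All features, values, and names are invented.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical per-user features: daily trips, median trip distance (km),
# and share of trips after 8 pm. Two latent lifestyles are baked in.
commuters = rng.normal([2.1, 12.0, 0.05], [0.4, 3.0, 0.02], size=(200, 3))
nightlife = rng.normal([4.5, 3.0, 0.45], [0.8, 1.0, 0.08], size=(200, 3))
X = StandardScaler().fit_transform(np.vstack([commuters, nightlife]))

# k-means recovers the latent grouping from the data alone; interpreting
# what the clusters mean remains the sociologist's job.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # roughly 200 users per latent cluster
```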
Machine learning has another advantage. Because it is more flexible about
modeling than conventional statistical models, machine learning often yields better predictions. When we use conventional statistical models such as regression models,
we derive hypotheses from a theory, build models based on them, and estimate the
models’ coefficients using data. In other words, the theory used for the modeling
limits the scope of social universe the social telescope observes, so the modeling
might miss the possibility of finding a new theory that exists outside of the scope. In
contrast, as abovementioned, machine learning is free from the deductive approach,
so it can search social universe for models that have more predictive power.
One caveat should be mentioned, however. Machine learning does not tell us how
the selected model should be interpreted. This is different from conventional statis-
tical models. For example, we understand how a regression model estimates its
coefficients using data. We can easily understand what the model means based on the
interpretation of its coefficients. This is generally not the case when we use a machine learning model. The model is so complex that it is extremely difficult to grasp the
gist of the model. In other words, the selected model is in a black box.
To solve this problem, a method called scientific regret minimization has been proposed. This method compares a black-box model obtained by machine learning with an interpretable model such as a regression model. If the gap between them is large, the interpretable model is improved to make the gap smaller, and the process continues until we find the interpretable model whose gap with the black-box model is smallest. At the end of this process, we obtain an interpretable model with strong predictive power.
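The loop can be sketched in a few lines of Python. The sketch below only illustrates the underlying idea, not the published scientific regret minimization procedure: a gradient-boosted model serves as the black-box benchmark, and a linear model is expanded greedily, one feature at a time, as long as doing so closes the gap to the benchmark. The data, the models, and the stopping rule are all assumptions.

```python
# Illustrative sketch of the gap-closing idea (not the published method).
# The benchmark, features, and greedy expansion rule are all assumptions.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

X, y = make_friedman1(n_samples=1000, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)
benchmark = black_box.predict(X)  # predictions we try to approximate

def gap(features):
    """Mean squared gap between an interpretable linear model restricted
    to `features` and the black-box benchmark."""
    Xf = X[:, list(features)]
    pred = LinearRegression().fit(Xf, benchmark).predict(Xf)
    return np.mean((pred - benchmark) ** 2)

chosen = []
remaining = set(range(X.shape[1]))
while remaining:
    # Greedily add the single feature that closes the gap the most.
    best = min(remaining, key=lambda j: gap(chosen + [j]))
    if chosen and gap(chosen + [best]) > 0.99 * gap(chosen):
        break  # no meaningful improvement; keep the model interpretable
    chosen.append(best)
    remaining.remove(best)

print("features kept:", chosen, "remaining gap:", round(gap(chosen), 3))
```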

8.3 A New Way of Using Computational Linguistic Models in Sociology

Chapter 5 explores the possibility of applying topic models and word-embedding
models to sociological study of meaning-making. As pointed out many times in this
book, meaning and interpretation have been among the central themes in sociology.
Thus, if we successfully apply the models in computational social science to the
study of meaning-making, we can show that computational social science is not just
a tool for sociological analysis of digital data, but it substantively contributes to the
advancement of sociology.
In preparation for that, the chapter identifies three aspects of meaning-making: prediction, relationship, and semantic learning. Prediction is an important function
of meaning. We interpret the meaning of things or events such as a non-smoking sign
and a physical movement of an actor. We interpret the non-smoking sign as the
prohibition of smoking in the room, so we predict that nobody in the room smokes. If
we see a stranger with a knife approaching us, we interpret the behavior as
120 Y. Sato and H. Takikawa

attacking us and predict that he/she will stab us. Based on such predictions, we can
take a proper action such as not smoking in the room and running away from the
stranger.
Relationship is the foundation of prediction. Meaning relates a thing and an event
to other things and events, which is a basic characteristic of prediction. We relate a
non-smoking sign to predictions that nobody smokes in the room and that we would
be punished if we smoked. We relate a stranger approaching us with a knife to a
prediction that he/she will stab us. In other words, a thing or an event does not have
meaning if it is detached from other things or events. Meaning of a thing or an event
exists in the network of the thing or the event with other things and events.
Semantic learning is the process by which we correct our estimation of such relationships.
To cite an example in Chap. 5, suppose that we see the color of a mushroom, think
that the mushroom is edible, and eat it. If the mushroom is poisonous and we have a
stomachache, we learn that the relationship between the color of the mushroom and
its edibility is wrong and update the relationship with the fact that it was poisonous.
Do topic models and word-embedding models capture these points so that they
can analyze meaning-making from a viewpoint different from that of conventional sociology? Chapter 5 gives a positive answer to this question.
Topic models extract latent topics from observed sentences and relate
(1) sentences and topics and (2) words in the sentences and topics with probabilities.
These characteristics of topic models uncover meaning of a word. As pointed out
above, a word does not have a meaning by itself. It has a meaning only if it is related
to other words. To cite an example in Chap. 5, a “tie” means a draw if it is strongly
related to a topic “sports” and appears with another word “game” also being related
to the topic “sports.” In contrast, a “tie” means a necktie if it is related to a topic
“fashion” and appears with another word “suits” also being related to the topic
“fashion.” Thus, topic models unveil relationality of meaning of a word and poly-
semy of a word. In the above example, a “tie” has two meanings depending on which
words are used with it. Therefore, topic models are suitable for capturing polysemy.
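As a toy illustration of this point, the snippet below fits a two-topic model with gensim on an invented miniature corpus in which "tie" occurs in both sports-like and fashion-like documents. The corpus is far too small for stable estimates; it only shows the mechanics by which a topic model can assign one word non-trivial probability under several topics.

```python
# Toy sketch of topic-model polysemy: "tie" can carry weight in both a
# sports-like and a fashion-like topic. The corpus is invented and far
# too small for stable estimates; it only illustrates the mechanics.
from gensim import corpora
from gensim.models import LdaModel

docs = [["game", "tie", "score", "team", "match"],
        ["team", "game", "match", "win", "tie"],
        ["suit", "tie", "shirt", "fashion", "silk"],
        ["fashion", "suit", "silk", "tie", "dress"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
               passes=200, random_state=1)

# "tie" may receive non-trivial probability under both latent topics.
print(lda.get_term_topics("tie", minimum_probability=0.0))
```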
Topic models use Bayesian updating methods to estimate their parameters. This is
similar to the abovementioned semantic learning. Bayesian updating methods update
parameters, which are called posterior parameters, using prior parameters and new
information. In the case of the abovementioned mushroom example, we had a prior
belief (parameter) about the relationship between the color of the mushroom and its
edibility. Then we got new information that the mushroom was poisonous, and that
we had a stomachache. Using this information, we revise the prior belief and get a
posterior belief (parameter). Of course, we do not rigorously use Bayesian updating
methods in everyday life, but we conduct a kind of updating process of parameters.
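A worked toy version of the mushroom example makes the updating step concrete. All numbers below are invented; the point is only that Beta-Binomial conjugacy turns "revising a belief with new information" into a simple parameter update.

```python
# Toy Bayesian update for the mushroom example (all numbers invented).
# Belief: probability that a red mushroom is edible, as a Beta distribution.
from scipy import stats

prior = stats.beta(a=8, b=2)       # prior: red mushrooms are usually edible
print(round(prior.mean(), 2))      # 0.8

# New information: we ate one red mushroom and got a stomachache, i.e.,
# one "not edible" observation. Conjugacy makes the posterior a simple
# parameter update: the failure count is added to b.
posterior = stats.beta(a=8, b=2 + 1)
print(round(posterior.mean(), 2))  # 0.73: the belief is revised downward
```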
It is obvious that the logic of topic models and people’s meaning-making are
different from each other, but we think that we can gain useful insights into people's meaning-making by deeply understanding the logic of topic models. This
way of using topic models is completely different from their conventional use, but
this is a way to apply topic models to meaning and interpretation, major research
topics in sociology.
Word-embedding models create semantic space, which helps us to understand
similarity between meanings of words as well as relationships between them. In the
models, the meaning of a word is represented as a vector, and, as with vectors in n-dimensional real space, we can perform calculations on meanings in semantic space. To cite an example in Chap. 5, king - man + woman becomes queen.
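This analogy can be reproduced with pretrained vectors, for instance via gensim's downloader, as in the sketch below; the particular pretrained model named here and the exact nearest neighbor returned are assumptions that depend on the vectors used.

```python
# Reproducing the analogy on pretrained word2vec vectors via gensim.
# Assumes downloading "word2vec-google-news-300" (~1.6 GB) is acceptable.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")
result = vectors.most_similar(positive=["king", "woman"],
                              negative=["man"], topn=1)
print(result)  # typically [("queen", ...)]: meanings behave like vectors
```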
Word-embedding models using neural networks have the predictive function of
meaning. This is because a word is predicted by words surrounding it. The models
also capture the relationality of meanings for the same reason: the prediction of a word from surrounding words implies the relationality of meanings. As for learning, the models use neural networks, which learn via backpropagation.
Thus, topic models and word-embedding models capture the predictive function
of meanings, the relationality of meanings, and learning processes and, therefore,
could help us to understand the meaning-making process of people. However, one
more problem remains: explaining human actions. Since Max Weber’s interpretive
sociology, meaning and interpretation have been key concepts to explain actions.
Suppose that a shaman conducts a ritual for rain with his/her villagers. The ritual is
not understandable from the viewpoint of modern science. However, if we under-
stand that the shaman and villagers believe that the ritual works well, we properly
interpret the meaning of the ritual and explain why they conduct it.
How can we explain actions with the help of computational linguistic models?
Chapter 5 proposes that a regression model with word vectors as independent
variables and people’s response to the words as the dependent variable clarifies
how people interpret the meaning of the words. To cite an example in Chap. 5, Ueshima and Takikawa (2021) predicted how people judge the priority of COVID-19 vaccination using a regression model with word vectors as independent variables. Word vectors for occupations were obtained from a word2vec model, a word-embedding model. The model shows how people interpret the meaning of occupations and how they judge which occupations should have priority for vaccination.
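A schematic version of this design, not the authors' actual pipeline, can be written as follows: occupation vectors from a pretrained embedding serve as predictors of survey judgments in a regularized regression. The occupation list and the rating values below are fabricated placeholders for real survey means.

```python
# Schematic version of the word-vector regression design (not the
# authors' code). Ratings are fabricated placeholders for survey means.
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

vectors = api.load("glove-wiki-gigaword-100")  # small pretrained embedding
occupations = ["nurse", "teacher", "driver", "chef", "firefighter",
               "programmer", "farmer", "cashier", "pilot", "janitor"]
ratings = np.array([9.1, 6.5, 5.8, 4.2, 8.7, 3.1, 5.0, 6.0, 4.5, 5.5])

X = np.array([vectors[w] for w in occupations])  # 100-dim meaning vectors
# Cross-validated fit: does semantic position predict the judgments?
scores = cross_val_score(Ridge(alpha=1.0), X, ratings, cv=5)
print(scores.round(2))  # fit will be poor on fabricated data; the design,
                        # not the numbers, is the point
```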

8.4 What Is the Next Step?

Chapters in the book including Chaps. 3 and 5 have shown that computational social
science has the potential to advance sociological research. We have also emphasized
the importance of meaning and interpretation for computational social science to
substantively contribute to the advancement of sociology and shown examples of
such practices.
Sociology should also change itself to promote collaboration with computational
social science. It has evolved by the combination of sociological theories and
empirical studies checking their empirical validity. Conventionally, most of the
empirical studies are case studies and statistical analysis of survey data. Both of
them are strong tools for the development of new sociological theories. Take the study of inequality of educational opportunity, for example. Reducing this inequality has been an important social issue because modern societies assume that
equality of educational opportunity is one of their ideals. This ideal holds that, in modern society, anybody should have equal access to education, higher education in particular, no matter what family he/she comes from. Therefore, it has been a central topic
in the study of social inequality to empirically clarify the degree of the inequality.
A study of inequality of educational opportunity in Ireland by Raftery and Hout
(1993) is in line with this research tradition. They conducted statistical analysis of
data on transition from elementary education to secondary education and from
secondary education to higher education. The Irish government conducted reforms
of secondary education in 1967. For example, tuition fees became free, and free
school transportation was provided. These reforms resulted in the increase in the
overall participation rate in secondary education. However, it is another story
whether class differentials of the rate were reduced. Intuitively, it is plausible that
the reforms increased the opportunity for children from lower classes to enter
secondary education, and, therefore, the class differentials decreased. However,
this intuitive story should be empirically examined.
For this examination, Raftery and Hout (1993) analyzed the effects of the reforms on inequality of educational opportunity by birth cohort and social class. To summarize their findings, the reforms did not reduce the inequality. As for the opportunity to enter secondary education, the class differentials narrowed for younger cohorts.
This is an effect of the reform. However, as for the opportunity to complete
secondary education and to enter higher education, the differentials did not change.
Raftery and Hout (1993) generalized these findings to propose a hypothesis of
maximally maintained inequality, which has been cited countless times. The hypothesis is as follows (Hout, 2006; Raftery & Hout, 1993):
1. Even if educational expansion occurs through educational reform, class differentials of educational opportunity do not change. Transition rates of students from lower classes increase, but the rates of students from all classes increase in parallel.
2. Then, if the demand for better education of students from upper classes is
saturated, more educational opportunities become open to students from lower
and middle classes. This results in reducing class differentials.
3. Conversely, if educational opportunities shrink, inequality of educational oppor-
tunity increases.
After proposing the hypothesis, Raftery and Hout (1993) proposed a rational
choice model to explain it. To put it simply, students and their families decide
whether to move up to higher education or not by calculating costs and benefits
associated with the education. Upper-class students and their families estimate that
benefits are larger than costs, so they decide to continue education. In contrast their
lower-class counterparts estimate them in the opposite way, so they tend not to
continue education. Only when educational opportunity expands and the transition rate of upper-class students saturates do lower-class students and their families begin to estimate that the benefits exceed the costs and decide to continue education. This
results in the decrease in class differentials.
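The logic of this model can be rendered as a toy calculation, sketched below with entirely invented parameters: families continue education when perceived benefits exceed costs, and class differentials narrow only after the upper-class transition rate hits its ceiling. This is an illustration of the reasoning, not Raftery and Hout's model.

```python
# Toy rendering of the rational-choice logic behind maximally maintained
# inequality. All parameter values are invented for illustration only.

def transition_rate(benefit, cost, places_per_student):
    """Families continue education when expected benefit exceeds cost;
    the realized rate is capped by how many places are available."""
    demand = 1.0 if benefit > cost else 0.3
    return min(1.0, demand * places_per_student)

for places in (0.5, 1.0, 2.0):  # stepwise educational expansion
    upper = transition_rate(benefit=10, cost=4, places_per_student=places)
    lower = transition_rate(benefit=3, cost=4, places_per_student=places)
    # The differential shrinks only once upper-class demand saturates.
    print(f"places={places}: upper={upper:.2f}, lower={lower:.2f}")
```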
Raftery and Hout’s (1993) study of inequality of educational opportunity in
Ireland is a wonderful example of empirical research in sociology. They reported
persistent inequality of educational opportunity even after the reform by analyzing
statistical data, generalized their findings to propose the hypothesis, and proposed a model to explain it. We argue that sociological research using computational social science should follow the same steps. Computational social science techniques often reveal interesting hidden patterns that would not be found by
conventional methods in sociology. This is an advantageous point of computational
social science. However, if they are not connected to sociological theories, the
patterns do not substantively contribute to the advancement of sociology.
There are two ways to connect sociology and computational social science. The
first way is to begin with a sociological theory following the deductive approach.
Then computational social science techniques and data suitable to the empirical test
of the theory are selected and used for the test. This way is fruitful if conventional
sociological methods and data cannot check the empirical validity of the theory
because of their limitations. A problem of this way is, as argued in Chap. 3, that it is
difficult to create a new theory. Sociologists following this way tend to check
empirical validity of an existing theory rather than to forge a new theory. Of course,
excellent sociologists think of a new theory and check its empirical validity with
conventional sociological methods or computational social science techniques. How-
ever, this is not always the case.
The second way is to follow the step of Raftery and Hout (1993). That is, finding
an interesting pattern using data analysis, generalizing it to a hypothesis, and forging
a theory to explain the hypothesis. Computational social science is advantageous
compared with conventional sociological methods because the former is more likely
to find a pattern that was not expected than the latter is. This may be because
machine learning is not affected by the cognitive framework of a sociologist using
it. For example, when they conduct regression analysis, sociologists tend to choose
independent variables based on their prior sociological knowledge. In contrast,
machine learning does not use such knowledge, so it can choose independent vari-
ables that were not expected but seem to be plausible. Then sociologists need to
forge a theory to explain why the variables were chosen, because the relationship between the dependent variable and the independent variables remains in a black box. It is not
computational social science techniques but sociologists with sociological knowl-
edge and imagination who can create a new theory that explains a newly found
pattern and contribute to the advancement of sociology.
In conclusion, computational social science has a great potential with which
sociologists conduct research at higher levels. However, they should begin it with
a sociologically important idea. The idea could be central concepts and theories such
as meaning and interpretation. Or it could be effects of educational reforms on
inequality of educational opportunity in Ireland as reported by Raftery and Hout
(1993). The power of computational social science enables sociologists to conduct
research that could not be done by conventional sociological methods. Proper use of
computational social science opens a new door to upgrading sociology.
References

Bail, C. A. (2016). Cultural carrying capacity: Organ donation advocacy, discursive framing, and
social media engagement. Social Science & Medicine, 165, 280–288.
Boero, R., Castellani, M., & Squazzoni, F. (2004a). Cognitive identity and social reflexivity of the
industrial district firms: Going beyond the ‘complexity effect’ with agent-based simulations. In
G. Lindemann, D. Moldt, & M. Paolucci (Eds.), Regulated agent-based social systems
(pp. 48–69). Springer.
Boero, R., Castellani, M., & Squazzoni, F. (2004b). Micro behavioural attitudes and macro
technological adaptation in industrial districts: An agent-based prototype. Journal of Artificial
Societies and Social Simulation, 7(2), 1.
Boero, R., Castellani, M., & Squazzoni, F. (2008). Individual behavior and macro social properties:
An agent-based model. Computational and Mathematical Organization Theory, 14, 156–174.
Centola, D. (2018). How behavior spreads: The science of complex contagions. Princeton Univer-
sity Press.
Centola, D., & Macy, M. W. (2007). Complex contagions and the weakness of long ties. American
Journal of Sociology, 113(3), 702–734.
Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2021).
Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589,
82–87.
DiMaggio, P., Nag, M., & Blei, D. (2013). Exploiting affinities between topic modeling and the
sociological perspective on culture: Application to newspaper coverage of U.S. government arts
funding. Poetics, 41, 570–606.
Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the
emergence of cultural variation. American Sociological Review, 83(5), 897–932.
Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online
social research. Annual Review of Sociology, 40, 129–152.
Hout, M. (2006). Maximally maintained inequality and essentially maintained inequality:
Crossnational comparisons. Sociological Theory and Methods, 21(2), 237–252.
Popper, K. R. (1959). The logic of scientific discovery. Hutchinson.
Raftery, A. E., & Hout, M. (1993). Maximally maintained inequality: Expansion, reform, and
opportunity in Irish education, 1921–75. Sociology of Education, 66(1), 41–62.
Sato, Y. (2017). Does agent-based modeling flourish in sociology? Mind the gap between social
theory and agent-based models. In K. Endo, S. Kurihara, T. Kamihigashi, & F. Toriumi (Eds.),
Reconstruction of the Public Sphere in the Socially Mediated Age (pp. 37–46). Springer Nature
Singapore Pte.
Ueshima, A., & Takikawa, H. (2021). Analyzing vaccination priority judgment for 132 occupations
using word vector models. In WI-IAT ‘21: IEEE/WIC/ACM International conference on web
intelligence and intelligent agent technology (pp. 76–82).
