Categorical Data Analysis:
Models for Categorical, Ordinal, and Count Outcomes
ICPSR Summer Program
July 17 – August 4, 2023
Instructors: Dr. Olga Chyzh and Dr. Mark Nieman
Time and Location: TBA
Contact: TBA
Office Hours: TBA
Overview and Objectives
This class introduces a variety of statistical techniques for limited and categorical dependent
variables relevant to political science research. The objective is for you to become familiar
enough with these techniques to understand how, when, and why to use them. We therefore
emphasize empirical applications, and a large portion of class time (approximately 40% of classes)
is devoted to hands-on use and interpretation of these methods on computers. We recommend
using the statistical program R, since all sample code for assignments is written
in this programming language. We assume some familiarity with the concepts of maximum
likelihood, linear algebra, calculus, and probability theory, but we may review these topics
as necessary.
The course covers a wide variety of estimators, including those for binary, ordered,
polychotomous, and multivariate outcome variables. We also discuss issues relating to truncation,
censoring, and non-random sample selection. Additional and related topics will be covered
as necessary. The main tools through which you will familiarize yourself with these methods
are Maximum Likelihood Estimation and Monte Carlo analysis, which will be presented in
the first and second weeks. Many of your homework assignments and in-class work will
involve these two techniques.
Requirements
Grades are based on three homework assignments.
The best way to learn the material is to use the estimators. We will assign homework on
a weekly to bi-weekly basis. Many assignments will specify a model and ask you to run a
Monte Carlo analysis that involves generating data and then estimating parameters using a
few different assumptions (both correct and incorrect). When you turn in the homework, we
want you to upload an electronic copy of your R script file, appropriate graphical or tabular
representation of the results, and a document summarizing your results in words, to Canvas.
The file should be written such that we are able to run it and replicate your results without
modification.
Required Texts
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
USA: Sage Publications.
Recommended:
King, Gary. 1989. Unifying Political Methodology: The Likelihood Theory of Statistical
Inference. Ann Arbor, MI: University of Michigan Press.
Course Outline
Week 1
• Monday—No class.
• Tuesday—Introduction and Overview of Maximum Likelihood Estimation
– Long, Ch 1–2.
• Wednesday—Linear Probability Model, transformational and latent variable approach
for binary outcomes
– Long, Ch 3.
• Thursday—Estimation of BRM; Odds ratios; Interpretation
• Friday—Model Fit
– Long, Ch 4.
Week 2
• Monday—Ordinal variables; a latent variable model
– Long, Ch 5.
• Tuesday–Estimation of ORM; Parallel Regression Assumption; Interpretation
• Wednesday–Multinomial logit as a set of BLMs; IIA; Interpretation
– Long, Ch 6.
• Thursday–Counts; Poisson process; estimation of PRM; assessing fit
– Long, Ch 8.
• Friday–Negative Binomial Model; Estimation and Interpretation
Week 3
• Monday–Heteroskedastic Probit
– Arena, Philip and Glenn Palmer. 2009. Politics or the Economy? Domestic Correlates
of Dispute Involvement in Developed Democracies. International Studies
Quarterly 53(4): 955-975.
• Tuesday–Multivariate Probit
– Carrubba, Cliff and Richard J. Timpone. 2005. Explaining Vote Switching Across
First- and Second-Order Elections: Evidence from Europe. Comparative Political
Studies 38(3): 260-281.
• Wednesday–Censoring and Truncation
– Long, Ch 7, Sections 1–3.
• Thursday–Sample Selection
– Long, Ch 7, Section 4.
• Friday–Split-Sample and Strategic Models
– Signorino, Curtis S. 2003. Structure and uncertainty in discrete choice models.
Political Analysis 11(4): 316-344.