Statistical Approaches to Causal Analysis, 1st Edition
ISBN 978-1-5264-2473-0
At SAGE, we take sustainability seriously. Most of our products are printed in the UK using responsibly sourced
papers and boards. When we print overseas, we ensure sustainable papers are used as measured by the PREPS
grading system. We undertake an annual audit to monitor our sustainability.
Dedication
This book is dedicated to my mother, Mary Kay McBee, who passed away before she
could see this book in print.
Contents
1 Introduction 1
Internal Validity 2
External Validity 2
Threats to Validity 3
Randomisation 6
Non-Experimental Research 7
A Pragmatic Definition of Causation 9
Prediction Versus Explanation 10
Causal Inference Requires External Information 11
Estimation Versus Hypothesis Testing 13
Prerequisites 13
Notation 14
The R Statistical Programming Environment 14
Installing and Using R and RStudio 16
R Packages 16
Structure of This Book 17
2 Conditioning 19
Sample Selection 30
The Bias–Variance Trade-Off 32
Subclassification 33
Matching 35
Weighting 39
Computing the Weights 39
The Problem of Measurement 42
Classical Test Theory Model for Measurement Error 43
Reliability 44
Discussion 46
The ‘Curse of Dimensionality’ 47
8 Conclusion 205
Glossary 213
References 217
Index 229
List of Figures and Tables
List of figures
1.1 RStudio 16
3.10 Bias from conditioning on a collider. The regression line fitted to only
the college attendees (solid line) has a negative slope. The regression line
fitted to all of the data (dashed line) has the correct zero slope 65
3.11 DAG containing a spurious path 65
3.12 Effect of conditioning. Left panel: Conditioning on A. Centre panel:
Conditioning on B. Right panel: Conditioning on Z. Any one of these
conditioning decisions blocks the spurious connection between X and Y
and would permit unbiased estimation of their non-existent causal
relationship 66
3.13 A DAG containing an unobservable (U) and an instrument (A).
The causal effect of X on Y cannot be identified via conditioning 69
3.14 DAG containing direct and indirect effects 70
3.15 Effect of conditioning on mediators. Left panel: Conditioning on A.
Centre panel: Conditioning on B. Right panel: Conditioning on A
and B 71
3.16 An illustration of the front-door criterion. U is an unobservable
confounder 73
3.17 Simultaneous estimation of causal effects and potential DAGs.
Panel 1: A and X as competing exposures. This is the only condition
in which simultaneous estimation should be used. Panel 2: A as
confounder of X → Y, X as mediator of A → Y. Panel 3: A as mediator of
X → Y, X as confounder of A → Y. Panel 4: A as mediator of X → Y. X and
U as confounders of A → Y. The causal effect A → Y is not estimable after
conditioning on X due to U 75
3.18 A DAG with measurement error. X is the exposure variable and Y is
the response. Ztrue is the unobservable confounder true score. Zobs is
the observed version of the confounder, which is contaminated with
measurement error. Conditioning on Zobs cannot remove all of the confounding
due to Ztrue 77
3.19 dagitty.net is a tool for drawing and analysing DAGs 78
3.20 dagitty.net analysis of the DAG after conditioning on the collision
node Z. Two sufficient adjustment sets are identified 78
3.21 Equivalent DAGs. These graphs generate the same implied conditional
independencies. They cannot be distinguished on the basis of data 80
4.1 Combinations of the values of X1 and X2 that produce a propensity
score of p = .8 100
4.2 Equivalence in distribution on X variables after conditioning on the
propensity score 100
4.3 Example of a DAG for the propensity score analysis examples 101
4.4 Logit link mapping values of the linear predictor to p 103
4.5 Logistic regression–derived propensity score density by treatment
status from the analysis data 105
4.6 Example of propensity score distribution with no common support.
Causal effects cannot be estimated with propensity score methods
under this condition 106
4.7 Example of propensity score distribution with moderate overlap.
Left panel: Distribution of propensity scores by group. Right panel:
Distribution of propensity scores after discarding cases 107
4.8 Propensity score dimension reduction. Black circles are treatment cases.
White triangles are control cases. Panel 1: Three-dimensional scatter plot
of confounders A (x-axis) and B (y-axis) versus treatment status (D; z-axis +
shape and colour) viewed from above. Panel 2: Rotation to show the points
plotted against the predicted probability (propensity score) surface. Panel 3:
Further rotation to display the sigmoid shape of the surface with vertical
reference lines connecting each point to its estimated propensity score.
Panel 4: Location of each point with respect to the propensity score (z-axis).
The formerly two-dimensional location of each point has been reduced
to its location along a one-dimensional number line 109
4.9 The two-dimensional space of variables A and B is reduced to a
one-dimensional location along a number line of propensity scores 109
4.10 Example of classification tree. The numbers below each leaf are the
predicted probabilities of treatment 110
4.11 A classification tree partitions the data into areas of maximum
class similarity 111
4.12 Boosted model performance in-sample (bottom line) and out-of-sample
(top line) by ensemble size. The dashed vertical reference line indicates
maximum out-of-sample classification performance 113
4.13 Boosted model–estimated relationships between model variables
and the (logit) probability of receiving treatment 114
4.14 Boosted classification tree–derived propensity score density by
treatment status from the analysis data 115
4.15 Scatter plot of logistic propensity scores versus boosted
propensity scores 116
5.1 Example of a DAG for the propensity score analysis examples 120
5.2 Frequency distributions of variables A and B by treatment
condition (D) 121