# causality.estimation
This module is for causal effect estimation! When you run a randomized controlled experiment (e.g. an A/B test), you know that people in the test group are, on average, similar to people in the control group. For any given covariate, Z, you expect that the average of Z in each group is the same.
When you only have observational data, you can't be sure that the group assignments are independent of other covariates. The worst case scenario is that the effect of the treatment is different between the test and the control group. Then, the treatment's effect on the test group no longer represents the average effect of the treatment over everyone.
In a drug trial, for example, people might take the drug if they've taken it in the past and know it works, and might not take it if they've taken it before and found that it doesn't work. Then, you'll find that the drug is much more effective for people who normally take it (your observational test group) than for people who don't normally take it. If you enacted a policy where everyone who gets sick gets the drug, you'd find it much less effective on average than it appeared from your observational data: your controlled intervention now gives the treatment to people it has no effect on!
Our goal, then, is to take observational data and be able to answer questions about controlled interventions. There are some excellent books on the subject if you're interested in all of the details of how these methods work, but this package's documentation will give high-level explanations with a focus on application. Some excellent references for more depth are Morgan and Winship's [_Counterfactuals and Causal Inference_](https://www.amazon.com/Counterfactuals-Causal-Inference-Principles-Analytical/dp/1107694167), Hernan's [_Causal Inference_](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/), Pearl's groundbreaking (but extremely difficult, and not application-focused) [_Causality_](https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X), or Imbens and Rubin's [_Causal Inference_](https://www.amazon.com/Causal-Inference-Statistics-Biomedical-Sciences/dp/0521885884/ref=sr_1_1?s=books&ie=UTF8&qid=1496343137&sr=1-1&keywords=imbens+and+rubin).
There are some critical caveats to all of these approaches. First, if you don't know what variables to control for, you're often out of luck. This is true of all methods that rely on controlling. Other methods, like Instrumental Variables or mechanism-based methods, get around this by instead making certain assumptions about the structure of the system you're studying. We'll note which type of algorithm you're dealing with in the tutorial for that algorithm, but it should be relatively clear from context. The distinction is a little artificial, since you can often do controlling alongside approaches that rely on structural assumptions.
## Sub-modules:
### parametric
Most of the classic models you'd like to use are probably in this portion of the package. Currently, these include propensity score matching and difference-in-differences.
#### PropensityScoreMatching
Propensity score matching tries to attack the problem of dissimilar test and control groups directly. You have the option of making the test group more similar to the control group, or vice versa. When we talk about similarity, we mean similarity by some metric. In the case of propensity score matching, that metric is the "propensity score": the probability that a unit is assigned to the treatment given a set of covariates, $$P(D=1|Z_1, Z_2, ..., Z_n)$$. We can use a specific example to make all of this concrete. We'll run through the example for a high-level explanation, and then go in-depth into the assumptions and caveats.
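Before the example, one useful piece of background (it's a classical result due to Rosenbaum and Rubin, not something specific to this package): the propensity score is a "balancing score". Among units with the same propensity score, treatment assignment is independent of the covariates that went into computing it, $$(Z_1, Z_2, ..., Z_n) \perp D \,|\, P(D=1|Z_1, Z_2, ..., Z_n)$$. This is why matching on a single number can stand in for matching on all of the covariates at once.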
##### High-level Example
Suppose we're in the publishing business, and we're interested in the effect of "the length of an article title" on "the click-through rate of the article" (the proportion of times when a link to an article is seen and also clicked). To make things really simple, we'll just consider "long" titles and "short" titles. We're interested in how much better a long title clicks than a short title.
There's a big problem: we can't force our writers to make their titles a certain length. Even worse, we think that our better writers tend to write longer titles. Since they're better writers, their titles also tend to click better _independently from the effects of the length of the title on click-through rates_. This results in a correlation between title length and click-through rates, even if there is no causal effect! They are both caused by the author.
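To see how this plays out, here's a minimal simulation sketch (the variable names and numbers are invented for illustration) in which author skill drives both title length and click-through rate while title length has no causal effect at all. A naive comparison of long and short titles still shows a difference:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 100000

# Latent author skill confounds title length and CTR.
skill = np.random.normal(size=n)

# Better writers are more likely to write long titles...
p_long = 1.0 / (1.0 + np.exp(-skill))
title_length = np.random.binomial(1, p_long)

# ...and also get higher CTRs, independently of title length (no causal effect).
ctr = 0.05 + 0.02 * skill + np.random.normal(scale=0.01, size=n)

df = pd.DataFrame({'title_length': title_length, 'ctr': ctr})

# The naive comparison is biased away from the true effect of zero.
naive_diff = df[df.title_length == 1].ctr.mean() - df[df.title_length == 0].ctr.mean()
print(naive_diff)  # noticeably > 0, even though the true causal effect is 0
```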
In order to handle this, we can try to control for the effect of the author. There's a direct way to do this: look at the effect of title length on click-through rates for each author separately, and then average over authors. That way, each effect measurement controls for author, and you average the measurements together to get the overall result. This is easy to do when we only care about one variable, but usually we want to control for a lot more. Consider that the vertical (e.g. news, entertainment, etc.) the author writes for might also confound the effect (e.g. news headlines might be longer, but also more interesting and so clickier). The more variables there are to control for, the harder it is to find data for every possible combination of values. This is where propensity score matching really shines: if you're willing to assume a model for the propensity scores, then you can still do this kind of controlling. In this package, we build in a logistic regression model. In general, you can use any model you like.
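As a sketch of what that modeling step looks like (this shows the general idea, not the package's internal implementation; the dataframe and column names are made up), you could fit a logistic regression of the treatment on the confounders and read off each unit's propensity score:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data with a binary treatment and two discrete confounders.
df = pd.DataFrame({
    'title_length': [0, 1, 1, 0, 1, 0],
    'author':       ['alice', 'bob', 'bob', 'carol', 'alice', 'carol'],
    'vertical':     ['news', 'news', 'entertainment', 'news', 'entertainment', 'news'],
})

# One-hot encode the confounders, then model P(treatment = 1 | confounders).
Z = pd.get_dummies(df[['author', 'vertical']])
model = LogisticRegression().fit(Z, df['title_length'])

# Each unit's estimated propensity score: its probability of getting a long title.
df['propensity'] = model.predict_proba(Z)[:, 1]
print(df)
```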
The simplest way to use this package assumes you have all of the relevant data in a single pandas.DataFrame object, `X`. We'll have author names as strings in `X['author']`, title length encoded as `0` for short and `1` for long in `X['title_length']`, the vertical in `X['vertical']`, and the outcome we're interested in, the click-through rate (CTR), in `X['ctr']`.
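For concreteness, a toy version of such a dataframe might look like the following (the values are invented, and in practice you'd have many more rows):

```python
import pandas as pd

# A made-up example of the expected layout: one row per article.
X = pd.DataFrame({
    'author':       ['alice', 'alice', 'bob', 'carol', 'bob', 'carol'],
    'vertical':     ['news', 'entertainment', 'news', 'news', 'entertainment', 'news'],
    'title_length': [1, 0, 1, 0, 1, 0],   # 1 = long title, 0 = short title
    'ctr':          [0.11, 0.07, 0.09, 0.05, 0.10, 0.04],
})
```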
Estimating the effect is as simple as
```python
from causality.estimation.parametric import PropensityScoreMatching
matcher = PropensityScoreMatching()
matcher.estimate_ATE(X, 'title_length', 'ctr', {'author': 'u', 'vertical': 'u'})
```
The first argument contains your data; the second is the name of the dataframe column containing the "cause", which must be binary for PSM, though there's a little flexibility in how you encode it (check the docstring for details); the third is the name of the outcome column. The fourth argument is a dictionary that tells the algorithm what you'd like to control for. The algorithm needs to know whether each variable is discrete or continuous, so the values of the dictionary are `'c'` for continuous, `'o'` for ordered and discrete, and `'u'` for unordered and discrete.
The name `ATE` stands for "average treatment effect". It means the average benefit of the `1` state over the `0` state.
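To make that precise in the standard potential-outcomes notation (this is general terminology, not something specific to this package): if $$Y_1$$ is the outcome a unit would have if it were treated and $$Y_0$$ is the outcome it would have if it weren't, then $$ATE = E[Y_1 - Y_0]$$, the expected difference averaged over the whole population rather than just the units that happened to be treated.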
Now, we'll do a more in-depth example which will involve examining whether a few assumptions we make with PSM are satisfied, and we'll see how to get confidence intervals.
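As a preview of the confidence-interval part, the usual pattern is to ask `estimate_ATE` for a bootstrapped interval around the point estimate. The `bootstrap=True` keyword below is an assumption to verify against your version's docstring:

```python
from causality.estimation.parametric import PropensityScoreMatching

matcher = PropensityScoreMatching()

# X is the dataframe from the example above. With bootstrapping enabled, the
# estimate comes back with an interval around it (verify the exact argument
# name and return format in your version's docstring).
matcher.estimate_ATE(X, 'title_length', 'ctr',
                     {'author': 'u', 'vertical': 'u'},
                     bootstrap=True)
```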
##### Detailed Example
Propensity score matching does a lot of work internally. It attempts to find treatment and control units who are similar to each other, so that any differences between them can be attributed to the difference in treatment assignment. We're making a few assumptions here. The most critical is probably that we've controlled for all of the variables that determine whether two units are "similar enough" to be matched together. There is a very technical criterion called the ["back-door criterion"](http://bayes.cs.ucla.edu/BOOK-2K/ch3-3.pdf) (BDC) that answers this question, but it's impossible to check without doing an experiment. This is a common problem with using observational data, and for this reason most methods are really just "best guesses" of the true results. Generally, you hope