In the previous discussion, we observed that the conditional expectation (also called the regression) of $Y$ on $D$ from a random sample recovers the average treatment effect on the treated (ATT) plus a bias term from the population when $D$ is not statistically independent of $Y$.

However, this bias can be eliminated if the potential outcomes of $Y$ are conditionally mean-independent of $D$ after controlling for a random vector $\boldsymbol{X}$ in the regression. We call this assumption conditional ignorability, which is formally expressed as follows:

  • Conditional Ignorability: $\mathbb{E}[Y(d)|D, \boldsymbol{X}] = \mathbb{E}[Y(d)| \boldsymbol{X}]$ for each $d \in \{0, 1\}$, where $Y(d)$ denotes the potential outcome under treatment status $d$

To further utilize the random vector $\boldsymbol{X}$ in recovering the ATT from a random sample, we impose an additional assumption, called overlapping, which allows us to construct a more powerful estimator. This assumption is formally expressed as follows:

  • Overlapping: $0 < p(\boldsymbol{X}) < 1$

where $p(\boldsymbol{x}) = P(D = 1 | \boldsymbol{X} = \boldsymbol{x})$, with $p$ being the true conditional probability function of $D$ given $\boldsymbol{X}$.

We will also call $p$ the propensity score, because this quantity represents the propensity of a unit in the population to be assigned to the treatment, conditional on the covariates $\boldsymbol{X}$.
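As a concrete illustration, here is a minimal sketch of estimating the propensity score from data, assuming scikit-learn's `LogisticRegression` as the working model; the data-generating process and all variable names are illustrative assumptions, not part of the original note.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate covariates X and a treatment D whose assignment depends on X.
n = 5_000
X = rng.normal(size=(n, 2))
true_p = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))  # true propensity p(x)
D = rng.binomial(1, true_p)

# Fit a working propensity model p_hat(x) for P(D = 1 | X = x).
propensity_model = LogisticRegression().fit(X, D)
p_hat = propensity_model.predict_proba(X)[:, 1]

# Quick overlap diagnostic: fitted propensities should lie strictly inside (0, 1).
print(f"estimated propensity range: [{p_hat.min():.3f}, {p_hat.max():.3f}]")
```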

The overlapping and conditional ignorability assumptions together give us an estimator for the ATT with two properties:

  1. We can recover the conditional ATT on $\boldsymbol{X}$, written as $\tau_{ATT}(\boldsymbol{x})$, through the Inverse Probability Weighting (IPW) function as follows: $$ \tau_{ATT}(\boldsymbol{x}) = \mathbb{E}\bigg[\frac{(D - p(\boldsymbol{X})) Y}{p(\boldsymbol{X})(1 - p(\boldsymbol{X}))}\bigg| \boldsymbol{X} = \boldsymbol{x} \bigg] $$
  2. Using the propensity score, we can restore the conditional ATT through a variation of IPW as follows: $$ \tau_{ATT}(\boldsymbol{x}) = R(1) - R(0) $$ where $$ \begin{align*} R(1) &= \mathbb{E}\bigg[\mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] \\ &+ \frac{(Y(1) - \mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}]) D}{p(\boldsymbol{X})}\bigg] \\ \\ R(0) &= \mathbb{E}\bigg[\mathcal{M}[Y(0)|D = 1, \boldsymbol{X} = \boldsymbol{x}] \\ &+ \frac{(Y(0) - \mathcal{M}[Y(0)|D = 1, \boldsymbol{X} = \boldsymbol{x}]) (1 - D)}{1 - p(\boldsymbol{X})}\bigg] \end{align*} $$ and $\mathcal{M}$ is any choice of mean model for the regression of $Y$ on $D$ and $\boldsymbol{X}$, where if $\mathcal{M}$ captures the true population relationship, then: $$ \mathbb{E}[\mathcal{M}[Y(d)|D = 1, \boldsymbol{X} = \boldsymbol{x}]] = \mathbb{E}[Y(d)|D = 1, \boldsymbol{X} = \boldsymbol{x}], \ \ \forall d \in \{0, 1\} $$ These functions give the estimator the doubly robust property: when the proposed conditional mean model $\mathcal{M}$ is misspecified, the conditional ATT can still be recovered from the sample if $p(\boldsymbol{X})$ is correctly specified. On the other hand, the conditional ATT can also be recovered when $p(\boldsymbol{X})$ is misspecified, as long as $\mathcal{M}$ is correctly determined. (A code sketch of both estimators follows this list.)
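As referenced above, the following is a minimal sketch of sample analogues of both identities, assuming `numpy` arrays of observed outcomes `y`, treatments `d`, and precomputed fitted values `p_hat`, `mu1_hat`, and `mu0_hat` (the function names and this calling convention are illustrative, not part of the original derivation). The unconditional sample means correspond to the marginalization step discussed in the proofs; they coincide with the ATT when the conditional effect is homogeneous.

```python
import numpy as np

def ipw_att(y, d, p_hat):
    """Sample analogue of the IPW identity in property (1)."""
    return np.mean((d - p_hat) * y / (p_hat * (1 - p_hat)))

def dr_att(y, d, p_hat, mu1_hat, mu0_hat):
    """Sample analogue of R(1) - R(0) in property (2), with plug-in
    mean-model predictions mu1_hat and mu0_hat for the two arms."""
    r1 = np.mean(mu1_hat + (y - mu1_hat) * d / p_hat)
    r0 = np.mean(mu0_hat + (y - mu0_hat) * (1 - d) / (1 - p_hat))
    return r1 - r0
```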

Proof of Property (1)

The first step of the proof involves the Law of Total Expectation. Conditioning on $D$, we have the following expression:

\[\begin{align*} \tau_{ATT}(\boldsymbol{x}) &= \sum_{i = 0}^1 \mathbb{E}\bigg[\frac{(D - p(\boldsymbol{X})) Y}{p(\boldsymbol{X})(1 - p(\boldsymbol{X}))}\bigg| \boldsymbol{X} = \boldsymbol{x} , D = i\bigg] P(D = i|\boldsymbol{X} = \boldsymbol{x}) \\ &= \mathbb{E}\bigg[\frac{Y}{p(\boldsymbol{X})} \bigg| \boldsymbol{X} = \boldsymbol{x}, D = 1 \bigg] p(\boldsymbol{X}) - \mathbb{E}\bigg[\frac{Y}{1 - p(\boldsymbol{X})} \bigg| \boldsymbol{X} = \boldsymbol{x}, D = 0 \bigg] (1 - p(\boldsymbol{X})) \end{align*}\]

The second step is to represent $Y = Y(1)D + (1 - D)Y(0)$, as in the potential outcomes framework. Simplifying the arithmetic, the equality above becomes:

\[\begin{align*} \tau_{ATT}(\boldsymbol{x}) &=\mathbb{E}\bigg[\frac{Y(1)}{p(\boldsymbol{X}) } \bigg| \boldsymbol{X} = \boldsymbol{x}, D = 1 \bigg] p(\boldsymbol{X}) - \mathbb{E}\bigg[\frac{Y(0)}{1 - p(\boldsymbol{X}) } \bigg| \boldsymbol{X} = \boldsymbol{x}, D = 0 \bigg] (1 - p(\boldsymbol{X})) \end{align*}\]

Applying the conditioning property of the expectation operator (given $\boldsymbol{X} = \boldsymbol{x}$, the factor $p(\boldsymbol{X}) = p(\boldsymbol{x})$ can be pulled out of each conditional expectation and cancelled against the multiplier), we then have the following equality:

\[\begin{align*} \tau_{ATT}(\boldsymbol{x}) &= \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mathbb{E}[Y(0)|D = 0, \boldsymbol{X} = \boldsymbol{x}] \\ &= \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mathbb{E}[Y(0)|D = 1, \boldsymbol{X} = \boldsymbol{x}] \end{align*}\]

The last equality is established by leveraging the conditional ignorability assumption. Hence, we recover $\tau_{ATT}(\boldsymbol{x})$ from the functional form above.

To get back the marginal ATT, we do the usual step of averaging $\tau_{ATT}(\boldsymbol{x})$ over the distribution of $\boldsymbol{X}$ among the treated units through the Law of Total Expectation, as shown previously in another note.
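To make the identity concrete, the following sketch checks property (1) numerically on simulated data. The design (a homogeneous effect of 2.0, so the ATT coincides with the ATE and the unconditional average suffices) is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Homogeneous treatment effect of 2.0, confounded by a single covariate x.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                 # true propensity score p(x)
d = rng.binomial(1, p)
y = 2.0 * d + x + rng.normal(size=n)     # Y = Y(1) D + Y(0) (1 - D)

# Sample analogue of the IPW identity with the true p(X) plugged in.
tau_hat = np.mean((d - p) * y / (p * (1 - p)))
print(tau_hat)  # close to 2.0
```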

Clearly, the functional form in property (1) exists if and only if the overlapping assumption holds. Substantively, this assumption means that, for every value of the random vector $\boldsymbol{X}$ in the sample, we can observe units that receive the treatment as well as units that do not.

Proof of Property (2)

To prove the second property, we consider two scenarios.

First, suppose the true conditional mean model is $\mathcal{M}$, but we select an incorrect model $\mu$, while the propensity score is correctly specified.

To streamline the notation, let $\mu[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] = \mu(1, \boldsymbol{X})$; throughout, the expectations are implicitly conditional on $\boldsymbol{X} = \boldsymbol{x}$, so $\mu(1, \boldsymbol{X})$ can be treated as a constant. Working on the first term in property (2), we have the following expression:

\[\begin{align*} R(1) &= \mathbb{E}\bigg[\mu(1, \boldsymbol{X}) + \frac{(Y(1) - \mu(1, \boldsymbol{X})) D}{p(\boldsymbol{X})}\bigg] \\ &= \mu(1, \boldsymbol{X}) + \mathbb{E}\bigg[\frac{(Y(1) - \mu(1, \boldsymbol{X})) D}{p(\boldsymbol{X})}\bigg] \end{align*}\]

Now we attack the problem through the second term in the last equality above. By linearity and the Law of Total Expectation, the second term can be expressed as follows:

\[\begin{align*} \mathbb{E}\bigg[\frac{(Y(1) - \mu(1, \boldsymbol{X})) D}{p(\boldsymbol{X})}\bigg] &= \mathbb{E}\bigg[\frac{Y(1)D}{p(\boldsymbol{X})}\bigg] - \mathbb{E}\bigg[\frac{\mu(1, \boldsymbol{X})D}{p(\boldsymbol{X})}\bigg] \\ &= \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mathbb{E}[\mu(1, \boldsymbol{X})| D = 1, \boldsymbol{X} = \boldsymbol{x} ] \\ &= \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mu(1, \boldsymbol{X}) \end{align*}\]

The second equality above follows the same argument as in the proof of property (1), whereas the third equality uses the conditioning property, which is also used in that proof.

Taking all these terms together in the main equation, the incorrect mean model $\mu$ cancels out, leaving us with the true conditional mean, as shown below:

\[\begin{align*} R(1) &= \mu(1, \boldsymbol{X}) + \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mu(1, \boldsymbol{X}) \\ &= \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] \end{align*}\]

The proof for the second term in the conditional ATT function, $R(0)$, follows the same procedure. A numerical check of this scenario follows below.
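As referenced above, here is a minimal numerical check of this first scenario, where the mean model is deliberately set to zero in both arms while the true propensity is used; the data-generating process and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                 # correctly specified propensity
d = rng.binomial(1, p)
y = 2.0 * d + x + rng.normal(size=n)     # true mean model: mu(d, x) = 2 d + x

# Deliberately wrong mean model: mu(1, x) = mu(0, x) = 0.
mu1_bad = np.zeros(n)
mu0_bad = np.zeros(n)

r1 = np.mean(mu1_bad + (y - mu1_bad) * d / p)
r0 = np.mean(mu0_bad + (y - mu0_bad) * (1 - d) / (1 - p))
print(r1 - r0)  # still close to 2.0 despite the wrong mean model
```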

Now suppose that the propensity score model is incorrect: while the true model is $p$, we estimate $s$ instead. The conditional ATT can still be recovered even in this case, as shown below.

Working on the first term, we have the following expression:

\[\begin{align*} R(1) &= \mathbb{E}\bigg[\mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] + \frac{(Y(1) - \mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}]) D}{s(\boldsymbol{X})}\bigg] \end{align*}\]

The first term in the equation above clearly gives us the true model, so we just need to work on the second term.

Applying the Law of Total Expectation (the $D = 0$ branch vanishes because of the factor $D$ in the numerator), and noting that the weight on the $D = 1$ branch is the true conditional probability $P(D = 1|\boldsymbol{X} = \boldsymbol{x}) = p(\boldsymbol{x})$ rather than $s(\boldsymbol{x})$, we have the following expression:

\[\begin{align*} \mathbb{E}\bigg[\frac{(Y(1) - \mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}]) D}{s(\boldsymbol{X})}\bigg] &= \mathbb{E}\bigg[ \frac{Y(1) - \mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}]}{s(\boldsymbol{X})}\bigg| D = 1, \boldsymbol{X} = \boldsymbol{x}\bigg] p(\boldsymbol{x}) \\ &= \frac{p(\boldsymbol{x})}{s(\boldsymbol{x})} \, \mathbb{E}\big[ Y(1) - \mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] \, \big| \, D = 1, \boldsymbol{X} = \boldsymbol{x}\big] \\ &= 0 \end{align*}\]

The last equality holds because $\mathcal{M}$ is correctly specified, so the residual $Y(1) - \mathcal{M}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}]$ has conditional mean zero; the misspecified $s$ only rescales a zero term.

Therefore,

\[\begin{align*} R(1) &= \mathbb{E}[Y(1)|D = 1, \boldsymbol{X} = \boldsymbol{x}] \end{align*}\]

Similar steps can be used to show the identity for $R(0)$. A numerical check of this second scenario follows below.
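And here is the corresponding check of the second scenario, as referenced above: the mean models are correct, while the propensity model is a deliberately wrong constant; again, the setup is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                 # true propensity, unknown to the analyst
d = rng.binomial(1, p)
y = 2.0 * d + x + rng.normal(size=n)

s = np.full(n, 0.5)                      # badly misspecified propensity model
mu1 = 2.0 + x                            # correctly specified mean models
mu0 = x

r1 = np.mean(mu1 + (y - mu1) * d / s)
r0 = np.mean(mu0 + (y - mu0) * (1 - d) / (1 - s))
print(r1 - r0)  # still close to 2.0 despite the wrong propensity model
```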


Note: an alternative approach to showing the identity in property (1) can be found in Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). Cambridge, MA: MIT Press, p. 914.