Let $(y_i, d_i)_{i = 1}^N$ be a random sample of pairs, where $Y$ is a real-valued outcome variable of interest and $D$ is a binary treatment indicator. Both have finite population moments.

We are interested in identifying the Average Treatment Effect on the Treated (ATT), informally defined as the average change in the outcome $Y$ that treated units experience because they received the treatment. This is formally expressed as:

\[\begin{align*} \tau_{ATT} = \mathbb{E}[Y(1)|D = 1] - \mathbb{E}[Y(0)|D = 1] \end{align*}\]

where $\mathbb{E}$ is the expectation operator, and $Y(1)$ and $Y(0)$ are, respectively, the potential outcome of a unit if it receives the treatment and if it does not.

We can observe the quantity $\mathbb{E}[Y(1) | D = 1]$ from the treated units, but the counterfactual $\mathbb{E}[Y(0) | D = 1]$ is never observed.

From the previous discussion, we learn that taking the difference in sample means between treated and untreated units returns $\tau_{ATT}$ plus a bias term when $D$ is not randomly assigned:

\[\begin{align*} \frac{1}{N_1} \sum_{i : d_i = 1} y_i - \frac{1}{N_0} \sum_{j : d_j = 0} y_j &\overset{p}\to \mathbb{E}[Y(1)|D = 1] - \mathbb{E}[Y(0)|D = 0] \ \text{by the Law of Large Numbers} \\ &= \tau_{ATT} + \text{Bias} \end{align*}\]

with $N_1$ and $N_0$ being, respectively, the number of observed units receiving and not receiving the treatment. The bias term is defined as follows:

\[\begin{align*} \text{Bias} = \mathbb{E}[Y(0)| D = 1] - \mathbb{E}[Y(0) | D = 0] \end{align*}\]

So, unless $D$ is statistically independent of $Y(0)$, we cannot recover the ATT from this comparison.
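The bias can be made concrete with a small simulation. The following sketch uses a hypothetical data-generating process (all variable names and parameter values are illustrative, not from the text) in which a confounder `x` raises both the baseline outcome and the probability of treatment, so the naive difference in means overstates the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical DGP: the confounder x raises both the untreated outcome
# y0 and the chance of treatment, so D is not independent of Y(0).
x = rng.binomial(1, 0.5, size=n)
y0 = 1.0 + 2.0 * x + rng.normal(size=n)       # untreated potential outcome
y1 = y0 + 1.0                                 # true effect is 1 for everyone
d = rng.binomial(1, np.where(x == 1, 0.8, 0.2))
y = np.where(d == 1, y1, y0)                  # observed outcome

naive = y[d == 1].mean() - y[d == 0].mean()   # difference in sample means
true_att = (y1 - y0)[d == 1].mean()           # equals 1 by construction
bias = y0[d == 1].mean() - y0[d == 0].mean()  # E[Y(0)|D=1] - E[Y(0)|D=0]

print(f"naive diff-in-means: {naive:.2f}")    # close to 2.2, not 1
print(f"true ATT:            {true_att:.2f}")
print(f"bias term:           {bias:.2f}")     # close to 1.2
```

In-sample, the identity `naive = true_att + bias` holds exactly, mirroring the decomposition above.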

Suppose we allow $D$ to be independent of $Y(0)$ after conditioning on the random vector $\boldsymbol{X}$, so that the following property holds:

  • Conditional Ignorability: $\mathbb{E}[Y(0) | \boldsymbol{X}, D] = \mathbb{E}[Y(0) | \boldsymbol{X}]$

Based on this property, we can eliminate the bias term by estimating the conditional ATT given $\boldsymbol{X}$ from the sample.

To show this, note that in a random sample the sample moments converge to the population moments by the Law of Large Numbers, allowing us to work directly with the population parameters. The conditional difference in means between treated and untreated units then yields:

\[\begin{align*} \mathbb{E}[Y|D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mathbb{E}[Y|D = 0, \boldsymbol{X} = \boldsymbol{x}] = \tau_{ATT}(\boldsymbol{x}) + \text{Bias}(\boldsymbol{x}) \end{align*}\]

Under conditional ignorability, the bias term vanishes:

\[\begin{align*} \text{Bias}(\boldsymbol{x}) &= \mathbb{E}[Y(0)| D = 1, \boldsymbol{X} = \boldsymbol{x}] - \mathbb{E}[Y(0) | D = 0, \boldsymbol{X} = \boldsymbol{x}] \\ &= \mathbb{E}[Y(0)| \boldsymbol{X} = \boldsymbol{x}] - \mathbb{E}[Y(0) | \boldsymbol{X} = \boldsymbol{x}] \\ &= 0 \end{align*}\]
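To illustrate, consider a sketch under the same hypothetical data-generating process as before (names and parameters are illustrative), where `x` is the only confounder, so conditional ignorability holds by construction. Stratifying on `x` removes the bias:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical DGP: x is the only confounder, so conditional
# ignorability holds by construction once we condition on x.
x = rng.binomial(1, 0.5, size=n)
y0 = 1.0 + 2.0 * x + rng.normal(size=n)   # untreated potential outcome
y1 = y0 + 1.0                             # constant treatment effect of 1
d = rng.binomial(1, np.where(x == 1, 0.8, 0.2))
y = np.where(d == 1, y1, y0)

# Within each stratum of x, treated and control units share the same
# distribution of y0, so the conditional bias term vanishes.
cond_att = {}
for v in (0, 1):
    treated = y[(d == 1) & (x == v)].mean()
    control = y[(d == 0) & (x == v)].mean()
    cond_att[v] = treated - control

print(cond_att)  # both conditional effects close to the true value of 1
```

Each within-stratum difference in means estimates $\tau_{ATT}(x)$ without the confounding that contaminated the unconditional comparison.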

Even though the conditional ATT is identified from the observed sample, we need one more step to identify the unconditional ATT.

Applying the Law of Total Expectation, the unconditional ATT can be recovered from the conditional version, as shown below:

\[\begin{align*} \mathbb{E}[Y(1) - Y(0)|D = 1] &= \mathbb{E}\bigg[\mathbb{E}[Y(1) - Y(0)|D = 1, \boldsymbol{X}] \bigg|D = 1\bigg] \end{align*}\]

Let $Z = Y(1) - Y(0)$, and suppose $X$ is a scalar random variable with countable support $\mathcal{X}$ and $Z$ admits a conditional density $f$. Expanding the equality above, we have the following simplified computation:

\[\begin{align*} \mathbb{E}[Z|D = 1] &= \sum_{x \in \mathcal{X}} \bigg(\int_{\mathbb{R}} z \cdot f(z \mid D = 1, X = x) \ dz\bigg) \cdot \mathbb{P}(X = x \mid D = 1) \\ &= \sum_{x \in \mathcal{X}}\tau_{ATT}(x) \cdot \mathbb{P}(X = x \mid D = 1) \end{align*}\]

Hence, to recover the marginal ATT under the conditional ignorability assumption, we need to specify the conditional expectation function $\tau_{ATT}(x)$ and the conditional distribution of $X$ among the treated, and then compute the last equality above.
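The aggregation step can be sketched as a plug-in estimator. Continuing the hypothetical simulation from before (all names and parameters are illustrative), the treatment effect now varies with `x`, and we weight each conditional ATT by the empirical share of that stratum among the treated:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Same hypothetical DGP; x takes values in {0, 1}, a countable support,
# matching the summation formula for E[Z | D = 1].
x = rng.binomial(1, 0.5, size=n)
y0 = 1.0 + 2.0 * x + rng.normal(size=n)
y1 = y0 + 1.0 + 0.5 * x                   # effect now varies with x
d = rng.binomial(1, np.where(x == 1, 0.8, 0.2))
y = np.where(d == 1, y1, y0)

# Plug-in estimate: weight each conditional ATT by P(X = x | D = 1),
# estimated by the stratum's share among treated units.
att = 0.0
n_treated = (d == 1).sum()
for v in (0, 1):
    tau_x = y[(d == 1) & (x == v)].mean() - y[(d == 0) & (x == v)].mean()
    w = ((d == 1) & (x == v)).sum() / n_treated
    att += tau_x * w

true_att = (y1 - y0)[d == 1].mean()
print(f"plug-in ATT: {att:.2f}, true ATT: {true_att:.2f}")
```

Because treated units are concentrated in the high-`x` stratum, the ATT weights that stratum's larger effect more heavily than a simple average over $x$ would.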