The average behavioral change among treated units caused by a binary treatment, known as the average treatment effect on the treated (ATT), is expressed as follows:

\[\begin{align} \tau_{ATT} = \mathbb{E}[Y(1) - Y(0) | D = 1] \end{align}\]

Suppose an experimenter observes a binary random variable, $T$, which represents the timing of treatment assignment.

Furthermore, suppose the experimenter knows that the treatment is assigned if and only if $T = 1$ is observed, so that $P(T = 1 | D = 1) = 1$ and $P(T = 0 | D = 1) = 0$.

When $T$ is observable, the ATT can be represented as follows:

\[\begin{align} \tau_{ATT} = \mathbb{E}[Y(1) - Y(0) | D = 1, T = 1] \end{align}\]

The fundamental problem of causal inference is that the second term above, $\mathbb{E}[Y(0) | D = 1, T = 1]$, is counterfactual: the world in which the treated units do not receive the treatment cannot be observed.

The difference-in-difference research design solves this identification problem by introducing two assumptions: (i) parallel trends and (ii) no anticipation. They are formally expressed as follows:

  • Parallel Trends:

$\mathbb{E}[Y(0)|D = 1, T = 1] - \mathbb{E}[Y(0)|D = 1, T = 0] = \mathbb{E}[Y(0)|D = 0, T = 1] - \mathbb{E}[Y(0)|D = 0, T = 0]$

  • No Anticipation:

$\mathbb{E}[Y(1)|D = 1, T = 0] = \mathbb{E}[Y(0)|D = 1, T = 0]$

The parallel trends assumption means that, in the absence of treatment, the average outcomes of the treated and control groups would have changed by the same amount over time. The no anticipation assumption implies that the treatment does not affect the behavior of the treated group before the period of assignment, that is, at $T = 0$.

Using these two assumptions, the ATT can be expressed as a function of observable quantities as follows:

\[\begin{align*} \tau_{ATT} = \underset{\text{difference of treated group}}{\{\mathbb{E}[Y|D = 1, T = 1] - \mathbb{E}[Y|D = 1, T = 0]\}} - \underset{\text{difference of control group}}{\{\mathbb{E}[Y|D = 0, T = 1] - \mathbb{E}[Y|D = 0, T = 0]\}} \end{align*}\]

Hence the name of this identification strategy: difference-in-difference (DiD).
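The four-means expression above can be sketched in a few lines of code. This is a minimal illustration rather than a full estimator: the helper names `group_mean` and `did_estimate` and the toy data are hypothetical, and each sample mean simply stands in for the corresponding conditional expectation $\mathbb{E}[Y|D = d, T = t]$.

```python
from statistics import mean


def group_mean(rows, d, t):
    """Sample analogue of E[Y | D = d, T = t] for rows of (y, d, t)."""
    return mean(y for (y, di, ti) in rows if di == d and ti == t)


def did_estimate(rows):
    """Difference of the treated group minus difference of the control group."""
    treated_diff = group_mean(rows, 1, 1) - group_mean(rows, 1, 0)
    control_diff = group_mean(rows, 0, 1) - group_mean(rows, 0, 0)
    return treated_diff - control_diff


# Hypothetical toy data: (outcome y, group d, period t).
rows = [
    (15.0, 1, 0), (17.0, 1, 0),  # treated, T = 0: mean 16
    (21.0, 1, 1), (23.0, 1, 1),  # treated, T = 1: mean 22
    (9.0, 0, 0), (11.0, 0, 0),   # control, T = 0: mean 10
    (11.0, 0, 1), (13.0, 0, 1),  # control, T = 1: mean 12
]

print(did_estimate(rows))  # (22 - 16) - (12 - 10) = 4.0
```

Only the four cell means enter the calculation, so the same function applies whether the rows come from a panel or from repeated cross-sections.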

A recent survey by Torreblanca et al. (2025) observes that DiD is the most popular research design in political science. In a recent working paper, Baker et al. (2025, 9) offer a plausible explanation for why DiD is so attractive: “It is intuitive, it has very mild data requirements (just four means), it answers ex post questions like ‘what did the treatment do?’, and its identifying assumption can be stated precisely.”

Indeed, what makes this research design so interesting is that we no longer need statistical independence between the potential outcomes and $D$ to eliminate selection bias, provided that the experimenter observes $T$ (for example, through panel data) and, of course, that the two identification assumptions plausibly hold.
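A small simulation can make the point about selection bias concrete. The sketch below is illustrative only: the data-generating process, the function names `simulate` and `cell_mean`, and all parameter values (`tau`, `selection`, `trend`) are assumptions introduced here. Treated units start from a higher baseline, so $D$ is not independent of $Y(0)$, yet both groups share the same trend and the treatment has no effect at $T = 0$.

```python
import random
from statistics import mean


def simulate(n_units=4000, tau=2.0, selection=3.0, trend=1.0, seed=0):
    """Return rows (y, d, t) from a hypothetical two-group, two-period design."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        d = i % 2  # alternate units between control (0) and treated (1)
        # Selection: treated units have a higher baseline level of Y(0).
        alpha = selection * d + rng.gauss(0.0, 1.0)
        for t in (0, 1):
            # Parallel trends: the same additive trend for both groups.
            y0 = alpha + trend * t + rng.gauss(0.0, 1.0)
            # No anticipation: the effect tau appears only when d = 1 and t = 1.
            y = y0 + tau * d * t
            rows.append((y, d, t))
    return rows


def cell_mean(rows, d, t):
    """Sample analogue of E[Y | D = d, T = t]."""
    return mean(y for (y, di, ti) in rows if di == d and ti == t)


rows = simulate()
# Naive comparison of treated and control units at T = 1.
naive = cell_mean(rows, 1, 1) - cell_mean(rows, 0, 1)
# Difference-in-difference using all four cell means.
did = (cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)) - (
    cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
)
print(f"naive post-period comparison: {naive:.2f}")
print(f"difference-in-difference:     {did:.2f}")
```

Under these assumptions the naive comparison should land near `tau + selection`, while the DiD estimate should land near `tau`, because the baseline gap between the groups cancels when each group is differenced over time.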

In what follows, we give a simple proof of this identification result.

Proof

The first part of the proof demonstrates that the ATT can be represented as follows when $T$ is observable:

\[\begin{align} \tau_{ATT} = \mathbb{E}[Y(1) - Y(0) | D = 1, T = 1] \end{align}\]

Start from the definition of the ATT and apply the law of iterated expectations, conditioning on $T$:

\[\begin{align*} \mathbb{E}[Y(1) - Y(0) | D = 1] &= \mathbb{E}[\mathbb{E}[Y(1) - Y(0) | D = 1, T]| D = 1] \\ &= \sum_{t \in \{0, 1\}} \mathbb{E}[Y(1) - Y(0) | D = 1, T = t] P(T = t | D = 1) \\ &= \mathbb{E}[Y(1) - Y(0) | D = 1, T = 1] \end{align*}\]

The last equality follows from the assumption that the experimenter knows a priori the timing of the treatment assignment, so the $t = 1$ term receives weight one and the $t = 0$ term receives weight zero.

The second part of the proof represents the right-hand side of the last equality above as a difference-in-difference equation.

First, distribute the expectation operator as follows:

\[\begin{align*} \mathbb{E}[Y(1) - Y(0) | D = 1, T = 1] &= \mathbb{E}[Y(1)|D = 1, T = 1] - \mathbb{E}[Y(0) | D = 1, T = 1] \\ &= \mathbb{E}[Y|D = 1, T = 1] - \mathbb{E}[Y(0) | D = 1, T = 1] \end{align*}\]

The first term follows from the definition of the observed outcome in the potential outcome framework, $Y = Y(0) + D[Y(1) - Y(0)]$, which gives $Y = Y(1)$ when $D = 1$, whereas the second term is the counterfactual.

Second, we express the counterfactual quantity using the parallel trends assumption, rearranged to isolate $\mathbb{E}[Y(0)|D = 1, T = 1]$:

\[\begin{align*} \mathbb{E}[Y(0)|D = 1, T = 1] = \mathbb{E}[Y(0)|D = 0, T = 1] - \mathbb{E}[Y(0)|D = 0, T = 0] + \mathbb{E}[Y(0)|D = 1, T = 0] \end{align*}\]
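As a purely illustrative numerical example (all numbers are hypothetical), suppose the control group's mean untreated outcome is 10 at $T = 0$ and 12 at $T = 1$, and the treated group's mean untreated outcome is 16 at $T = 0$. The parallel trends assumption then imputes the treated group's counterfactual mean at $T = 1$ by adding the control group's change to the treated group's baseline:

\[\begin{align*} \mathbb{E}[Y(0)|D = 1, T = 1] = 12 - 10 + 16 = 18 \end{align*}\]

If the observed mean outcome of the treated group at $T = 1$ were 22, the implied ATT would be $22 - 18 = 4$.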

Substituting this expression into $\tau_{ATT}$ and rearranging terms, we obtain the following equation:

\[\begin{align*} \mathbb{E}[Y|D = 1, T = 1] - \mathbb{E}[Y(0) | D = 1, T = 1] &= \{\mathbb{E}[Y|D = 1, T = 1] - \mathbb{E}[Y(0)|D = 1, T = 0]\} \\ &- \{\mathbb{E}[Y(0)|D = 0, T = 1] - \mathbb{E}[Y(0)|D = 0, T = 0]\} \end{align*}\]

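Finally, apply the no anticipation assumption to $\mathbb{E}[Y(0)|D = 1, T = 0]$ in the first brace: $\mathbb{E}[Y(0)|D = 1, T = 0] = \mathbb{E}[Y(1)|D = 1, T = 0] = \mathbb{E}[Y|D = 1, T = 0]$. For the control group, $Y = Y(0)$ because $D = 0$, so $\mathbb{E}[Y(0)|D = 0, T = t] = \mathbb{E}[Y|D = 0, T = t]$ for $t \in \{0, 1\}$. Every term is now observable, which gives the result.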