Suppose the relationship of the outcome and explanatory variables is generated through the following function:

\[\begin{align*} Y &= X'\beta + \varepsilon \end{align*}\]

where $X = [1, X_1, \dots, X_J]$ collects a constant and the explanatory variables $X_j$, $j \in \mathcal{J} = \{1, \dots, J\}$, $\beta = [\beta_0, \beta_1, \dots, \beta_J]$ is the vector of unknown coefficients, and $\varepsilon$ is a mean-zero random variable.

The Frisch-Waugh-Lovell (FWL) theorem holds that if we regress $Y$ on all variables $X_j$ with $j \neq i$, including a constant, as follows:

\[\begin{align*} Y &= X_j'\beta_j^Y + \varepsilon_Y \end{align*}\]

and regressing $X_i$ on all $X_j$, including a constant:

\[X_i = X_j'\beta_j^X + \varepsilon_X\]

then the regression of $\varepsilon_Y$ on $\varepsilon_X$ recovers $\beta_i$ in the population, with residuals identical to those of the full regression:

\[\varepsilon_Y = \varepsilon_X' \beta_i + \varepsilon\]
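The three-step procedure can be checked numerically. Below is a minimal sketch with NumPy and simulated data (variable names and coefficient values are illustrative, not from the original papers): regress $Y$ and the focal variable on the controls, then regress residual on residual, and compare with the coefficient from the full regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Simulated data: X1 is the focal variable, X2 and X3 are controls.
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + 0.5 * X3 + rng.normal(size=n)

# Full regression: Y on a constant, X1, X2, X3.
X_full = np.column_stack([np.ones(n), X1, X2, X3])
beta_full, *_ = np.linalg.lstsq(X_full, Y, rcond=None)

# FWL three-step procedure with controls X_j = [1, X2, X3].
Xj = np.column_stack([np.ones(n), X2, X3])
eps_Y = Y - Xj @ np.linalg.lstsq(Xj, Y, rcond=None)[0]    # residual of Y on X_j
eps_X = X1 - Xj @ np.linalg.lstsq(Xj, X1, rcond=None)[0]  # residual of X1 on X_j

# Regressing residual on residual recovers the coefficient on X1 exactly.
beta_i = (eps_X @ eps_Y) / (eps_X @ eps_X)
print(np.isclose(beta_i, beta_full[1]))  # True
```

The equality is exact in any sample, not only in the population: it is an algebraic property of least squares.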

The FWL theorem arose initially as a method of seasonal and trend adjustment. An applied example in Frisch and Waugh's 1933 paper is the estimation of demand elasticity in the presence of a time trend. Their result shows that controlling for the time variable directly in the regression and adjusting for it through the three-step procedure above produce the same estimates.

Beyond time-series analysis, the FWL theorem formally demonstrates what it means to interpret a regression coefficient when many controls are included. The residual-on-residual regression in the last equation above shows that the coefficient of interest in a regression with many controls is the effect of the focal variable on the outcome after partialling out the effects of the controls on both variables.

This theorem also reveals the implicit assumptions on the control variables in a linear regression. The auxiliary steps in the FWL theorem show that we implicitly assume both the focal and the outcome variables are generated in a linear fashion by these controls. As will be shown in the proof below, we also explicitly assume the controls are exogenous to the focal and outcome variables, such that $\mathbb{E}[X_j'\varepsilon_X] = \mathbb{E}[X_j'\varepsilon_Y] = 0$.

Furthermore, the FWL theorem underlies the computation of panel-data regressions. Empirical work commonly introduces unit and time fixed effects when dealing with unobserved time-invariant confounders. However, introducing a full set of dummy variables in a regression demands substantial computational power, whereas de-meaning the variables at the unit and time levels is much faster. Statistical software therefore de-means the outcome and explanatory variables at the unit and time levels; by the FWL theorem, this approach gives the same results with faster computation.
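This equivalence can be illustrated with a small simulated panel (one-way unit fixed effects for brevity; the names and numbers below are illustrative). Including a full set of unit dummies and de-meaning within units are the same regression by the FWL theorem, since the within transformation is exactly the residual from regressing on the dummies.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 50, 10
unit = np.repeat(np.arange(n_units), n_periods)

# Panel with unit fixed effects alpha_u and a single regressor x.
alpha = rng.normal(size=n_units)
x = rng.normal(size=unit.size) + alpha[unit]  # x correlated with the fixed effect
y = 2.0 * x + alpha[unit] + rng.normal(size=unit.size)

# (a) Dummy-variable regression: y on x and a full set of unit dummies.
D = np.eye(n_units)[unit]
Z = np.column_stack([x, D])
beta_dummy = np.linalg.lstsq(Z, y, rcond=None)[0][0]

# (b) Within transformation: de-mean y and x by unit, regress residual on residual.
def demean(v):
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

y_dm, x_dm = demean(y), demean(x)
beta_within = (x_dm @ y_dm) / (x_dm @ x_dm)

print(np.isclose(beta_dummy, beta_within))  # True
```

The within estimator avoids building and inverting the $n \times (1 + n_{\text{units}})$ dummy design matrix, which is where the computational savings come from.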

The following note presents two proofs of the FWL theorem. The first follows Lovell’s “Simple Proof” and the second is a matrix algebra approach, following Hansen (2022, pp. 75-81). The latter provides a generalized statement of the FWL theorem.

Proof (1): Lovell’s Simple Proof

First, decompose the population regression as follows:

\[\begin{align*} Y = X_i'\beta_i + X_j'\beta_j + \varepsilon \end{align*}\]

where $X_i$ is a vector, $\beta_i$ is a scalar, and $X_j$ is a matrix of the remaining variables and a constant, multiplied by its coefficient vector $\beta_j$.

Substituting the first two auxiliary regressions above into the population regression, we have the following equation:

\[\begin{align*} X_j'\beta_j^Y + \varepsilon_Y &= \beta_i \bigg[X_j'\beta_j^X + \varepsilon_X \bigg] + X_j'\beta_j + \varepsilon \\ \varepsilon_Y &= \beta_i \bigg[X_j'\beta_j^X + \varepsilon_X \bigg] + X_j'\beta_j - X_j'\beta_j^Y + \varepsilon \\ &= \beta_i \varepsilon_X + \varepsilon + X_j'[\beta_j - \beta_j^Y + \beta_i\beta_j^X ] \\ &= \beta_i \varepsilon_X + \varepsilon + X_j'\alpha \end{align*}\]

where $\alpha$ is a constant vector. By the properties of least squares, $\varepsilon_X$, $\varepsilon_Y$, and $\varepsilon$ are all orthogonal to $X_j$, so that $\mathbb{E}[X_j'\varepsilon_X] = \mathbb{E}[X_j'\varepsilon_Y] = \mathbb{E}[X_j'\varepsilon] = 0$. Multiplying the last equation by $X_j$ and taking expectations then gives $\mathbb{E}[X_j X_j']\alpha = 0$, so $\alpha = 0$. This establishes the algebraic identity.

What is interesting in this proof is that when $X_i$ affects $X_j$, that is, when the relationship between the focal and control variables is endogenous, the property $\mathbb{E}[X_j'\varepsilon_X] = 0$ does not hold, so $\alpha \neq 0$.

Proof (2): Matrix Algebra Approach

Hansen approaches the FWL theorem as a nested optimization problem (pp. 77-79). The first step of the proof is similar.

Starting from the decomposed regression:

\[\begin{align*} Y = X_i'\beta_i + X_j'\beta_j + \varepsilon \end{align*}\]

In least squares we choose the coefficients that minimize the sum of squared errors $\varepsilon'\varepsilon$. The optimization problem is defined as follows:

\[\begin{align*} \underset{\beta_i, \beta_j}{\arg\min} \ \ \varepsilon' \varepsilon = \underset{\beta_i}{\arg\min} \bigg[\min_{\beta_j}(Y - X_i'\beta_i - X_j'\beta_j)'(Y - X_i'\beta_i - X_j'\beta_j) \bigg] \end{align*}\]

The nested notation means that we first minimize the function over $\beta_j$, treating $\beta_i$ as a constant, and then minimize the remaining concentrated function with respect to $\beta_i$.

Taking for granted that the sum of squared errors is a convex function, the solution for $\beta_j$ follows directly from the first-order condition. Let $\varepsilon_X = Y - X_i'\beta_i$ (reusing the symbol for the inner residual), then distribute the inner function:

\[\begin{align*} \varepsilon' \varepsilon &= (\varepsilon_X- X_j'\beta_j)'(\varepsilon_X - X_j'\beta_j) \\ &= \varepsilon_X' \varepsilon_X - \varepsilon_X' X_j'\beta_j - (X_j'\beta_j)' \varepsilon_X + (X_j'\beta_j)' (X_j'\beta_j) \end{align*}\]

The minimizer from the first order condition is then:

\[\begin{align*} \frac{\partial\varepsilon' \varepsilon}{\partial \beta_j } &= 0 \\ -2 X_j'\varepsilon_X + 2X_j'X_j\beta_j &= 0 \\ \beta_j &= (X_j'X_j)^{-1}X_j'\varepsilon_X \end{align*}\]

Using this result to minimize the function with respect to $\beta_i$, we have the following equation:

\[\begin{align*} \min_{\beta_j}\varepsilon' \varepsilon &= (\varepsilon_X - X_j(X_j'X_j)^{-1}X_j'\varepsilon_X )' (\varepsilon_X - X_j(X_j'X_j)^{-1}X_j'\varepsilon_X ) \\ &= \varepsilon_X' (I - P )'(I - P) \varepsilon_X \\ &= \varepsilon_X' M'M \varepsilon_X \end{align*}\]

We note that $P = X_j(X_j'X_j)^{-1}X_j'$ is the projection matrix onto $X_j$, whereas $M = I - P$ is its annihilator matrix. Because the annihilator matrix is symmetric and idempotent, we have $M'M = M$.
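These matrix properties are easy to verify numerically. A small sketch with an arbitrary $X_j$ (the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# X_j: a constant plus two arbitrary control variables.
Xj = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])

P = Xj @ np.linalg.inv(Xj.T @ Xj) @ Xj.T  # projection onto the columns of X_j
M = np.eye(100) - P                       # annihilator matrix

assert np.allclose(P @ P, P)    # P is idempotent
assert np.allclose(M.T @ M, M)  # M is symmetric and idempotent: M'M = M
assert np.allclose(M @ Xj, 0)   # M annihilates X_j
```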

Therefore, the last stage of the minimization problem is as follows.

\[\begin{align*} \underset{\beta_i}{\arg\min} [\min_{\beta_j} \varepsilon' \varepsilon] &= \underset{\beta_i}{\arg\min} \ \ \varepsilon_X' M \varepsilon_X \\ &= \underset{\beta_i}{\arg\min} \ \ (Y - X_i' \beta_i)'M(Y - X_i'\beta_i) \\ &= \underset{\beta_i}{\arg\min} \ \ (Y - X_i' \beta_i)'(MY - MX_i'\beta_i) \end{align*}\]

Distribute:

\[\begin{align*} \varepsilon_X' M \varepsilon_X &= Y'MY - Y' MX_i'\beta_i - \beta_i X_i MY + \beta_i X_i M X_i'\beta_i \\ &= Y'MY - 2\beta_i X_i MY + \beta_i X_i M X_i'\beta_i \end{align*}\]

where the cross terms combine because each is a scalar equal to its own transpose. Taking the first-order condition gives the solution for $\beta_i$:

\[\begin{align*} \frac{\partial }{\partial \beta_i} (Y'MY - 2\beta_i X_i MY + \beta_i X_i M X_i'\beta_i) &= 0 \\ -2X_i MY + 2 \beta_i X_i M X_i' &= 0 \\ \beta_i &= (X_i M X_i')^{-1}(X_i M Y) \end{align*}\]

Using idempotence, $M = M'M$, and the residual properties $MX_i' = (I - P)X_i' = \varepsilon_X$ and $MY = (I - P)Y = \varepsilon_Y$, we have the following equation:

\[\begin{align*} (X_i M X_i')^{-1}(X_i M Y) &= (X_i M'M X_i')^{-1}(X_iM'MY) \\ &= (\varepsilon_X'\varepsilon_X)^{-1}(\varepsilon_X' \varepsilon_Y) \end{align*}\]

The last equation demonstrates that $\beta_i$ is the solution to the regression of residuals on residuals.
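As a final numerical check of this expression, the sketch below (simulated data, illustrative names and values) computes $\beta_i$ three ways: via the annihilator-matrix formula, via the residual-on-residual regression, and via the full regression. All three coincide.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
Xj = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + controls
Xi = rng.normal(size=n) + Xj[:, 1]                           # focal variable
Y = 1.5 * Xi + Xj @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Annihilator of X_j: M = I - X_j (X_j'X_j)^{-1} X_j'.
M = np.eye(n) - Xj @ np.linalg.inv(Xj.T @ Xj) @ Xj.T

# beta_i = (X_i' M X_i)^{-1} (X_i' M Y)
beta_i = (Xi @ M @ Xi) ** -1 * (Xi @ M @ Y)

# Residual-on-residual regression: (e_X' e_X)^{-1} (e_X' e_Y).
eX, eY = M @ Xi, M @ Y
beta_res = (eX @ eY) / (eX @ eX)

# Coefficient on X_i from the full regression.
beta_full = np.linalg.lstsq(np.column_stack([Xi, Xj]), Y, rcond=None)[0][0]

print(np.isclose(beta_i, beta_res), np.isclose(beta_i, beta_full))  # True True
```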