I'll show the result for any multiple linear regression, whether the regressors are polynomials of $X_t$ or not. In fact, this shows a little more than what you asked: each LOOCV residual is identical to the corresponding leverage-weighted residual from the full regression, not just that you can obtain the LOOCV error as in (5.2) (the averages could in principle agree even if the individual terms in them did not).
Let me take the liberty of using slightly adapted notation.
We first show that
$$\hat\beta-\hat\beta_{(t)}=\left(\frac{\hat u_t}{1-h_t}\right)(X'X)^{-1}X_t',\tag{A}$$
where $\hat\beta$ is the estimate using all data and $\hat\beta_{(t)}$ the estimate obtained when leaving out observation $t$. Here, $X_t$ is the row vector of regressors for observation $t$, so that $\hat y_t=X_t\hat\beta$; $X_{(t)}$ and $y_{(t)}$ denote the regressor matrix and outcome vector with observation $t$ removed; $\hat u_t=y_t-X_t\hat\beta$ are the full-sample residuals; and $h_t=X_t(X'X)^{-1}X_t'$ is the leverage of observation $t$.
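Before proving (A), here is a minimal numerical sketch of it in Python/numpy; the simulated data, the sample size $T=50$, and the dropped index $t=7$ are illustrative choices of mine, not part of the argument.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                    # full-sample OLS estimate
u_hat = y - X @ beta_hat                        # full-sample residuals
h = np.einsum('ti,ij,tj->t', X, XtX_inv, X)     # leverages h_t

t = 7                                           # drop an arbitrary observation
keep = np.arange(T) != t
beta_t = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]   # leave-one-out fit

lhs = beta_hat - beta_t
rhs = (u_hat[t] / (1 - h[t])) * (XtX_inv @ X[t])
print(np.allclose(lhs, rhs))                    # True: (A) holds numerically
```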
The proof uses the following matrix algebraic result.
Let $A$ be a nonsingular matrix, $b$ a vector and $\lambda$ a scalar. If
$$\lambda\neq\frac{-1}{b'A^{-1}b},$$
then
$$(A+\lambda bb')^{-1}=A^{-1}-\left(\frac{\lambda}{1+\lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}.\tag{B}$$
The proof of (B) follows immediately from verifying
$$\left\{A^{-1}-\left(\frac{\lambda}{1+\lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}\right\}(A+\lambda bb')=I.$$
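Identity (B) is the Sherman-Morrison formula; a quick numeric check, with arbitrary illustrative choices of $A$, $b$ and $\lambda$, might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)   # diagonal shift keeps A nonsingular
b = rng.normal(size=(n, 1))
lam = 0.7                                     # any scalar with lam != -1/(b'A^{-1}b)

A_inv = np.linalg.inv(A)
scale = lam / (1 + lam * float(b.T @ A_inv @ b))
rhs = A_inv - scale * (A_inv @ b @ b.T @ A_inv)
lhs = np.linalg.inv(A + lam * (b @ b.T))
print(np.allclose(lhs, rhs))                  # True: (B) holds numerically
```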
The following result is helpful in proving (A):
$$(X_{(t)}'X_{(t)})^{-1}X_t'=\left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'.\tag{C}$$
Proof of (C): Using $\sum_{t=1}^T X_t'X_t=X'X$ and applying (B) with $A=X'X$, $b=X_t'$ and $\lambda=-1$, we have
$$(X_{(t)}'X_{(t)})^{-1}=(X'X-X_t'X_t)^{-1}=(X'X)^{-1}+\frac{(X'X)^{-1}X_t'X_t(X'X)^{-1}}{1-X_t(X'X)^{-1}X_t'}.$$
So we find
$$(X_{(t)}'X_{(t)})^{-1}X_t'=(X'X)^{-1}X_t'+(X'X)^{-1}X_t'\left(\frac{X_t(X'X)^{-1}X_t'}{1-X_t(X'X)^{-1}X_t'}\right)=\left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'.$$
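A small self-contained sketch checking (C) numerically; the design matrix and the dropped index are again illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
T, t = 30, 5
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
X_drop = np.delete(X, t, axis=0)              # X_(t): X with row t removed

XtX_inv = np.linalg.inv(X.T @ X)
h_t = X[t] @ XtX_inv @ X[t]                   # leverage of observation t
lhs = np.linalg.inv(X_drop.T @ X_drop) @ X[t]
rhs = (XtX_inv @ X[t]) / (1 - h_t)
print(np.allclose(lhs, rhs))                  # True: (C) holds numerically
```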
The proof of (A) now follows from (C): as
$$X'X\hat\beta=X'y,$$
we have
$$(X_{(t)}'X_{(t)}+X_t'X_t)\hat\beta=X_{(t)}'y_{(t)}+X_t'y_t,$$
or, premultiplying by $(X_{(t)}'X_{(t)})^{-1}$ and using $y_t=X_t\hat\beta+\hat u_t$ together with $\hat\beta_{(t)}=(X_{(t)}'X_{(t)})^{-1}X_{(t)}'y_{(t)}$,
$$\left\{I_k+(X_{(t)}'X_{(t)})^{-1}X_t'X_t\right\}\hat\beta=\hat\beta_{(t)}+(X_{(t)}'X_{(t)})^{-1}X_t'(X_t\hat\beta+\hat u_t).$$
Cancelling the common term $(X_{(t)}'X_{(t)})^{-1}X_t'X_t\hat\beta$ on both sides gives
$$\hat\beta=\hat\beta_{(t)}+(X_{(t)}'X_{(t)})^{-1}X_t'\hat u_t=\hat\beta_{(t)}+\frac{(X'X)^{-1}X_t'\hat u_t}{1-h_t},$$
where the last equality follows from (C).
Now multiply through in (A) by $X_t$ and note that $X_t\hat\beta-X_t\hat\beta_{(t)}=(y_t-X_t\hat\beta_{(t)})-(y_t-X_t\hat\beta)$ to get, with $\hat u_{(t)}=y_t-X_t\hat\beta_{(t)}$ the residual resulting from using $\hat\beta_{(t)}$,
$$\hat u_{(t)}=\hat u_t+\left(\frac{\hat u_t}{1-h_t}\right)h_t,$$
or
$$\hat u_{(t)}=\frac{\hat u_t(1-h_t)+\hat u_t h_t}{1-h_t}=\frac{\hat u_t}{1-h_t}.$$
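To close the loop, here is a sketch that brute-forces all $T$ leave-one-out fits and confirms the termwise identity, so the LOOCV error as in (5.2) can be computed from a single full-sample fit; the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 40
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=T)

XtX_inv = np.linalg.inv(X.T @ X)
u_hat = y - X @ (XtX_inv @ X.T @ y)               # full-sample residuals
h = np.einsum('ti,ij,tj->t', X, XtX_inv, X)       # leverages h_t

# Brute force: refit T times, each time leaving out observation t
u_loo = np.empty(T)
for t in range(T):
    keep = np.arange(T) != t
    beta_t = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    u_loo[t] = y[t] - X[t] @ beta_t

print(np.allclose(u_loo, u_hat / (1 - h)))        # True: termwise identity
loocv_mse = np.mean((u_hat / (1 - h)) ** 2)       # LOOCV error from one fit
```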