Let us show the result for the general case, of which your formula for the test statistic is a special case. In general, we need to verify that the test statistic can, by the characterization of the $F$ distribution, be written as the ratio of independent $\chi^2$ random variables, each divided by its degrees of freedom.
Let $H_0: R'\beta = r$ with $R$ and $r$ known and nonrandom, where $R: k\times q$ has full column rank $q$. This represents $q$ linear restrictions for (unlike in the OP's notation) $k$ regressors, including the constant term. So, in @user1627466's example, $p-1$ corresponds to the $q = k-1$ restrictions of setting all slope coefficients to zero.
In view of $\mathrm{Var}(\hat{\beta}_{ols}) = \sigma^2(X'X)^{-1}$, we have
$$R'(\hat{\beta}_{ols} - \beta) \sim N(0, \sigma^2 R'(X'X)^{-1}R),$$
so that (with $B^{-1/2} = \{R'(X'X)^{-1}R\}^{-1/2}$ being a "matrix square root" of $B^{-1} = \{R'(X'X)^{-1}R\}^{-1}$, obtained through, e.g., a Cholesky decomposition)
$$n := \frac{B^{-1/2}}{\sigma}R'(\hat{\beta}_{ols} - \beta) \sim N(0, I_q),$$
as
$$\mathrm{Var}(n) = \frac{B^{-1/2}}{\sigma}R'\,\mathrm{Var}(\hat{\beta}_{ols})\,R\,\frac{B^{-1/2}}{\sigma} = \frac{B^{-1/2}}{\sigma}\,\sigma^2 B\,\frac{B^{-1/2}}{\sigma} = I,$$
where the second equality uses the variance of the OLS estimator.
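As a quick numerical sanity check (a minimal sketch, not part of the proof; it uses the symmetric inverse square root from an eigendecomposition, though a Cholesky-based root works equally well), one can verify in R that $B^{-1/2}B\,B^{-1/2} = I_q$:

# Sketch: verify that B^{-1/2} standardizes B, i.e., B^{-1/2} B B^{-1/2} = I_q.
# Here B^{-1/2} is the symmetric inverse square root from an eigendecomposition.
set.seed(1)
n <- 50; k <- 4; q <- 2
X <- cbind(1, matrix(rnorm(n*(k-1)), ncol = k-1))
R <- rbind(matrix(0, k-q, q), diag(q))     # R: k x q with full column rank q
B <- t(R) %*% solve(t(X) %*% X) %*% R      # B = R'(X'X)^{-1}R
eig <- eigen(B)
B.inv.sqrt <- eig$vectors %*% diag(1/sqrt(eig$values)) %*% t(eig$vectors)
round(B.inv.sqrt %*% B %*% B.inv.sqrt, 10) # the identity matrix I_q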
The vector $n$, as shown in the answer you link to (see also here), is independent of
$$d := (n-k)\frac{\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-k},$$
where $\hat{\sigma}^2 = y'M_X y/(n-k)$ is the usual unbiased error variance estimate, with $M_X = I - X(X'X)^{-1}X'$ the "residual maker matrix" from regressing on $X$.
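As a small sketch with simulated data, one can check that $y'M_X y/(n-k)$ coincides with the squared residual standard error that summary.lm reports:

# Sketch: the residual maker matrix M_X and the unbiased variance estimate.
set.seed(2)
n <- 40; k <- 3
X <- cbind(1, matrix(rnorm(n*(k-1)), ncol = k-1))
y <- rnorm(n)
M.X <- diag(n) - X %*% solve(t(X) %*% X) %*% t(X)  # M_X = I - X(X'X)^{-1}X'
c(t(y) %*% M.X %*% y)/(n - k)                      # y'M_X y/(n-k)
summary(lm(y ~ X - 1))$sigma^2                     # the same number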
So, as $n'n$ is a quadratic form in normals,
$$n'n \sim \chi^2_q,$$
so that
$$\frac{n'n/q}{d/(n-k)} = \frac{(\hat{\beta}_{ols}-\beta)'R\{R'(X'X)^{-1}R\}^{-1}R'(\hat{\beta}_{ols}-\beta)/q}{\hat{\sigma}^2} \sim F_{q,n-k}.$$
In particular, under $H_0: R'\beta = r$, this reduces to the statistic
$$F = \frac{(R'\hat{\beta}_{ols}-r)'\{R'(X'X)^{-1}R\}^{-1}(R'\hat{\beta}_{ols}-r)/q}{\hat{\sigma}^2} \sim F_{q,n-k}.$$
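To connect this formula to R output, here is a small sketch (not part of the derivation) that computes $F$ directly from the formula for the null that all slope coefficients are zero and compares it with waldtest from the lmtest package, which is also used in the simulation below:

# Sketch: compute F from the formula for H0: all slopes zero (R' = [0 I], r = 0)
# and compare with the F statistic reported by lmtest::waldtest.
library(lmtest)
set.seed(3)
n <- 40; q <- 2
X <- cbind(1, matrix(rnorm(n*q), ncol = q))  # k = q + 1 regressors incl. constant
y <- rnorm(n)
k <- ncol(X)
R <- rbind(rep(0, q), diag(q))               # k x q, selects the slopes
r <- rep(0, q)
beta.hat <- solve(t(X) %*% X, t(X) %*% y)
sigma2.hat <- sum((y - X %*% beta.hat)^2)/(n - k)
discrep <- t(R) %*% beta.hat - r
c(t(discrep) %*% solve(t(R) %*% solve(t(X) %*% X) %*% R, discrep))/(q*sigma2.hat)
waldtest(lm(y ~ X[, -1]), test = "F")$F[2]   # the same value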
For illustration, consider the special case $R' = I$, $r = 0$, $q = 2$, $\hat{\sigma}^2 = 1$ and $X'X = I$. Then,
$$F = \hat{\beta}_{ols}'\hat{\beta}_{ols}/2 = \frac{\hat{\beta}_{ols,1}^2 + \hat{\beta}_{ols,2}^2}{2},$$
the squared Euclidean distance of the OLS estimate from the origin, standardized by the number of elements. This highlights that, since $\hat{\beta}_{ols,1}^2$ and $\hat{\beta}_{ols,2}^2$ are squared standard normals and hence $\chi^2_1$, the $F$ distribution may be seen as an "average $\chi^2$" distribution.
In case you prefer a little simulation (which is of course not a proof!), here is one in which the null that none of the slope regressors matters is tested; since they indeed do not matter, we simulate the null distribution.
We see very good agreement between the theoretical density and the histogram of the Monte Carlo test statistics.
library(lmtest)
n <- 100
reps <- 20000
sloperegs <- 5 # number of slope regressors, q or k-1 (minus the constant) in the above notation
critical.value <- qf(p = .95, df1 = sloperegs, df2 = n-sloperegs-1)
# for the null that none of the slope regressors matter
Fstat <- rep(NA,reps)
for (i in 1:reps){
y <- rnorm(n)
X <- matrix(rnorm(n*sloperegs), ncol=sloperegs)
reg <- lm(y~X)
Fstat[i] <- waldtest(reg, test="F")$F[2]
}
mean(Fstat>critical.value) # very close to 0.05
hist(Fstat, breaks = 60, col="lightblue", freq = F, xlim=c(0,4))
x <- seq(0,6,by=.1)
lines(x, df(x, df1 = sloperegs, df2 = n-sloperegs-1), lwd=2, col="purple")
To see that the versions of the test statistics in the question and the answer are indeed equivalent, note that the null corresponds to the restrictions $R' = [0\;\; I]$ and $r = 0$.
Let $X = [X_1\;\; X_2]$ be partitioned according to which coefficients are restricted to be zero under the null (in your case, all but the constant, but the derivation to follow is general). Also, let $\hat{\beta}_{ols} = (\hat{\beta}_{ols,1}',\,\hat{\beta}_{ols,2}')'$ be the suitably partitioned OLS estimate.
Then,
$$R'\hat{\beta}_{ols} = \hat{\beta}_{ols,2}$$
and
$$R'(X'X)^{-1}R \equiv \tilde{D},$$
the lower right block of
$$(X'X)^{-1} = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} \equiv \begin{pmatrix} \tilde{A} & \tilde{B} \\ \tilde{C} & \tilde{D} \end{pmatrix}.$$
Now, use results for partitioned inverses to obtain
$$\tilde{D} = \left(X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\right)^{-1} = (X_2'M_{X_1}X_2)^{-1},$$
where $M_{X_1} = I - X_1(X_1'X_1)^{-1}X_1'$.
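A quick numerical check of this partitioned-inverse identity (again just a sketch with arbitrary simulated regressors):

# Sketch: the lower-right block of (X'X)^{-1} equals (X2' M_{X1} X2)^{-1}.
set.seed(4)
n <- 30
X1 <- cbind(1, rnorm(n))               # unrestricted columns, incl. the constant
X2 <- matrix(rnorm(n*2), ncol = 2)     # columns restricted to zero under H0
X <- cbind(X1, X2)
solve(t(X) %*% X)[3:4, 3:4]            # D tilde, the lower-right block
M.X1 <- diag(n) - X1 %*% solve(t(X1) %*% X1) %*% t(X1)
solve(t(X2) %*% M.X1 %*% X2)           # the same 2 x 2 block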
Thus, the numerator of the $F$ statistic becomes (without the division by $q$)
$$F_{num} = \hat{\beta}_{ols,2}'(X_2'M_{X_1}X_2)\hat{\beta}_{ols,2}.$$
Next, recall that by the Frisch-Waugh-Lovell theorem we may write
$$\hat{\beta}_{ols,2} = (X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y,$$
so that
$$\begin{aligned} F_{num} &= y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}(X_2'M_{X_1}X_2)(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y \\ &= y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y. \end{aligned}$$
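The FWL step can likewise be verified numerically; a minimal sketch comparing the $X_2$ coefficients from the full regression with the partialled-out formula:

# Sketch: FWL - the coefficients on X2 from the full regression equal those
# from regressing M_{X1} y on M_{X1} X2.
set.seed(5)
n <- 30
X1 <- cbind(1, rnorm(n))
X2 <- matrix(rnorm(n*2), ncol = 2)
y <- rnorm(n)
M.X1 <- diag(n) - X1 %*% solve(t(X1) %*% X1) %*% t(X1)
coef(lm(y ~ cbind(X1, X2) - 1))[3:4]               # full regression, X2 part
solve(t(X2) %*% M.X1 %*% X2, t(X2) %*% M.X1 %*% y) # (X2'M_{X1}X2)^{-1}X2'M_{X1}y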
It remains to show that this numerator, $F_{num}$, is identical to $RSSR - USSR$, the difference between the restricted and unrestricted sums of squared residuals.
Here,
$$RSSR = y'M_{X_1}y$$
is the residual sum of squares from regressing $y$ on $X_1$, i.e., with $H_0$ imposed. In your special case, this is just $TSS = \sum_i(y_i - \bar{y})^2$, the sum of squared residuals from a regression on a constant only.
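As a tiny sketch of this special case in R:

# Sketch: with X1 just a constant, y' M_{X1} y is the total sum of squares.
set.seed(7)
y <- rnorm(25)
sum(resid(lm(y ~ 1))^2)   # RSSR from a regression on a constant only
sum((y - mean(y))^2)      # TSS, the same number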
Again using FWL (which also shows that the residuals of the two approaches are identical), we can write $USSR$ ($SSR$ in your notation) as the $SSR$ of the regression of
$$M_{X_1}y \quad \text{on} \quad M_{X_1}X_2.$$
That is,
$$\begin{aligned} USSR &= y'M_{X_1}'M_{M_{X_1}X_2}M_{X_1}y \\ &= y'M_{X_1}'(I - P_{M_{X_1}X_2})M_{X_1}y \\ &= y'M_{X_1}y - y'M_{X_1}M_{X_1}X_2\bigl((M_{X_1}X_2)'M_{X_1}X_2\bigr)^{-1}(M_{X_1}X_2)'M_{X_1}y \\ &= y'M_{X_1}y - y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y, \end{aligned}$$
where the last equality uses the idempotence and symmetry of $M_{X_1}$.
Thus,
$$\begin{aligned} RSSR - USSR &= y'M_{X_1}y - \bigl(y'M_{X_1}y - y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y\bigr) \\ &= y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y. \end{aligned}$$
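To tie the derivation together numerically, a final sketch checking that $RSSR - USSR$ equals the quadratic form derived above:

# Sketch: RSSR - USSR equals y' M_{X1} X2 (X2' M_{X1} X2)^{-1} X2' M_{X1} y.
set.seed(6)
n <- 30
X1 <- cbind(1, rnorm(n))
X2 <- matrix(rnorm(n*2), ncol = 2)
y <- rnorm(n)
M.X1 <- diag(n) - X1 %*% solve(t(X1) %*% X1) %*% t(X1)
RSSR <- sum(resid(lm(y ~ X1 - 1))^2)            # y on X1 only (H0 imposed)
USSR <- sum(resid(lm(y ~ cbind(X1, X2) - 1))^2) # unrestricted regression
RSSR - USSR
c(t(y) %*% M.X1 %*% X2 %*% solve(t(X2) %*% M.X1 %*% X2, t(X2) %*% M.X1 %*% y))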