Proof that the F-statistic follows an F-distribution


20

In light of this question: Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

I would love to understand why

$$F = \frac{(\text{TSS}-\text{RSS})/(p-1)}{\text{RSS}/(n-p)},$$

where $p$ is the number of model parameters, $n$ the number of observations, TSS the total sum of squares and RSS the residual sum of squares, follows an $F_{p-1,n-p}$ distribution.

I must admit that I have not even attempted to prove it, as I wouldn't know where to start.


Christoph Hanck and Francis have already given very good answers. If you are still having trouble understanding the proof of the F-test for linear regression, try checking out teamdable.github.io/techblog/… . I wrote that blog post about the proof of the F-test for linear regression. It is written in Korean, but that may not be a problem because almost all of it is mathematical formulas. I hope it helps.
Taeho Oh,

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
mkt - Reinstate Monica

Answers:


19

Let us show the result for the general case, of which your formula for the test statistic is a special case. In general, we need to verify that the statistic can, according to the characterization of the F distribution, be written as the ratio of independent $\chi^2$ r.v.s, each divided by its degrees of freedom.

Let $H_0: R'\beta = r$ with $R$ and $r$ known and nonrandom, and $R: k \times q$ of full column rank $q$. This represents $q$ linear restrictions for (unlike in the OP's notation) $k$ regressors including the constant term. So, in @user1627466's example, $p-1$ corresponds to the $q = k-1$ restrictions of setting all slope coefficients to zero.

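In view of $Var(\hat\beta_{ols}) = \sigma^2 (X'X)^{-1}$, we have

$$R'(\hat\beta_{ols} - \beta) \sim N\left(0, \sigma^2 R'(X'X)^{-1}R\right),$$
so that (with $B^{-1/2} = \{R'(X'X)^{-1}R\}^{-1/2}$ being a "matrix square root" of $B^{-1} = \{R'(X'X)^{-1}R\}^{-1}$, via, e.g., a Cholesky decomposition)
$$n := \frac{B^{-1/2}}{\sigma} R'(\hat\beta_{ols} - \beta) \sim N(0, I_q),$$
as
$$\begin{aligned} Var(n) &= \frac{B^{-1/2}}{\sigma} R'\, Var(\hat\beta_{ols})\, R\, \frac{B^{-1/2}}{\sigma} \\ &= \frac{B^{-1/2}}{\sigma}\, \sigma^2 B\, \frac{B^{-1/2}}{\sigma} = I \end{aligned}$$
where the second line uses the variance of the OLSE.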

This, as shown in the answer that you link to (see also here), is independent of

$$d := (n-k)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-k},$$
where $\hat\sigma^2 = y'M_Xy/(n-k)$ is the usual unbiased error variance estimate, with $M_X = I - X(X'X)^{-1}X'$ the "residual maker matrix" from regressing on $X$.
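The standardization step can be checked numerically. The following is only a sketch under assumptions: the design matrix `X`, the restriction matrix `R` and the sizes `n_obs`, `k`, `q` are simulated or hypothetical, and it uses the symmetric inverse square root from an eigendecomposition (rather than a Cholesky factor) so that the same matrix can appear on both sides of $B$.

```r
# Sketch: verify that B^{-1/2} B B^{-1/2} = I_q for B = R'(X'X)^{-1}R.
# All inputs are simulated; names are illustrative, not from the answer.
set.seed(1)
n_obs <- 50; k <- 4; q <- 2
X <- cbind(1, matrix(rnorm(n_obs * (k - 1)), ncol = k - 1))  # n_obs x k, with constant
R <- rbind(matrix(0, k - q, q), diag(q))                     # k x q, full column rank
B <- t(R) %*% solve(crossprod(X)) %*% R                      # q x q, symmetric

# Symmetric inverse square root of B from its eigendecomposition
eig <- eigen(B, symmetric = TRUE)
B_inv_half <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)

# Var(n) = B^{-1/2} sigma^2 B B^{-1/2} / sigma^2, so sigma cancels
whitened <- B_inv_half %*% B %*% B_inv_half
max_dev <- max(abs(whitened - diag(q)))  # should be zero up to floating-point error
```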

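So, as $n'n$ is a quadratic form in normals,

$$\frac{\overbrace{n'n}^{\sim \chi^2_q}/q}{d/(n-k)} = \frac{(\hat\beta_{ols} - \beta)'R\{R'(X'X)^{-1}R\}^{-1}R'(\hat\beta_{ols} - \beta)/q}{\hat\sigma^2} \sim F_{q,n-k}.$$
In particular, under $H_0: R'\beta = r$, this reduces to the statistic
$$F = \frac{(R'\hat\beta_{ols} - r)'\{R'(X'X)^{-1}R\}^{-1}(R'\hat\beta_{ols} - r)/q}{\hat\sigma^2} \sim F_{q,n-k}.$$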

For illustration, consider the special case $R' = I$, $r = 0$, $q = 2$, $\hat\sigma^2 = 1$ and $X'X = I$. Then,

$$F = \hat\beta_{ols}'\hat\beta_{ols}/2 = \frac{\hat\beta_{ols,1}^2 + \hat\beta_{ols,2}^2}{2},$$
the squared Euclidean distance of the OLS estimate from the origin standardized by the number of elements - highlighting that, since the $\hat\beta_{ols,j}^2$ are squared standard normals and hence $\chi^2_1$, the F distribution may be seen as an "average $\chi^2$ distribution".
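To connect the general formula with what software reports, here is a minimal sketch that evaluates the quadratic form for the restrictions "all slope coefficients are zero" and compares it with the overall F-statistic from `summary(lm(...))`. The data are simulated and the names `n_obs`, `F_wald` and `F_lm` are illustrative assumptions.

```r
# Sketch: Wald-form F statistic for H0: R'beta = r versus lm()'s overall F.
set.seed(42)
n_obs <- 100; k <- 4; q <- k - 1
X <- cbind(1, matrix(rnorm(n_obs * q), ncol = q))   # constant plus q slope regressors
y <- rnorm(n_obs)                                   # the null is true by construction

beta_hat   <- solve(crossprod(X), crossprod(X, y))            # OLS estimate, k x 1
sigma2_hat <- sum((y - X %*% beta_hat)^2) / (n_obs - k)       # unbiased variance estimate

R <- rbind(0, diag(q))      # k x q: selects the slope coefficients
r <- rep(0, q)
dev <- t(R) %*% beta_hat - r                                   # R'beta_hat - r
B <- t(R) %*% solve(crossprod(X)) %*% R
F_wald <- drop(t(dev) %*% solve(B, dev)) / q / sigma2_hat

# Overall F statistic reported by lm() for the same regression
F_lm <- unname(summary(lm(y ~ X[, -1]))$fstatistic["value"])
same <- isTRUE(all.equal(F_wald, F_lm))
```

If the two values agree, `pf(F_wald, q, n_obs - k, lower.tail = FALSE)` gives the corresponding p-value.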

In case you prefer a little simulation (which is of course not a proof!), here is one in which we test the null that none of the slope regressors matter. They indeed do not, so we simulate the null distribution.

[Figure: histogram of the Monte Carlo F statistics with the theoretical F density overlaid]

We see very good agreement between the theoretical density and the histogram of the Monte Carlo test statistics.

library(lmtest)
n <- 100
reps <- 20000
sloperegs <- 5 # number of slope regressors, q or k-1 (minus the constant) in the above notation
critical.value <- qf(p = .95, df1 = sloperegs, df2 = n-sloperegs-1) 
# for the null that none of the slope regressors matter

Fstat <- rep(NA,reps)
for (i in 1:reps){
  y <- rnorm(n)
  X <- matrix(rnorm(n*sloperegs), ncol=sloperegs)
  reg <- lm(y~X)
  Fstat[i] <- waldtest(reg, test="F")$F[2] 
}

mean(Fstat>critical.value) # very close to 0.05

hist(Fstat, breaks = 60, col="lightblue", freq = FALSE, xlim=c(0,4))
x <- seq(0,6,by=.1)
lines(x, df(x, df1 = sloperegs, df2 = n-sloperegs-1), lwd=2, col="purple")

To see that the versions of the test statistics in the question and the answer are indeed equivalent, note that the null corresponds to the restrictions $R' = [0 \;\; I]$ and $r = 0$.

Let $X = [X_1 \;\; X_2]$ be partitioned according to which coefficients are restricted to be zero under the null (in your case, all but the constant, but the derivation to follow is general). Also, let $\hat\beta_{ols} = (\hat\beta_{ols,1}', \hat\beta_{ols,2}')'$ be the suitably partitioned OLS estimate.

Then,

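$$R'\hat\beta_{ols} = \hat\beta_{ols,2}$$
and
$$R'(X'X)^{-1}R \equiv \tilde D,$$
the lower right block of
$$(X'X)^{-1} = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1} \equiv \begin{pmatrix} \tilde A & \tilde B \\ \tilde C & \tilde D \end{pmatrix}$$
Now, use results for partitioned inverses to obtain
$$\tilde D = \left(X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\right)^{-1} = (X_2'M_{X_1}X_2)^{-1}$$
where $M_{X_1} = I - X_1(X_1'X_1)^{-1}X_1'$.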
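The partitioned-inverse step is easy to confirm numerically. This is a sketch on simulated data; `X1`, `X2`, `n_obs` and the block sizes are hypothetical choices made only for illustration.

```r
# Sketch: the lower right block of (X'X)^{-1} equals (X2' M_{X1} X2)^{-1}.
set.seed(7)
n_obs <- 60
X1 <- cbind(1, rnorm(n_obs))               # unrestricted block (includes the constant)
X2 <- matrix(rnorm(n_obs * 3), ncol = 3)   # block restricted to zero under H0
X  <- cbind(X1, X2)

# Residual maker for X1
M1 <- diag(n_obs) - X1 %*% solve(crossprod(X1), t(X1))

idx     <- (ncol(X1) + 1):ncol(X)          # positions of the X2 coefficients
D_block <- solve(crossprod(X))[idx, idx]   # lower right block of (X'X)^{-1}
D_fwl   <- solve(t(X2) %*% M1 %*% X2)      # (X2' M_{X1} X2)^{-1}

max_diff <- max(abs(D_block - D_fwl))      # should be zero up to floating-point error
```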

Thus, the numerator of the F statistic becomes (without the division by q)

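$$F_{num} = \hat\beta_{ols,2}'(X_2'M_{X_1}X_2)\hat\beta_{ols,2}$$
Next, recall that by the Frisch-Waugh-Lovell theorem we may write
$$\hat\beta_{ols,2} = (X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y$$
so that
$$\begin{aligned} F_{num} &= y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}(X_2'M_{X_1}X_2)(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y \\ &= y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y \end{aligned}$$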

It remains to show that this numerator is identical to $RSSR - USSR$, the difference between the restricted and unrestricted sums of squared residuals.

Here,

$$RSSR = y'M_{X_1}y$$
is the residual sum of squares from regressing $y$ on $X_1$, i.e., with $H_0$ imposed. In your special case, this is just $TSS = \sum_i (y_i - \bar y)^2$, the sum of squared residuals of a regression on a constant.

Again using FWL (which also shows that the residuals of the two approaches are identical), we can write $USSR$ (RSS in your notation) as the SSR of the regression

$$M_{X_1}y \quad \text{on} \quad M_{X_1}X_2$$

That is,

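$$\begin{aligned} USSR &= y'M_{X_1}'M_{M_{X_1}X_2}M_{X_1}y \\ &= y'M_{X_1}'(I - P_{M_{X_1}X_2})M_{X_1}y \\ &= y'M_{X_1}y - y'M_{X_1}M_{X_1}X_2\left((M_{X_1}X_2)'M_{X_1}X_2\right)^{-1}(M_{X_1}X_2)'M_{X_1}y \\ &= y'M_{X_1}y - y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y \end{aligned}$$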

Thus,

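$$\begin{aligned} RSSR - USSR &= y'M_{X_1}y - \left(y'M_{X_1}y - y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y\right) \\ &= y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y \end{aligned}$$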
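The equivalence just derived can also be checked on simulated data. In this sketch the restricted model contains only a constant, so $RSSR$ coincides with TSS; the variable names (`rssr`, `ussr`, `F_num`, `n_obs`) are illustrative assumptions.

```r
# Sketch: RSSR - USSR equals y' M1 X2 (X2' M1 X2)^{-1} X2' M1 y.
set.seed(123)
n_obs <- 80
X1 <- matrix(1, n_obs, 1)                  # constant only (kept under H0)
X2 <- matrix(rnorm(n_obs * 3), ncol = 3)   # slope regressors (zero under H0)
y  <- rnorm(n_obs)

rssr <- sum(resid(lm(y ~ 1))^2)            # restricted sum of squared residuals
ussr <- sum(resid(lm(y ~ X2))^2)           # unrestricted sum of squared residuals
tss  <- sum((y - mean(y))^2)               # equals rssr in this special case

# Quadratic form from the derivation; M1 is symmetric and idempotent,
# so X2' M1 X2 = (M1 X2)'(M1 X2)
M1    <- diag(n_obs) - X1 %*% solve(crossprod(X1), t(X1))
M1X2  <- M1 %*% X2
F_num <- drop(t(y) %*% M1X2 %*% solve(crossprod(M1X2), t(M1X2) %*% y))

numerators_agree <- isTRUE(all.equal(rssr - ussr, F_num))
```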


Thanks. I don't know if it's considered hand holding at this point but how do you go from your sum of squared betas to an expression that contains sum of squares?
user1627466

1
@user1627466, I added a derivation of the equivalence of the two formulae.
Christoph Hanck

4

@ChristophHanck has provided a very comprehensive answer. Here I will add a sketch of the proof for the special case the OP mentioned. Hopefully it is also easier to follow for beginners.

A random variable $Y \sim F_{d_1,d_2}$ if

$$Y = \frac{X_1/d_1}{X_2/d_2},$$
where $X_1 \sim \chi^2_{d_1}$ and $X_2 \sim \chi^2_{d_2}$ are independent. Thus, to show that the F-statistic has an F-distribution, we may as well show that $c\,ESS \sim \chi^2_{p-1}$ and $c\,RSS \sim \chi^2_{n-p}$ for some constant $c$, and that they are independent.

In OLS model we write

$$y = X\beta + \varepsilon,$$
where $X$ is an $n \times p$ matrix, and ideally $\varepsilon \sim N_n(0, \sigma^2 I)$. For convenience we introduce the hat matrix $H = X(X^TX)^{-1}X^T$ (note $\hat y = Hy$), and the residual maker $M = I - H$. Important properties of $H$ and $M$ are that they are both symmetric and idempotent. In addition, we have $\mathrm{tr}(H) = p$ and $HX = X$; these will come in handy later.

Let us denote the matrix of all ones as J, the sum of squares can then be expressed with quadratic forms:

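$$TSS = y^T\left(I - \frac1n J\right)y, \quad RSS = y^TMy, \quad ESS = y^T\left(H - \frac1n J\right)y.$$
Note that $M + (H - J/n) + J/n = I$. One can verify that $J/n$ is idempotent and $\mathrm{rank}(M) + \mathrm{rank}(H - J/n) + \mathrm{rank}(J/n) = n$. It follows from this then that $H - J/n$ is also idempotent and $M(H - J/n) = 0$.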
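These matrix identities can be verified directly. The sketch below uses simulated data with an intercept column in `X` (an assumption needed for $H - J/n$ to be a projection), and relies on the fact that the rank of an idempotent matrix equals its trace; names such as `Jn`, `E` and `is_idempotent` are hypothetical helpers.

```r
# Sketch: check idempotency, ranks (via traces), orthogonality and TSS = ESS + RSS.
set.seed(2024)
n_obs <- 30; p <- 4
X <- cbind(1, matrix(rnorm(n_obs * (p - 1)), ncol = p - 1))  # includes the constant
y <- rnorm(n_obs)

H  <- X %*% solve(crossprod(X), t(X))     # hat matrix
M  <- diag(n_obs) - H                     # residual maker
Jn <- matrix(1 / n_obs, n_obs, n_obs)     # J / n
E  <- H - Jn                              # matrix of the ESS quadratic form

is_idempotent <- function(A) isTRUE(all.equal(A %*% A, A))

# For idempotent matrices rank = trace: expect n - p, p - 1 and 1
traces <- c(M = sum(diag(M)), E = sum(diag(E)), Jn = sum(diag(Jn)))

max_ME <- max(abs(M %*% E))               # M (H - J/n) should be numerically zero

tss <- drop(t(y) %*% (diag(n_obs) - Jn) %*% y)
rss <- drop(t(y) %*% M %*% y)
ess <- drop(t(y) %*% E %*% y)
```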

We can now set out to show that F-statistic has F-distribution (search Cochran's theorem for more). Here we need two facts:

  1. Let $x \sim N_n(\mu, \Sigma)$. Suppose $A$ is symmetric with rank $r$ and $A\Sigma$ is idempotent, then $x^TAx \sim \chi^2_r(\mu^TA\mu/2)$, i.e. non-central $\chi^2$ with d.f. $r$ and non-centrality $\mu^TA\mu/2$. This is a special case of Baldessari's result, a proof can also be found here.
  2. Let $x \sim N_n(\mu, \Sigma)$. If $A\Sigma B = 0$, then $x^TAx$ and $x^TBx$ are independent. This is known as Craig's theorem.

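Since $y \sim N_n(X\beta, \sigma^2 I)$, we have

$$\frac{ESS}{\sigma^2} = \left(\frac{y}{\sigma}\right)^T\left(H - \frac1n J\right)\frac{y}{\sigma} \sim \chi^2_{p-1}\left(\frac{(X\beta)^T\left(H - \frac{J}{n}\right)X\beta}{2\sigma^2}\right).$$
However, under the null hypothesis all slope coefficients are zero, so $X\beta$ is a multiple of the vector of ones, which $H - J/n$ maps to zero when the model includes an intercept. The non-centrality therefore vanishes, and really $ESS/\sigma^2 \sim \chi^2_{p-1}$. On the other hand, note that $y^TMy = \varepsilon^TM\varepsilon$ since $HX = X$. Therefore $RSS/\sigma^2 \sim \chi^2_{n-p}$. Since $M(H - J/n) = 0$, $ESS/\sigma^2$ and $RSS/\sigma^2$ are also independent. It immediately follows then that
$$F = \frac{(TSS - RSS)/(p-1)}{RSS/(n-p)} = \frac{\frac{ESS}{\sigma^2}/(p-1)}{\frac{RSS}{\sigma^2}/(n-p)} \sim F_{p-1,n-p}.$$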
Licensed under cc by-sa 3.0 with attribution required.