The short answer is that your conjecture is true when and only when there is a positive intra-class correlation in the data. Empirically speaking, most clustered datasets, most of the time, show a positive intra-class correlation, which means that in practice your conjecture is usually true. But if the intra-class correlation is 0, then the two cases you mentioned are equally informative. And if the intra-class correlation is negative, then it's actually less informative to take fewer measurements on more subjects; we would actually prefer (as far as reducing the variance of the parameter estimate is concerned) to take all our measurements on a single subject.
Statistically, there are two perspectives from which we can think about this: a random-effects (or mixed) model, which you mention in your question, or a marginal model, which ends up being a bit more informative here.
Random-effects (mixed) model
Say we have a set of $n$ subjects from whom we have taken $m$ measurements each. Then a simple random-effects model of the $j$th measurement from the $i$th subject might be
$$y_{ij} = \beta + u_i + e_{ij},$$
where $\beta$ is the fixed intercept, $u_i$ is the random subject effect (with variance $\sigma^2_u$), $e_{ij}$ is the observation-level error term (with variance $\sigma^2_e$), and the last two random terms are independent of each other.
In this model, $\beta$ represents the population mean, and with a balanced dataset (i.e., an equal number of measurements from each subject), our best estimate of it is simply the sample mean. So if we take "more information" to mean a smaller variance for this estimate, then basically we want to know how the variance of the sample mean depends on $n$ and $m$. With a bit of algebra we can work out that
$$\begin{aligned}
\operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j y_{ij}\right) &= \operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j (\beta + u_i + e_{ij})\right)\\
&= \frac{1}{n^2m^2}\operatorname{var}\left(\sum_i\sum_j u_i + \sum_i\sum_j e_{ij}\right)\\
&= \frac{1}{n^2m^2}\left(m^2\sum_i \operatorname{var}(u_i) + \sum_i\sum_j \operatorname{var}(e_{ij})\right)\\
&= \frac{1}{n^2m^2}\left(nm^2\sigma^2_u + nm\sigma^2_e\right)\\
&= \frac{\sigma^2_u}{n} + \frac{\sigma^2_e}{nm}.
\end{aligned}$$
Examining this expression, we can see that whenever there is any subject variance at all (i.e., $\sigma^2_u > 0$), increasing the number of subjects ($n$) makes both terms smaller, while increasing the number of measurements per subject ($m$) only makes the second term smaller. (For a practical implication of this for designing multi-site replication projects, see this blog post I wrote a while ago.)

Now, holding the total number of observations $nm$ constant, the whole variance expression just looks like
$$\frac{\sigma^2_u}{n} + \text{constant},$$
which is as small as possible when $n$ is as large as possible (up to a maximum of $n = nm$, in which case $m = 1$, meaning we take a single measurement from each subject).
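If it helps to see this concretely, here's a small Monte Carlo sketch in Python (with hypothetical parameter values chosen just for illustration) that simulates the random-effects model and compares the empirical variance of the sample mean against the $\sigma^2_u/n + \sigma^2_e/nm$ formula:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration
beta, sigma_u, sigma_e = 0.0, 1.0, 1.0
n, m = 20, 5          # n subjects, m measurements each
n_sims = 20_000

means = np.empty(n_sims)
for s in range(n_sims):
    u = rng.normal(0.0, sigma_u, size=n)        # subject effects u_i
    e = rng.normal(0.0, sigma_e, size=(n, m))   # observation-level errors e_ij
    y = beta + u[:, None] + e                   # y_ij = beta + u_i + e_ij
    means[s] = y.mean()                         # sample mean for this simulated dataset

empirical = means.var()
theoretical = sigma_u**2 / n + sigma_e**2 / (n * m)
print(f"empirical var:   {empirical:.5f}")
print(f"theoretical var: {theoretical:.5f}")
```

With these values the theoretical variance is $1/20 + 1/100 = 0.06$, and the empirical variance should land very close to that.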
A simple random-effects model like this one has an intra-class correlation of
$$\rho = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e}$$
(sketch of a derivation here). So we can write the variance equation above as
$$\operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j y_{ij}\right) = \frac{\sigma^2_u}{n} + \frac{\sigma^2_e}{nm} = \left(\frac{\rho}{n} + \frac{1-\rho}{nm}\right)\left(\sigma^2_u + \sigma^2_e\right)$$
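As a quick sanity check on this bit of algebra, the two forms of the variance agree numerically (again with hypothetical variance components):

```python
sigma_u2, sigma_e2 = 1.5, 0.5      # hypothetical variance components
n, m = 20, 5
rho = sigma_u2 / (sigma_u2 + sigma_e2)

form1 = sigma_u2 / n + sigma_e2 / (n * m)
form2 = (rho / n + (1 - rho) / (n * m)) * (sigma_u2 + sigma_e2)
print(form1, form2)   # identical up to floating point
```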
This doesn't really add any insight to what we already saw above, but it does make us wonder: since the intra-class correlation is a bona fide correlation coefficient, and correlation coefficients can be negative, what would happen (and what would it mean) if the intra-class correlation were negative?
In the context of the random-effects model, a negative intra-class correlation doesn't really make sense, because it implies that the subject variance $\sigma^2_u$ is somehow negative (as we can see from the $\rho$ equation above, and as explained here and here)... but variances can't be negative! But this doesn't mean that the concept of a negative intra-class correlation doesn't make sense; it just means that the random-effects model doesn't have any way to express this concept, which is a failure of the model, not of the concept. To express this concept adequately we need to consider the marginal model.
Marginal model
For this same dataset we could consider a so-called marginal model of $y_{ij}$,
$$y_{ij} = \beta + e^*_{ij},$$
where basically we've pushed the random subject effect $u_i$ from before into the error term $e_{ij}$ so that we have $e^*_{ij} = u_i + e_{ij}$. In the random-effects model we considered the two random terms $u_i$ and $e_{ij}$ to be i.i.d., but in the marginal model we instead consider $e^*_{ij}$ to follow a block-diagonal covariance matrix $C$ like
$$C = \sigma^2\begin{bmatrix}R & 0 & \cdots & 0\\ 0 & R & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & R\end{bmatrix}, \qquad R = \begin{bmatrix}1 & \rho & \cdots & \rho\\ \rho & 1 & \cdots & \rho\\ \vdots & \vdots & \ddots & \vdots\\ \rho & \rho & \cdots & 1\end{bmatrix}$$
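For concreteness, here's a sketch of how one might build this block-diagonal $C$ with numpy (the helper name `marginal_cov` is mine, not from any library):

```python
import numpy as np

def marginal_cov(n, m, sigma2, rho):
    """Block-diagonal covariance matrix C for n subjects with m measurements each."""
    R = np.full((m, m), rho)               # compound-symmetric within-subject block
    np.fill_diagonal(R, 1.0)
    return sigma2 * np.kron(np.eye(n), R)  # C = sigma^2 * diag(R, ..., R)

C = marginal_cov(n=3, m=2, sigma2=2.0, rho=0.5)
print(C)
```

The Kronecker product with the identity is just a compact way of placing $n$ copies of $R$ along the diagonal.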
In words, this means that under the marginal model we simply consider $\rho$ to be the expected correlation between two $e^*$s from the same subject (we assume the correlation across subjects is 0). When $\rho$ is positive, two observations drawn from the same subject tend to be more similar (closer together), on average, than two observations drawn randomly from the dataset while ignoring the clustering due to subjects. When $\rho$ is negative, two observations drawn from the same subject tend to be less similar (further apart), on average, than two observations drawn completely at random. (More information about this interpretation in the question/answers here.)
So now when we look at the equation for the variance of the sample mean under the marginal model, we have
$$\begin{aligned}
\operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j y_{ij}\right) &= \operatorname{var}\left(\frac{1}{nm}\sum_i\sum_j (\beta + e^*_{ij})\right)\\
&= \frac{1}{n^2m^2}\operatorname{var}\left(\sum_i\sum_j e^*_{ij}\right)\\
&= \frac{1}{n^2m^2}\left(n\left(m\sigma^2 + (m^2 - m)\rho\sigma^2\right)\right)\\
&= \frac{\sigma^2\left(1 + (m-1)\rho\right)}{nm}\\
&= \left(\frac{\rho}{n} + \frac{1-\rho}{nm}\right)\sigma^2,
\end{aligned}$$
which is the same variance expression we derived above for the random-effects model, just with $\sigma^2_e + \sigma^2_u = \sigma^2$, which is consistent with our note above that $e^*_{ij} = u_i + e_{ij}$. The advantage of this (statistically equivalent) perspective is that here we can think about a negative intra-class correlation without needing to invoke any weird concepts like a negative subject variance. Negative intra-class correlations just fit naturally in this framework.
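One way to convince yourself of this derivation is to compute the variance of the sample mean directly from $C$, since for equal weights $w = 1/nm$ it is just $w^\top C w$. A sketch, reusing the hypothetical `marginal_cov` helper from above:

```python
import numpy as np

def marginal_cov(n, m, sigma2, rho):
    R = np.full((m, m), rho)
    np.fill_diagonal(R, 1.0)
    return sigma2 * np.kron(np.eye(n), R)

n, m, sigma2, rho = 10, 4, 1.0, 0.3   # hypothetical values
C = marginal_cov(n, m, sigma2, rho)

# variance of the sample mean is w' C w with equal weights w = 1/(nm)
w = np.full(n * m, 1.0 / (n * m))
var_direct = w @ C @ w
var_formula = sigma2 * (1 + (m - 1) * rho) / (n * m)
print(var_direct, var_formula)        # should agree up to floating point
```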
(BTW, just a quick aside to point out that the second-to-last line of the derivation above implies that we must have $\rho \ge -1/(m-1)$, or else the whole expression is negative, but variances can't be negative! So there is a lower bound on the intra-class correlation that depends on how many measurements we have per cluster. For $m = 2$ (i.e., we measure each subject twice), the intra-class correlation can go all the way down to $\rho = -1$; for $m = 3$ it can only go down to $\rho = -1/2$; and so on. Fun fact!)
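The same lower bound shows up if you ask when $R$ is still a valid (positive semi-definite) correlation matrix: its smallest eigenvalue is exactly $1 + (m-1)\rho$. A quick numerical sketch:

```python
import numpy as np

def min_eig_R(m, rho):
    """Smallest eigenvalue of the m x m compound-symmetric correlation matrix."""
    R = np.full((m, m), rho)
    np.fill_diagonal(R, 1.0)
    return np.linalg.eigvalsh(R).min()

for m in (2, 3, 5):
    bound = -1.0 / (m - 1)
    # at the bound the smallest eigenvalue is ~0; just below it goes negative,
    # so R is no longer a valid correlation matrix
    print(m, bound, min_eig_R(m, bound), min_eig_R(m, bound - 0.05))
```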
So finally, once again considering the total number of observations $nm$ to be a constant, we see that the second-to-last line of the derivation above just looks like
$$\left(1 + (m-1)\rho\right) \times \text{positive constant}.$$
So when $\rho > 0$, having $m$ as small as possible (so that we take fewer measurements of more subjects--in the limit, 1 measurement of each subject) makes the variance of the estimate as small as possible. But when $\rho < 0$, we actually want $m$ to be as large as possible (so that, in the limit, we take all $nm$ measurements from a single subject) in order to make the variance as small as possible. And when $\rho = 0$, the variance of the estimate is just a constant, so our allocation of $m$ and $n$ doesn't matter.
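To see all three cases side by side, here's a final sketch that evaluates the variance formula over different allocations of $n$ and $m$ with $nm$ held fixed (all values hypothetical; note that $m$ is capped at 6 so that $\rho = -0.2$ stays above its $-1/(m-1)$ lower bound):

```python
def var_mean(n, m, sigma2, rho):
    """Variance of the sample mean: sigma^2 * (1 + (m - 1) * rho) / (n * m)."""
    return sigma2 * (1 + (m - 1) * rho) / (n * m)

total = 60  # nm held constant (hypothetical total number of observations)
for rho in (0.2, 0.0, -0.2):
    allocations = ", ".join(
        f"n={total // m:2d},m={m}: {var_mean(total // m, m, 1.0, rho):.4f}"
        for m in (1, 2, 3, 6)
    )
    print(f"rho={rho:+.1f}  ->  {allocations}")
```

For $\rho = +0.2$ the variance grows with $m$ (so $m = 1$ wins), for $\rho = 0$ it stays at $1/60$ for every allocation, and for $\rho = -0.2$ it shrinks as $m$ grows, hitting exactly 0 at the boundary $m = 6$, consistent with the conclusion above.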