Why isn't variance defined as the difference between every value following each other?

19

This may be a simple question for many, but here it is:

Why isn't variance defined as the difference between every value following each other instead of the difference to the average of the values?

This would be the more logical choice to me; I guess I'm obviously overlooking some disadvantages. Thanks

EDIT:

Let me rephrase as clearly as possible. This is what I mean:

Assume you have a range of numbers, ordered: 1,2,3,4,5
Calculate and sum up the (absolute) differences between values (successively, between each value and the next, not pairwise), without using the average.
Divide by the number of differences
(Follow-up: would the answer be different if the numbers were unordered?)

-> What are the disadvantages of this approach compared to the standard formula for variance?
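For concreteness, the steps above could be sketched in Python roughly as follows. The helper name `mean_abs_successive_diff` is made up for this illustration (it is not a library function), and the standard sample variance from the `statistics` module is computed alongside for comparison.

```python
import statistics

def mean_abs_successive_diff(values):
    """Hypothetical helper: average absolute difference between each value and the next."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)  # divide by the number of differences

data = [1, 2, 3, 4, 5]
proposed = mean_abs_successive_diff(data)   # the procedure from the question
sample_var = statistics.variance(data)      # standard sample variance, divides by n - 1
print(proposed, sample_var)
```

The two numbers differ, which is the starting point for the answers below.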

variance

— user2305193

1

You may also be interested in reading about autocorrelation (e.g., stats.stackexchange.com/questions/185521/… ).

— Tim

2

@user2305193 whuber's answer is correct, but his formula uses the squared distances between successive values in an ordering of the data, averaged over all orderings. It's a neat trick; however, the process of finding the variance that you described is exactly what I tried to implement in my answer, and I showed that it would not do a good job. Just trying to clear up the confusion.

— Greenparker

1

For fun, look up the Allan Variance.

— hobbs

On second thought, I guess that since you don't square the differences (and don't take the square root afterwards) but take absolute values instead, this should rather be "why isn't this how we calculate the standard deviation?" instead of "why isn't this how we calculate the variance?". But I'll give it a rest now.

— user2305193

27

The most obvious reason is that there is often no time sequence in the values. So if you shuffle the data, it makes no difference to the information conveyed by the data. If we follow your method, then every time you shuffle the data you get a different sample variance.

The more theoretical answer is that the sample variance estimates the true variance of a random variable. The true variance of a random variable $X$ is

$E\left[ (X - EX)^2 \right].$

Here $E$ represents expectation or "average value". So the definition of the variance is the average squared distance between the variable and its average value. When you look at this definition, there is no "time order" here since there is no data. It is just an attribute of the random variable.

When you collect iid data from this distribution, you have realizations $x_1, x_2, \dots, x_n$. The best way to estimate the expectation is to take sample averages. The key here is that we got iid data, and thus there is no ordering to the data. The sample $x_1, x_2, \dots, x_n$ is the same as the sample $x_2, x_5, x_1, x_n, \dots$
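The point about ordering can be checked with a small Python sketch. The helper `mean_abs_successive_diff` and the particular reordering are illustrative choices, not taken from the answer: the usual sample variance is identical for both orderings, while the successive-difference statistic changes.

```python
import statistics

def mean_abs_successive_diff(values):
    """Hypothetical helper: average absolute difference between each value and the next."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)

sample = [1, 2, 3, 4, 5]
reordered = [3, 1, 5, 2, 4]  # same observations, different order

var_sample = statistics.variance(sample)
var_reordered = statistics.variance(reordered)        # unchanged by reordering
msd_sample = mean_abs_successive_diff(sample)
msd_reordered = mean_abs_successive_diff(reordered)   # depends on the order

print(var_sample, var_reordered)
print(msd_sample, msd_reordered)
```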

EDIT

The sample variance measures a specific kind of dispersion for the sample, the one that measures the average (squared) distance from the mean. There are other kinds of dispersion, such as the range of the data and the interquartile range.

Even if you sort your values in ascending order, that doesn't change the characteristics of the sample. The sample (data) you get are realizations of a variable. Calculating the sample variance is akin to understanding how much dispersion there is in the variable. So, for example, if you sample 20 people and measure their height, then those are 20 "realizations" of the random variable $X =$ height of people. Now the sample variance is supposed to measure the variability in the height of individuals in general. If you order the data

$100, 110, 123, 124, \dots,$

that doesn't change the information in the sample.

Let's look at one more example. Say you have 100 observations from a random variable, ordered in this way:

$1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, \dots, 100.$

Then the average subsequent distance is 1 unit, so by your method the variance will be 1.

The way to interpret "variance" or "dispersion" is to understand what range of values is likely for the data. In this case, you will get a range of .99 units, which of course does not represent the variation well.

If instead of taking the average you just sum the subsequent differences, then your variance will be 99. Of course, that doesn't represent the variability in the sample, because 99 gives you the range of the data, not a sense of variability.
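The arithmetic in this example can be reproduced with a short Python sketch. The variable names are illustrative. Both divisors that come up in this thread are shown: the number of differences (99) and the number of observations (100).

```python
import statistics

data = list(range(1, 101))                       # 1, 2, ..., 100
diffs = [b - a for a, b in zip(data, data[1:])]  # 99 successive differences, each equal to 1

total = sum(diffs)                    # equals max(data) - min(data), i.e. the range
per_difference = total / len(diffs)   # dividing by the number of differences
per_observation = total / len(data)   # dividing by the number of observations
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(total, per_difference, per_observation)
print(round(sample_var, 1), round(sample_sd, 2))
```

The summed differences telescope to the range of the data, while the standard sample variance (about 841.7, standard deviation about 29.01) is on a completely different scale.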

— Greenparker

1

Your last paragraph got through to me, haha. Thanks for this amazing answer; I wish I had enough rep to upvote it. Please, people, do it for me ;-) ACCEPTED!

— user2305193

Follow-up to the follow-up: what I really meant (yes, sorry, I only realized the right question after reading your answer) was to sum up the differences and divide by the number of samples. In your last example that would be 99/100. Can you elaborate on that, to leave me completely flabbergasted?

— user2305193

@user2305193 Right, I said 1 unit on average, which is incorrect. It should have been .99 units. I changed it.

— Greenparker

For more information on the 1-100 series: the variance of 1-100 would be 841.7 and the standard deviation 29.01. So indeed quite a different result.

— user2305193

31

But it is defined that way!

Here's the algebra. Let the values be $\mathbf{x}=(x_1, x_2, \ldots, x_n)$. Denote by $F$ the empirical distribution function of these values (which means each $x_i$ contributes a probability mass of $1/n$ at the value $x_i$) and let $X$ and $Y$ be independent random variables with distribution $F$. By virtue of basic properties of the variance (namely, it is a quadratic form), as well as the definition of $F$ and the fact that $X$ and $Y$ have the same mean,

$\begin{aligned} \operatorname{Var}(\mathbf{x})&=\operatorname{Var}(X) = \frac{1}{2}\left(\operatorname{Var}(X) + \operatorname{Var}(Y)\right)=\frac{1}{2}\left(\operatorname{Var}(X-Y)\right)\\ &=\frac{1}{2}\left(\mathbb{E}((X-Y)^2) - \mathbb{E}(X-Y)^2\right)\\ &=\mathbb{E}\left(\frac{1}{2}(X-Y)^2\right) - 0\\ &=\frac{1}{n^2}\sum_{i,j}\frac{1}{2}(x_i - x_j)^2. \end{aligned}$

This formula does not depend on the way $\mathbf{x}$ is ordered: it uses all possible pairs of components, comparing them using half their squared differences. It can, however, be related to an average over all possible orderings (the group $\mathfrak{S}(n)$ of all $n!$ permutations of the indices $1,2,\ldots, n$). Namely,

$\operatorname{Var}(\mathbf{x})=\frac{1}{n^2}\sum_{i,j}\frac{1}{2}(x_i - x_j)^2 = \frac{1}{n!}\sum_{\sigma\in\mathfrak{S}(n)} \frac{1}{n} \sum_{i=1}^{n-1} \frac{1}{2}(x_{\sigma(i)} - x_{\sigma(i+1)})^2.$

That inner summation takes the reordered values $x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)}$ and sums the (half) squared differences between all $n-1$ successive pairs. The division by $n$ essentially averages these successive squared differences. It computes what is known as the lag-1 semivariance. The outer summation does this for all possible orderings.

These two equivalent algebraic views of the standard variance formula give new insight into what the variance means. The semivariance is an inverse measure of the serial covariance of a sequence: the covariance is high (and the numbers are positively correlated) when the semivariance is low, and conversely. The variance of an unordered dataset, then, is a kind of average of all the possible semivariances obtainable under arbitrary reorderings.
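As a numerical sanity check of the identity above, here is a minimal brute-force sketch in Python for a small data set. The function name `lag1_semivariance` and the test data are illustrative; enumerating all $n!$ orderings is only feasible for small $n$.

```python
from itertools import permutations
from math import isclose
import statistics

def lag1_semivariance(seq):
    """(1/n) times the sum of half squared differences between successive values."""
    n = len(seq)
    return sum(0.5 * (a - b) ** 2 for a, b in zip(seq, seq[1:])) / n

x = [1, 2, 3, 4, 5]
n = len(x)

# Pairwise form: (1/n^2) * sum over all (i, j) of half the squared difference.
pairwise = sum(0.5 * (xi - xj) ** 2 for xi in x for xj in x) / n**2

# Permutation form: average the lag-1 semivariance over all n! orderings.
orderings = [lag1_semivariance(p) for p in permutations(x)]
perm_average = sum(orderings) / len(orderings)

population_var = statistics.pvariance(x)  # usual definition, divides by n

print(pairwise, perm_average, population_var)
print(lag1_semivariance(sorted(x)))       # one particular ordering: the sorted one
```

All three quantities on the first printed line agree, while the sorted ordering alone gives a much smaller semivariance, consistent with sorted data being strongly positively correlated from one value to the next.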

— whuber

1

@Mur1lo On the contrary: I believe this derivation is correct. Apply the formula to some data and see!

— whuber

1

I think Mur1lo may have been talking not about the correctness of the formula for variance but about apparently passing directly from expectations of random variables to functions of sample quantities.

— Glen_b -Reinstate Monica

1

@glen But that's precisely what the empirical distribution function lets us do. That's the entire point of this approach.

— whuber

3

Yes, that's clear to me; I was trying to point out where the confusion seemed to lie. Sorry to be vague. Hopefully it's clearer now why it only appears* to be a problem.

*(this is why I used the word "apparent" earlier, to emphasize it was just the out-of-context appearance of that step that was likely to be the cause of the confusion)

— Glen_b -Reinstate Monica

2

@Mur1lo The only thing I have done in any of these equations is to apply definitions. There is no passing from expectations to "sample quantities". (In particular, no sample of $F$ has been posited or used.) Thus I am unable to identify what the apparent problem is, nor suggest an alternative explanation. If you could expand on your concern then I might be able to respond.

— whuber

11

Just a complement to the other answers, variance can be computed as the squared difference between terms:

$\begin{align} \text{Var}(X) &= \frac{1}{2\cdot n^2}\sum_i^n\sum_j^n \left(x_i-x_j\right)^2 \\ &= \frac{1}{2\cdot n^2}\sum_i^n\sum_j^n \left(x_i - \overline x -x_j + \overline x\right)^2 \\ &= \frac{1}{2\cdot n^2}\sum_i^n\sum_j^n \left((x_i - \overline x) -(x_j - \overline x)\right)^2 \\ &= \frac{1}{n}\sum_i^n \left(x_i - \overline x \right)^2 \end{align}$

I think this is the closest to the OP proposition. Remember the variance is a measure of dispersion of every observation at once, not only between "neighboring" numbers in the set.
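The last equality in the derivation skips a step. One way to fill it in, writing $a_i = x_i - \overline x$ as shorthand (a symbol introduced here only for illustration), is to expand the square and use $\sum_i^n a_i = 0$:

$\begin{align} \sum_i^n\sum_j^n \left(a_i - a_j\right)^2 &= \sum_i^n\sum_j^n \left(a_i^2 - 2 a_i a_j + a_j^2\right) \\ &= 2n\sum_i^n a_i^2 - 2\left(\sum_i^n a_i\right)^2 \\ &= 2n\sum_i^n a_i^2. \end{align}$

Dividing by $2\cdot n^2$ then gives $\frac{1}{n}\sum_i^n \left(x_i - \overline x\right)^2$.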

UPDATE

Using your example: $X = \{1, 2, 3, 4, 5\}$. We know the variance is $Var(X) = 2$.

With your proposed method $Var(X) = 1$ , so we know beforehand taking the differences between neighbors as variance doesn't add up. What I meant was taking every possible difference squared then summed:

$\begin{align} Var(X) &= \frac{(5-1)^2+(5-2)^2+(5-3)^2+(5-4)^2+(5-5)^2+(4-1)^2+(4-2)^2+(4-3)^2+(4-4)^2+(4-5)^2+(3-1)^2+(3-2)^2+(3-3)^2+(3-4)^2+(3-5)^2+(2-1)^2+(2-2)^2+(2-3)^2+(2-4)^2+(2-5)^2+(1-1)^2+(1-2)^2+(1-3)^2+(1-4)^2+(1-5)^2}{2 \cdot 5^2} \\ &= \frac{16+9+4+1+9+4+1+1+4+1+1+4+1+1+4+9+1+4+9+16}{50} \\ &= 2 \end{align}$
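To check that the all-pairs formula is not a coincidence of the 1 to 5 example, here is a small Python sketch on a different data set. The numbers are chosen arbitrarily for illustration and are not from the answer; the neighbor-only statistic is included for contrast.

```python
from math import isclose
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data, already sorted
n = len(x)

# Every ordered pair (i, j), including i == j, which contributes zero.
pairwise_var = sum((xi - xj) ** 2 for xi in x for xj in x) / (2 * n**2)

# Usual population variance around the mean (divides by n).
population_var = statistics.pvariance(x)

# Differences between sorted neighbors only, averaged over the n - 1 gaps.
neighbor_stat = sum(abs(b - a) for a, b in zip(x, x[1:])) / (n - 1)

print(pairwise_var, population_var, neighbor_stat)
```

The first two values coincide, while the neighbor-only average is just the range divided by $n-1$.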

— Firebug

Now I'm seriously confused guys

— user2305193

@user2305193 In your question, did you mean every pairwise difference or did you mean the difference between a value and the next in a sequence? Could you please clarify?

— Firebug

2

@Mur1lo no one is though, I have no idea what you're referring to.

— Firebug

2

@Mur1lo This is a general question, and I answered it generally. Variance is a computable parameter, which can be estimated from samples. This question isn't about estimation though. Also we are talking about discrete sets, not about continuous distributions.

— Firebug

1

You showed how to estimate the variance by its U-statistic, and that's fine. The problem is that when you write Var("upper case" X) = things involving "lower case" x, you are mixing the two different notions of parameter and estimator.

— Mur1lo

6

Others have answered about the usefulness of variance defined as usual. Anyway, we just have two legitimate definitions of different things: the usual definition of variance, and your definition.

Then, the main question is why the first one is called variance and not yours. That is just a matter of convention. Until 1918 you could have invented anything you wanted and called it "variance", but in 1918 Fisher used that name for what is still called variance, and if you want to define anything else you will need to find another name for it.

The other question is whether the thing you defined might be useful for anything. Others have pointed out its problems as a measure of dispersion, but it's up to you to find applications for it. Maybe you will find such useful applications that in a century your thing is more famous than variance.

— Pere

I know every definition is up to the people deciding on it; I really was looking for help with the upsides and downsides of each approach. Usually there's a good reason for people converging on a definition and, as I suspected, I didn't see why straight away.

— user2305193

1

Fisher introduced variance as a term in 1918 but the idea is older.

— Nick Cox

As far as I know, Fisher was the first one to use the name "variance" for variance. That's why I say that before 1918 you could have used "variance" to name anything else you had invented.

— Pere

3

@GreenParker's answer is more complete, but an intuitive example might be useful to illustrate the drawback of your approach.

In your question, you seem to assume that the order in which realizations of a random variable appear matters. However, it is easy to think of examples in which it doesn't.

Consider the example of the height of individuals in a population. The order in which individuals are measured is irrelevant both to the mean height in the population and to the variance (how spread out those values are around the mean).

Your method would seem odd applied to such a case.

— Antoine Vernet

2

Although there are many good answers to this question, I believe some important points were left behind, and since this question raises a really interesting point I would like to provide yet another point of view.

Why isn't variance defined as the difference between every value following each other instead of the difference to the average of the values?

The first thing to have in mind is that the variance is a particular kind of parameter, and not a certain type of calculation. There is a rigorous mathematical definition of what a parameter is, but for the time being we can think of them as mathematical operations on the distribution of a random variable. For example, if $X$ is a random variable with distribution function $F_X$, then its mean $\mu_X$, which is also a parameter, is:

$\mu_X = \int_{-\infty}^{+\infty}x\,dF_{X}(x)$

and the variance of $X$, $\sigma^2_X$, is:

$\sigma^2_X = \int_{-\infty}^{+\infty}(x - \mu_X)^2\,dF_{X}(x)$

The role of estimation in statistics is to provide, from a set of realizations of a r.v., a good approximation for the parameters of interest.

What I wanted to show is that there is a big difference between the concept of a parameter (the variance, for this particular question) and the statistic we use to estimate it.

Why isn't the variance calculated this way?

So we want to estimate the variance of a random variable $X$ from a set of independent realizations of it, let's say $x = \{x_1,\ldots,x_n\}$. The way you propose doing it is by computing the absolute value of successive differences, summing and taking the mean:

$\psi(x) = \frac{1}{n}\sum_{i = 2}^{n}|x_i - x_{i-1}|$

and the usual statistic is:

$S^2(x) = \frac{1}{n-1}\sum_{i = 1}^{n}(x_i - \bar{x})^2,$

where $\bar{x}$ is the sample mean.

When comparing two estimators of a parameter, the usual criterion for the best one is that which has minimal mean square error (MSE), and an important property of MSE is that it can be decomposed into two components:

MSE = (estimator bias)$^2$ + estimator variance.

Using this criterion, the usual statistic, $S^2$, has some advantages over the one you suggest.

First, it is an unbiased estimator of the variance, but your statistic is not unbiased.
One other important thing is that if we are working with the normal distribution then $S^2$ is the best unbiased estimator of $\sigma^2$, in the sense that it has the smallest variance among all unbiased estimators and thus minimizes the MSE among unbiased estimators.

When normality is assumed, as is the case in many applications, $S^2$ is the natural choice when you want to estimate the variance.
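The bias claim can be illustrated with a rough Monte Carlo sketch in Python. Everything specific here is an assumption made for the illustration: the seed, the values of `sigma`, `n` and `trials`, and the choice to apply $\psi$ to the data in the order drawn (without sorting).

```python
import random
import statistics

def psi(xs):
    """Proposed statistic: sum of absolute successive differences, divided by n."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:])) / len(xs)

rng = random.Random(0)           # fixed seed so the run is repeatable
sigma, n, trials = 2.0, 10, 20000

s2_values, psi_values = [], []
for _ in range(trials):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]   # iid normal draws
    s2_values.append(statistics.variance(xs))        # S^2, divides by n - 1
    psi_values.append(psi(xs))                       # data left in sampling order

mean_s2 = statistics.fmean(s2_values)
mean_psi = statistics.fmean(psi_values)
print(f"true variance = {sigma**2}, mean S^2 = {mean_s2:.3f}, mean psi = {mean_psi:.3f}")
```

Under these assumptions the average of $S^2$ should land close to $\sigma^2 = 4$, while the average of $\psi$ settles far from it. For iid normal data $E|x_i - x_{i-1}| = 2\sigma/\sqrt{\pi}$, so $\psi$ tracks a multiple of $\sigma$ rather than $\sigma^2$.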

— Mur1lo

3

Everything in this answer is well explained, correct, and interesting. However, introducing the "usual statistic" as an estimator confuses the issue, because the question is not about estimation, nor about bias, nor about the distinction between $1/n$ and $1/(n-1)$. That confusion might be at the root of your comments to several other answers in this thread.

— whuber

2

The time-stepped difference is indeed used in one form, the Allan Variance. http://www.allanstime.com/AllanVariance/

— Lee J Rickard

1

Lots of good answers here, but I'll add a few.

The way it is defined now has proven useful. For example, normal distributions appear all the time in data and a normal distribution is defined by its mean and variance. Edit: as @whuber pointed out in a comment, there are various other ways to specify a normal distribution. But none of them, as far as I'm aware, deal with pairs of points in sequence.
Variance as normally defined gives you a measure of how spread out the data is. For example, let's say you have a lot of data points with a mean of zero, but when you look at the data you see that it is mostly either around -1 or around 1. Your variance would be about 1. However, under your measure, you would get a total of about zero. Which one is more useful? Well, it depends, but it's not clear to me that a measure of zero for its "variance" would make sense.
It lets you do other stuff. Just an example, in my stats class we saw a video about comparing pitchers (in baseball) over time. As I remember it, pitchers appeared to be getting worse since the proportion of pitches that were hit (or were home-runs) was going up. One reason is that batters were getting better. This made it hard to compare pitchers over time. However, they could use the z-score of the pitchers to compare them over time.

Nonetheless, as @Pere said, your metric might prove itself very useful in the future.

— roundsquare

1

A normal distribution can also be determined by its mean and fourth central moment, for that matter -- or by means of many other pairs of moments. The variance is not special in that way.

— whuber

@whuber interesting. I'll admit I didn't realize that. Nonetheless, unless I'm mistaken, all the moments are "variance like" in that they are based on distances from a certain point as opposed to dealing with pairs of points in sequence. But I'll edit my answers to make note of what you said.

— roundsquare

1

Could you explain the sense in which you mean "deal with pairs of points in sequence"? That's not a part of any standard definition of a moment. Note, too, that all the absolute moments around the mean--which includes all even moments around the mean--give a "measure of how spread out the data" are. One could, therefore, construct an analog of the Z-score with them. Thus, none of your three points appears to differentiate the variance from any absolute central moment.

— whuber

@whuber yeah. The original question posited a 4 step sequence where you sort the points, take the differences between each point and the next point, and then average these. That's what I referred to as "deal[ing] with pairs of points in sequence". So you are right, none of the three points I gave distinguishes variance from any absolute central moment - they are meant to distinguish variance (and, I suppose, all absolute central moments) from the procedure described in the original question.

— roundsquare