If you take a Bayesian approach and treat the parameters describing the distribution of $X$ as a random variable/vector, then the observations are indeed not independent, but they are conditionally independent given knowledge of $\theta$; hence $P(X_n \mid X_{n-1}, \ldots, X_1, \theta) = P(X_n \mid \theta)$ would hold.
In a classical statistical approach, $\theta$ is not a random variable. Calculations are done as if we knew what $\theta$ is. In some sense, you're always conditioning on $\theta$ (even if you don't know its value).
When you wrote, "... provide information about the distribution structure, and as a result about $X_n$", you were implicitly adopting a Bayesian approach, but not doing it precisely. You wrote a property of IID samples the way a frequentist would write it, but the corresponding statement in a Bayesian setup would involve conditioning on $\theta$.
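To make the contrast concrete, the frequentist's IID statement and the Bayesian's marginal predictive look like the following, where $p(\theta \mid X_1, \ldots, X_{n-1})$ denotes the posterior density (notation I'm introducing here for illustration):

$$P(X_n \mid X_{n-1}, \ldots, X_1) = P(X_n) \quad \text{(frequentist IID)},$$

$$P(X_n \mid X_{n-1}, \ldots, X_1) = \int P(X_n \mid \theta)\, p(\theta \mid X_1, \ldots, X_{n-1})\, d\theta \quad \text{(Bayesian)},$$

and the Bayesian right-hand side generally differs from the marginal $P(X_n)$ because the posterior differs from the prior.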
Bayesian vs. Classical statisticians
Let $x_i$ be the result of flipping a lopsided, unfair coin. We don't know the probability that the coin lands heads.
- To the classical statistician, the frequentist, $P(x_i = H)$ is some parameter; let's call it $\theta$. Observe that $\theta$ here is a scalar, like the number $1/3$. We may not know what the number is, but it's some number! It is not random!
- To the Bayesian statistician, $\theta$ itself is a random variable! This is extremely different!
The key idea here is that the Bayesian statistician extends the tools of probability to situations where the classical statistician doesn't. To the frequentist, $\theta$ isn't a random variable because it has only one possible value! Multiple outcomes are not possible! In the Bayesian's imagination, though, multiple values of $\theta$ are possible, and the Bayesian is willing to model that uncertainty (in her own mind) using the tools of probability.
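To make the two viewpoints concrete, here is a minimal simulation sketch of the Bayesian's generative story (Python, the uniform $\mathrm{Beta}(1,1)$ prior, and the sample size are my illustrative choices; nothing above commits to them):

```python
import numpy as np

rng = np.random.default_rng(0)

# The Bayesian's generative story for the coin. The Beta(1, 1) prior
# (uniform on [0, 1]) and n = 10 flips are illustrative assumptions.
theta = rng.beta(1, 1)               # latent bias: a random variable to the Bayesian
flips = rng.binomial(1, theta, 10)   # flips are IID *conditional on* theta

# To the frequentist, only the second line is random: theta is an
# unknown constant, not a draw from a distribution.
print(theta, flips)
```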
Where is this going?
Let's say we flip the coin $n$ times. One flip does not affect the outcome of another. The classical statistician would call these independent flips (and indeed they are). We'll have:
$$P(x_n = H \mid x_{n-1}, x_{n-2}, \ldots, x_1) = P(x_n = H) = \theta,$$
where $\theta$ is some unknown parameter. (Remember, we don't know what it is, but it's not a random variable! It's some number.)
A Bayesian deep into subjective probability would say that what matters is the probability from her perspective! If she sees 10 heads in a row, an 11th head is more likely, because 10 heads in a row leads one to believe the coin is lopsided in favor of heads.
$$P(x_{11} = H \mid x_{10} = H, x_9 = H, \ldots, x_1 = H) > P(x_1 = H).$$
What has happened here? What is different?! She is updating her beliefs about a latent random variable $\theta$! If $\theta$ is treated as a random variable, the flips aren't independent anymore. But the flips are conditionally independent given the value of $\theta$:
$$P(x_{11} = H \mid x_{10} = H, x_9 = H, \ldots, x_1 = H, \theta) = P(x_1 = H \mid \theta) = \theta.$$
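To put numbers on the earlier inequality, suppose (an illustrative assumption on my part) the Bayesian starts from a uniform $\mathrm{Beta}(1, 1)$ prior on $\theta$. After 10 heads the posterior is $\mathrm{Beta}(11, 1)$, and the predictive probability of an 11th head is its mean:

$$P(x_{11} = H \mid x_{10} = H, \ldots, x_1 = H) = \frac{11}{12} > \frac{1}{2} = P(x_1 = H).$$

This is Laplace's rule of succession; other non-degenerate priors lead to the same qualitative conclusion.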
Conditioning on $\theta$ in a sense connects how the Bayesian and the classical statistician model the problem. Or to put it another way, the frequentist and the Bayesian statistician will agree if the Bayesian conditions on $\theta$.
Further notes
I've tried my best to give a short intro here, but what I've done is at best superficial, and the concepts are in some sense quite deep. If you want to dive into the philosophy of probability, Savage's 1954 book The Foundations of Statistics is a classic. Google for "Bayesian vs. frequentist" and a ton of stuff will come up.
Another way to think about IID draws is de Finetti's theorem and the notion of exchangeability. In a Bayesian framework, exchangeability of an infinite sequence is equivalent to the draws being IID conditional on some latent random variable (in this case, the lopsidedness of the coin).
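For the coin-flip case (coding heads as $x_i = 1$), de Finetti's theorem says that an infinite exchangeable sequence must have the mixture representation

$$P(x_1, \ldots, x_n) = \int_0^1 \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i} \, d\pi(\theta)$$

for some distribution $\pi$ on $[0, 1]$, which is exactly the conditionally-IID-given-$\theta$ structure above.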