A problem on estimability of parameters.

Let $Y_1,Y_2,Y_3$ and $Y_4$ be four random variables such that $E(Y_1)=\theta_1-\theta_3;\quad E(Y_2)=\theta_1+\theta_2-\theta_3;\quad E(Y_3)=\theta_1-\theta_3;\quad E(Y_4)=\theta_1-\theta_2-\theta_3$, where $\theta_1,\theta_2,\theta_3$ are unknown parameters. Also assume that $\text{Var}(Y_i)=\sigma^2$, $i=1,2,3,4.$ Then which one of the following is true?

A. $\theta_1,\theta_2,\theta_3$ are estimable.

B. $\theta_1+\theta_3$ is estimable.

C. $\theta_1-\theta_3$ is estimable and $\dfrac{1}{2}(Y_1+Y_3)$ is the best linear unbiased estimate of $\theta_1-\theta_3$.

D. $\theta_2$ is estimable.

The answer given is C, which looks strange to me (because I got D).

Why did I get D? Because $E(Y_2-Y_4)=2\theta_2$.

Why don't I understand how C could be an answer? OK, I can see that $\dfrac{Y_1+Y_2+Y_3+Y_4}{4}$ is an unbiased estimator of $\theta_1-\theta_3$, and its variance is less than that of $\dfrac{Y_1+Y_3}{2}$.

Please tell me where I am going wrong.

Also posted here: /math/2568894/a-problem-on-estimability-of-parameters

self-study estimation inference

— Stat_prob_001

Add the self-study tag or someone will come along and close your question.

— Carl

@Carl done, but why?

— Stat_prob_001

Those are the rules of the site, not my rules, the site's rules.

— Carl

Is $Y_1\neq Y_3$?

— Carl

@Carl you can think of it this way: $Y_1=\theta_1-\theta_3+\epsilon_1$ where $\epsilon_1$ is a r.v. with mean $0$ and variance $\sigma^2$. And $Y_3=\theta_1-\theta_3+\epsilon_3$ where $\epsilon_3$ is a r.v. with mean $0$ and variance $\sigma^2$.

— Stat_prob_001

Answers:

This answer emphasizes the verification of estimability. The minimum variance property is a secondary consideration for me.

To begin with, summarize the information in the matrix form of a linear model as follows:

$Y := \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix} := X\beta + \varepsilon, \tag{1}$

where $E(\varepsilon) = 0, \text{Var}(\varepsilon) = \sigma^2 I$ (to discuss estimability, the sphericity assumption is not necessary. But to discuss the Gauss-Markov property, we do need to assume the sphericity of $\varepsilon$).
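As a quick numerical companion to model $(1)$, the rank of the design matrix can be checked directly. This is only an illustrative sketch: the variable names are assumptions of this example, and the matrix is copied from $(1)$.

```python
import numpy as np

# Design matrix of model (1); columns correspond to theta_1, theta_2, theta_3.
X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
], dtype=float)

rank_X = np.linalg.matrix_rank(X)
n_params = X.shape[1]

# The third column is minus the first, so X cannot have full column rank.
print(rank_X, n_params)
```

Since the rank is smaller than the number of parameters, the full-rank argument does not apply to this problem, which is why the discussion of rank-deficient designs is needed.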

If the design matrix $X$ is of full rank, then the original parameter $\beta$ admits a unique least-squares estimate $\hat{\beta} = (X'X)^{-1}X'Y$. Consequently, any parameter $\phi$, defined as a linear function $\phi(\beta)$ of $\beta$, is estimable in the sense that it can be unambiguously estimated from the data via the least-squares estimate $\hat{\beta}$ as $\hat{\phi} = p'\hat{\beta}$.

The subtlety arises when $X$ is not of full rank. To have a thorough discussion, we first fix some notation and terms (I follow the convention of The Coordinate-Free Approach to Linear Models, Section 4.8. Some of the terms sound unnecessarily technical). In addition, the discussion applies to the general linear model $Y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$.

A regression manifold is the collection of mean vectors as $\beta$ varies over $\mathbb{R}^k$: $M = \{X\beta: \beta \in \mathbb{R}^k\}.$

A parametric functional $\phi = \phi(\beta)$ is a linear functional of $\beta$, $\phi(\beta) = p'\beta = p_1\beta_1 + \cdots + p_k\beta_k.$

As mentioned above, when $\text{rank}(X) < k$, not every parametric functional $\phi(\beta)$ is estimable. But, wait, what is the technical definition of the term estimable? It seems difficult to give a clear definition without resorting to a little linear algebra. One definition, which I think is the most intuitive, is as follows (from the same aforementioned reference):

Definition 1. A parametric functional $\phi(\beta)$ is estimable if it is uniquely determined by $X\beta$ in the sense that $\phi(\beta_1) = \phi(\beta_2)$ whenever $\beta_1,\beta_2 \in \mathbb{R}^k$ satisfy $X\beta_1 = X\beta_2$ .

Interpretation. The above definition stipulates that the mapping $X\beta \mapsto \phi(\beta)$ from the regression manifold $M$ to the parameter space of $\phi$ must be well defined (single-valued), which is guaranteed when $\text{rank}(X) = k$ (i.e., when $X$ itself is one-to-one). When $\text{rank}(X) < k$, we know that there exist $\beta_1 \neq \beta_2$ such that $X\beta_1 = X\beta_2$. The definition of estimability above in effect rules out those structurally deficient parametric functionals that take different values at $\beta_1$ and $\beta_2$ even though both give the same point of $M$, which naturally would not make sense. On the other hand, an estimable parametric functional $\phi(\cdot)$ does allow the case $\phi(\beta_1) = \phi(\beta_2)$ with $\beta_1 \neq \beta_2$, as long as the condition $X\beta_1 = X\beta_2$ is fulfilled.

There are other equivalent conditions to check the estimability of a parametric functional given in the same reference, Proposition 8.4.

After such a verbose background introduction, let's come back to your question.

A. $\beta$ itself is non-estimable because $\text{rank}(X) < 3$, which entails that $X\beta_1 = X\beta_2$ for some $\beta_1 \neq \beta_2$. Although the above definition is given for scalar functionals, it is easily generalized to vector-valued functionals.

B. $\phi_1(\beta) = \theta_1 + \theta_3 = (1, 0, 1)'\beta$ is non-estimable. To wit, consider $\beta_1 = (0, 1, 0)'$ and $\beta_2 = (1, 1, 1)'$ , which gives $X\beta_1 = X\beta_2$ but $\phi_1(\beta_1) = 0 + 0 = 0 \neq \phi_1(\beta_2) = 1 + 1 = 2$ .

C. $\phi_2(\beta) = \theta_1 - \theta_3 = (1, 0, -1)'\beta$ is estimable. Because $X\beta_1 = X\beta_2$ trivially implies $\theta_1^{(1)} - \theta_3^{(1)} = \theta_1^{(2)} - \theta_3^{(2)}$ , i.e., $\phi_2(\beta_1) = \phi_2(\beta_2)$ .

D. $\phi_3(\beta) = \theta_2 = (0, 1, 0)'\beta$ is also estimable. The derivation from $X\beta_1 = X\beta_2$ to $\phi_3(\beta_1) = \phi_3(\beta_2)$ is also trivial.
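Definition 1 can also be checked numerically. Since $X\beta_1 = X\beta_2$ holds exactly when $\beta_1 - \beta_2$ lies in the null space of $X$, a functional $p'\beta$ is estimable if and only if $p$ is orthogonal to that null space. The following is a minimal NumPy sketch of this check; the helper name `is_estimable` and the tolerance `1e-10` are assumptions made for illustration.

```python
import numpy as np

X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
], dtype=float)

# Rows of Vt whose singular value is (numerically) zero span null(X).
_, s, Vt = np.linalg.svd(X)
null_basis = Vt[s < 1e-10]

def is_estimable(p):
    """True if p'beta takes the same value for all beta with the same X beta."""
    return bool(np.allclose(null_basis @ np.asarray(p, dtype=float), 0.0))

functionals = {
    "theta_1": [1, 0, 0],
    "theta_2": [0, 1, 0],
    "theta_3": [0, 0, 1],
    "theta_1 + theta_3": [1, 0, 1],
    "theta_1 - theta_3": [1, 0, -1],
}
results = {name: is_estimable(p) for name, p in functionals.items()}
for name, ok in results.items():
    print(f"{name}: estimable = {ok}")
```

Here the null space is spanned by a multiple of $(1, 0, 1)'$, so only functionals whose coefficients on $\theta_1$ and $\theta_3$ sum to zero pass the check.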

After the estimability is verified, there is a theorem (Proposition 8.16, same reference) that gives the Gauss-Markov estimate of $\phi(\beta)$. Based on that theorem, the second part of option C is incorrect. The best linear unbiased estimate of $\theta_1 - \theta_3$ is $\bar{Y} = (Y_1 + Y_2 + Y_3 + Y_4)/4$, by the theorem below.

Theorem. Let $\phi(\beta) = p'\beta$ be an estimable parametric functional, then its best linear unbiased estimate (aka, Gauss-Markov estimate) is $\phi(\hat{\beta})$ for any solution $\hat{\beta}$ to the normal equations $X'X\hat{\beta} = X'Y$ .

The proof goes as follows:

Proof. Straightforward calculation shows that the normal equations are

$\begin{bmatrix} 4 & 0 & -4 \\ 0 & 2 & 0 \\ -4 & 0 & 4 \end{bmatrix} \hat{\beta} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix} Y,$

which, after simplification (dividing by 4, with $\phi(\hat{\beta}) = \hat{\theta}_1 - \hat{\theta}_3$), is

$\begin{bmatrix} \phi(\hat{\beta}) \\ \hat{\theta}_2/2 \\ -\phi(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ (Y_2 - Y_4)/4 \\ -\bar{Y} \end{bmatrix},$

i.e., $\phi(\hat{\beta}) = \bar{Y}$.
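The claim that $\phi(\hat{\beta})$ is the same for any solution of the normal equations can be illustrated numerically. The sketch below uses arbitrary simulated data (the seed and the shift `3.0` are illustrative assumptions, not from the problem) and compares two different solutions.

```python
import numpy as np

X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
], dtype=float)

rng = np.random.default_rng(0)
Y = rng.normal(size=4)  # arbitrary illustrative data

# One solution of the normal equations: the minimum-norm least-squares solution.
beta_hat = np.linalg.pinv(X) @ Y

# Another solution: shift along (1, 0, 1), which X maps to zero.
beta_alt = beta_hat + 3.0 * np.array([1.0, 0.0, 1.0])

p = np.array([1.0, 0.0, -1.0])  # phi(beta) = theta_1 - theta_3

print(p @ beta_hat, p @ beta_alt, Y.mean())   # all three agree
print(beta_hat[1], (Y[1] - Y[3]) / 2)         # estimate of theta_2
```

Both solutions satisfy $X'X\hat{\beta} = X'Y$ yet differ as vectors; the estimable functionals $\theta_1 - \theta_3$ and $\theta_2$ nevertheless take the same value at each of them.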

Therefore, option D is the only correct answer.

Addendum: The connection of estimability and identifiability

When I was at school, a professor briefly mentioned that the estimability of the parametric functional $\phi$ corresponds to model identifiability. I took this claim for granted then. However, the equivalence needs to be spelled out more explicitly.

According to A.C. Davison's monograph Statistical Models p.144,

Definition 2. A parametric model in which each parameter $\theta$ generates a different distribution is called identifiable.

For the linear model $(1)$, regardless of the sphericity condition $\text{Var}(\varepsilon) = \sigma^2 I$, it can be reformulated as

$E[Y] = X\beta, \quad \beta \in \mathbb{R}^k. \tag{2}$

It is such a simple model that we only specified the first-moment form of the response vector $Y$. When $\text{rank}(X) = k$, model $(2)$ is identifiable since $\beta_1 \neq \beta_2$ implies $X\beta_1 \neq X\beta_2$ (the word "distribution" in the original definition naturally reduces to "mean" under model $(2)$).

Now suppose that $\text{rank}(X) < k$ and that we are given a parametric functional $\phi(\beta) = p'\beta$. How do we reconcile Definition 1 and Definition 2?

Well, by manipulating notation and words, we can show (the "proof" is rather trivial) that the estimability of $\phi(\beta)$ is equivalent to model $(2)$ being identifiable when it is parametrized with the parameter $\phi = \phi(\beta) = p'\beta$ (the design matrix $X$ is likely to change accordingly). To prove this, suppose $\phi(\beta)$ is estimable, so that $X\beta_1 = X\beta_2$ implies $p'\beta_1 = p'\beta_2$; by definition, this is $\phi_1 = \phi_2$, hence model $(2)$ is identifiable when indexed by $\phi$. Conversely, suppose model $(2)$ is identifiable when indexed by $\phi$, so that $X\beta_1 = X\beta_2$ implies $\phi_1 = \phi_2$, which is trivially $\phi(\beta_1) = \phi(\beta_2)$.

Intuitively, when $X$ is rank-deficient, the model with $\beta$ is parameter redundant (too many parameters), hence a non-redundant lower-dimensional reparametrization (which could consist of a collection of linear functionals) is possible. When is such a new representation possible? The key is estimability.

To illustrate the above statements, let's reconsider your example. We have verified parametric functionals $\phi_2(\beta) = \theta_1 - \theta_3$ and $\phi_3(\beta) = \theta_2$ are estimable. Therefore, we can rewrite the model $(1)$ in terms of the reparametrized parameter $(\phi_2, \phi_3)'$ as follows

$E[Y] = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix} = \tilde{X}\gamma.$

Clearly, since $\tilde{X}$ is full-ranked, the model with the new parameter $\gamma$ is identifiable.
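The reparametrization can be sanity-checked in a few lines. In this sketch the matrix `T` (an illustrative name, not from the reference) maps $\beta$ to $\gamma = (\theta_1 - \theta_3, \theta_2)'$; the check is that $\tilde{X}T = X$, so both parametrizations describe the same mean vectors, and that $\tilde{X}$ has full column rank.

```python
import numpy as np

X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
], dtype=float)

# Reparametrized design matrix for gamma = (phi_2, phi_3)'.
X_tilde = np.array([
    [1,  0],
    [1,  1],
    [1,  0],
    [1, -1],
], dtype=float)

# gamma = T @ beta, i.e. phi_2 = theta_1 - theta_3 and phi_3 = theta_2.
T = np.array([
    [1, 0, -1],
    [0, 1,  0],
], dtype=float)

same_means = np.allclose(X_tilde @ T, X)      # X_tilde gamma equals X beta
rank_tilde = np.linalg.matrix_rank(X_tilde)   # number of identifiable parameters
print(same_means, rank_tilde)
```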

— Zhanxiong

If you need a proof for the second part of option C, I will supplement my answer.

— Zhanxiong

Thanks for such a detailed answer! Now, about the second part of C: I know that "best" relates to minimum variance. So why is $\dfrac{1}{4}(Y_1+Y_2+Y_3+Y_4)$ not "best"?

— Stat_prob_001

Oh, I don't know why I thought it is the estimator in C. Actually $(Y_1 + Y_2 + Y_3 + Y_4)/4$ is the best estimator. Will edit my answer.

— Zhanxiong

Apply the definitions.

I will provide details to demonstrate how you can use elementary techniques: you don't need to know any special theorems about estimation, nor will it be necessary to assume anything about the (marginal) distributions of the $Y_i$ . We will need to supply one missing assumption about the moments of their joint distribution.

Definitions

All linear estimates are of the form

$t_\lambda(Y) = \sum_{i=1}^4 \lambda_i Y_i$

for constants $\lambda = (\lambda_i)$.

An estimator of $\theta_1-\theta_3$ is unbiased if and only if its expectation is $\theta_1-\theta_3$ . By linearity of expectation,

$\eqalign{ \theta_1 - \theta_3 &= E[t_\lambda(Y)] = \sum_{i=1}^4 \lambda_i E[Y_i]\\ & = \lambda_1(\theta_1-\theta_3) + \lambda_2(\theta_1+\theta_2-\theta_3) + \lambda_3(\theta_1-\theta_3) + \lambda_4(\theta_1-\theta_2-\theta_3) \\ &=(\lambda_1+\lambda_2+\lambda_3+\lambda_4)(\theta_1-\theta_3) + (\lambda_2-\lambda_4)\theta_2. }$

Comparing coefficients of the unknown quantities $\theta_i$ reveals

$\lambda_2-\lambda_4=0\text{ and }\lambda_1+\lambda_2+\lambda_3+\lambda_4=1.\tag{1}$

In the context of linear unbiased estimation, "best" always means with least variance. The variance of $t_\lambda$ is

$\operatorname{Var}(t_\lambda) = \sum_{i=1}^4 \lambda_i^2 \operatorname{Var}(Y_i) + \sum_{i\ne j}^4 \lambda_i\lambda_j \operatorname{Cov}(Y_i,Y_j).$

The only way to make progress is to add an assumption about the covariances: most likely, the question intended to stipulate they are all zero. (This does not imply the $Y_i$ are independent. Furthermore, the problem can be solved by making any assumption that stipulates those covariances up to a common multiplicative constant. The solution depends on the covariance structure.)

Since $\operatorname{Var}(Y_i)=\sigma^2,$ we obtain

$\operatorname{Var}(t_\lambda) =\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2).\tag{2}$

The problem therefore is to minimize $(2)$ subject to constraints $(1)$ .

Solution

The constraints $(1)$ permit us to express all the $\lambda_i$ in terms of just two linear combinations of them. Let $u=\lambda_1-\lambda_3$ and $v=\lambda_1+\lambda_3$ (which are linearly independent). These determine $\lambda_1$ and $\lambda_3$ while the constraints determine $\lambda_2$ and $\lambda_4$ . All we have to do is minimize $(2)$ , which can be written

$\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2) = \frac{\sigma^2}{4}\left(2u^2 + (2v-1)^2 + 1\right).$
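For readers who want the intermediate step, here is a short sketch of the algebra, using only the constraints $(1)$ and the definitions of $u$ and $v$. Solving for the weights gives $\lambda_1 = (v+u)/2$, $\lambda_3 = (v-u)/2$, and $\lambda_2 = \lambda_4 = (1-v)/2$, so that

$\lambda_1^2 + \lambda_3^2 = \frac{u^2 + v^2}{2}, \qquad \lambda_2^2 + \lambda_4^2 = \frac{(1-v)^2}{2}.$

Adding the two pieces,

$\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2 = \frac{1}{2}\left(u^2 + 2v^2 - 2v + 1\right) = \frac{1}{4}\left(2u^2 + (2v-1)^2 + 1\right).$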

No constraints apply to $(u,v)$ . Assume $\sigma^2 \ne 0$ (so that the variables aren't just constants). Since $u^2$ and $(2v-1)^2$ are smallest only when $u=2v-1=0$ , it is now obvious that the unique solution is

$\lambda = (\lambda_1,\lambda_2,\lambda_3,\lambda_4) = (1/4,1/4,1/4,1/4).$
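The same minimizer can be recovered numerically. Minimizing $\sum \lambda_i^2$ subject to the linear constraints $(1)$ is a minimum-norm problem, which the pseudoinverse solves directly. This is a sketch assuming zero covariances, as above; the names `A`, `b`, and `lam_C` are illustrative.

```python
import numpy as np

# Constraints (1) written as A @ lam = b:
#   lam_2 - lam_4 = 0   and   lam_1 + lam_2 + lam_3 + lam_4 = 1.
A = np.array([
    [0.0, 1.0, 0.0, -1.0],
    [1.0, 1.0, 1.0,  1.0],
])
b = np.array([0.0, 1.0])

# The pseudoinverse gives the solution of A @ lam = b with the smallest sum of squares.
lam_best = np.linalg.pinv(A) @ b

# Weights of the estimator proposed in option C, (Y_1 + Y_3) / 2.
lam_C = np.array([0.5, 0.0, 0.5, 0.0])

# Variances in units of sigma^2, from formula (2).
var_best = lam_best @ lam_best
var_C = lam_C @ lam_C
print(lam_best, var_best, var_C)
```

The option C weights satisfy the unbiasedness constraints but have a larger sum of squares, hence a larger variance, than the equal weights.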

Option (C) is false because it does not give the best unbiased linear estimator. Option (D), although it doesn't give full information, nevertheless is correct, because

$\theta_2 = E[t_{(0,1/2,0,-1/2)}(Y)]$

is the expectation of a linear estimator.

It is easy to see that neither (A) nor (B) can be correct, because the space of expectations of linear estimators is generated by $\{\theta_2, \theta_1-\theta_3\}$ and none of $\theta_1,\theta_3,$ or $\theta_1+\theta_3$ are in that space.
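This criterion can be tested by searching for the weights themselves: $p'\beta$ is the expectation of some linear estimator $t_\lambda$ exactly when $X'\lambda = p$ has a solution. A minimal sketch follows, where `find_weights` is a hypothetical helper written for this example.

```python
import numpy as np

X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
], dtype=float)

def find_weights(p):
    """Least-squares attempt to solve X' lam = p; report whether it is exact."""
    p = np.asarray(p, dtype=float)
    lam, *_ = np.linalg.lstsq(X.T, p, rcond=None)
    exact = bool(np.allclose(X.T @ lam, p))
    return lam, exact

lam_theta2, ok_theta2 = find_weights([0, 1, 0])   # theta_2
lam_sum, ok_sum = find_weights([1, 0, 1])         # theta_1 + theta_3
lam_theta1, ok_theta1 = find_weights([1, 0, 0])   # theta_1

print(lam_theta2, ok_theta2)
print(ok_sum, ok_theta1)
```

For $\theta_2$ the minimum-norm weights returned by `lstsq` coincide with the $(0, 1/2, 0, -1/2)$ used above, while no exact weights exist for $\theta_1$ or $\theta_1 + \theta_3$.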

Consequently (D) is the unique correct answer.

— whuber