Restricted maximum likelihood with less than full column rank of $X$

This question is about restricted maximum likelihood (REML) estimation in a particular version of the linear model, namely:

$Y = X(\alpha)\beta + \epsilon, \\ \epsilon\sim N_n(0, \Sigma(\alpha)),$

where $X(\alpha)$ is an $n \times p$ matrix parametrized by $\alpha \in \mathbb R^k$, as is $\Sigma(\alpha)$. $\beta$ is an unknown vector of nuisance parameters; the interest is in estimating $\alpha$, and we have $k\leq p\ll n$. Estimating the model by maximum likelihood is no problem, but I want to use REML. It is well known, see e.g. LaMotte, that the likelihood of $A'Y$, where $A$ is any semi-orthogonal matrix such that $A'X=0$, can be written

$L_{\text{REML}}(\alpha\mid Y) \propto\vert X'X\vert^{1/2} \vert \Sigma\vert^{-1/2}\vert X'\Sigma^{-1}X\vert^{-1/2}\exp\left\{-\frac{1}{2} r'\Sigma^{-1}r \right\}, \\ r = (I - X(X'\Sigma^{-1}X)^+X'\Sigma^{-1})Y,$

when $X$ has full column rank.
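As a quick numerical check of the full-rank formula, here is a minimal sketch assuming NumPy. It evaluates the logarithm of the displayed expression and compares it with the Gaussian log-density of $A'Y \sim N(0, A'\Sigma A)$ computed directly, with the $(2\pi)$ constant dropped in both. The function names, the random test data, and the construction of $A$ from a complete QR decomposition are illustrative assumptions, not something taken from LaMotte.

```python
import numpy as np

def reml_loglik_formula(Y, X, Sigma):
    """Log of the displayed restricted likelihood (X of full column rank)."""
    Sinv = np.linalg.inv(Sigma)
    XtSX = X.T @ Sinv @ X
    # r = (I - X (X' Sigma^{-1} X)^+ X' Sigma^{-1}) Y
    r = Y - X @ np.linalg.pinv(XtSX) @ X.T @ Sinv @ Y
    _, ld_XtX = np.linalg.slogdet(X.T @ X)
    _, ld_S = np.linalg.slogdet(Sigma)
    _, ld_XtSX = np.linalg.slogdet(XtSX)
    return 0.5 * ld_XtX - 0.5 * ld_S - 0.5 * ld_XtSX - 0.5 * r @ Sinv @ r

def reml_loglik_direct(Y, X, Sigma):
    """Log-density of A'Y ~ N(0, A' Sigma A), same constant dropped."""
    p = X.shape[1]
    Q, _ = np.linalg.qr(X, mode="complete")
    A = Q[:, p:]  # orthonormal basis of C(X)^perp when rank(X) = p
    z = A.T @ Y
    V = A.T @ Sigma @ A
    _, ld_V = np.linalg.slogdet(V)
    return -0.5 * ld_V - 0.5 * z @ np.linalg.solve(V, z)

# Hypothetical test data: random full-rank X and positive definite Sigma.
rng = np.random.default_rng(0)
n, p = 12, 3
X = rng.normal(size=(n, p))
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)
Y = rng.normal(size=n)

ll_formula = reml_loglik_formula(Y, X, Sigma)
ll_direct = reml_loglik_direct(Y, X, Sigma)
```

The two values agree only because `X` has full column rank here; the determinant identity behind the agreement is exactly the step that fails when $\vert X'X\vert = 0$.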

My problem is that for some perfectly reasonable, and scientifically interesting, $\alpha$, the matrix $X(\alpha)$ is not of full column rank. All the derivations I have seen of the restricted likelihood above make use of determinant equalities that are not applicable when $\vert X'X\vert=0$, that is, they assume full column rank of $X$. This means that the restricted likelihood above is only correct for my setting on parts of the parameter space, and is thus not what I want to optimize.

Question: Are there more general restricted likelihoods, derived in the statistical literature or elsewhere, without the assumption that $X$ has full column rank? If so, what do they look like?

Some observations:

Deriving the exponential part is no problem for any $X(\alpha)$, and it can be written in terms of the Moore-Penrose inverse as above.
The columns of $A$ are an (any) orthonormal basis for $C(X)^\bot$.
For known $A$, the likelihood for $A'Y$ can easily be written down for every $\alpha$, but of course the number of basis vectors, i.e. columns, in $A$ depends on the column rank of $X$.
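The last observation can be sketched numerically. The following is a minimal illustration assuming NumPy: it builds $A$ from the left singular vectors of $X$ that lie outside the numerical column space, so it works for any rank, and returns the Gaussian log-density of $A'Y$ together with the detected rank. The function name and the tolerance `tol` are hypothetical choices. Because $A$ has $n - \mathrm{rank}(X)$ columns, values obtained at parameter points with different ranks are densities of vectors of different dimension, which is precisely the difficulty raised above; the sketch does not resolve it.

```python
import numpy as np

def restricted_loglik_any_rank(Y, X, Sigma, tol=1e-10):
    """Log-density of A'Y ~ N(0, A' Sigma A), with A spanning C(X)^perp.

    Returns (log-likelihood, numerical rank of X).
    """
    U, s, _ = np.linalg.svd(X, full_matrices=True)
    q = int(np.sum(s > tol * s.max()))  # numerical rank of X
    A = U[:, q:]                        # n x (n - q), orthonormal columns
    z = A.T @ Y
    V = A.T @ Sigma @ A
    _, ld_V = np.linalg.slogdet(V)
    m = A.shape[1]
    ll = -0.5 * (m * np.log(2 * np.pi) + ld_V + z @ np.linalg.solve(V, z))
    return ll, q

# Hypothetical data: X_def has a redundant third column, so rank 2.
rng = np.random.default_rng(1)
n = 10
X_full = rng.normal(size=(n, 2))
X_def = np.column_stack([X_full, X_full[:, 0] + X_full[:, 1]])
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)
Y = rng.normal(size=n)

ll_def, q_def = restricted_loglik_any_rank(Y, X_def, Sigma)
ll_full, q_full = restricted_loglik_any_rank(Y, X_full, Sigma)
```

Since `X_def` and `X_full` share a column space, the two log-likelihoods coincide: the density of $A'Y$ depends on $X$ only through $C(X)$.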

If anyone interested in this question believes the exact parameterization of $X,\Sigma$ would help, let me know and I will write them down. At this point, though, I am mostly interested in a REML for a general $X$ of the correct dimensions.

A more detailed description of the model follows here. Let $y_t = \mu + Ay_{t - 1} + v_t, t = 1, \dots, T$ be an $r$-dimensional vector autoregression of first order [VAR(1)], where $v_t \overset{iid}{\sim}N(0, \Omega)$. Suppose the process is started at some fixed value $y_0$ at time $t = 0$.

Define $Y = [y_1', \dots, y_T']'$. The model can be written in the linear model form $Y = X\beta + \varepsilon$ using the following definitions and notation:

$\begin{aligned} X &= [1_T \otimes I_r, C^{-1}B] \\ \beta &= [\mu', y_0' - \mu']' \\ \mathrm{var}(\varepsilon)^{-1} &= C'(I_T \otimes \Omega^{-1})C \\ C &= \begin{bmatrix} I_r & 0 & 0 & \cdots \\ -A & I_r & 0 & \cdots \\ 0 & -A & I_r & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} \\ B &= e_{1, T} \otimes A, \end{aligned}$

where $1_T$ denotes a $T$-dimensional vector of ones and $e_{1,T}$ the first standard basis vector of $\mathbb R^T$.

Denote $\alpha = \mathrm{vec}(A)$. Notice that if $A$ is not of full rank, then $X(\alpha)$ is not of full column rank. This includes, for example, cases where one of the components of $y_t$ does not depend on the past.
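The rank claim can be checked on a small case. This is a minimal sketch assuming NumPy; the helper name `var1_design` and the two example coefficient matrices are hypothetical. It builds $C$, $B$ and $X = [1_T \otimes I_r, C^{-1}B]$ from the definitions above and compares the column rank of $X$ for a nonsingular $A$ and for an $A$ whose second row is zero, meaning the second component of $y_t$ does not depend on the past.

```python
import numpy as np

def var1_design(A, T):
    """Build X, C, B from the VAR(1) definitions for coefficient matrix A."""
    r = A.shape[0]
    # C: identity blocks on the diagonal, -A on the first block subdiagonal.
    C = np.eye(T * r) - np.kron(np.eye(T, k=-1), A)
    # B = e_{1,T} (kron) A: A in the first block row, zeros below.
    e1 = np.zeros((T, 1))
    e1[0, 0] = 1.0
    B = np.kron(e1, A)
    X = np.hstack([np.kron(np.ones((T, 1)), np.eye(r)), np.linalg.solve(C, B)])
    return X, C, B

T = 6
A_regular = np.array([[0.5, 0.1],
                      [0.2, 0.3]])   # full rank
A_singular = np.array([[0.5, 0.1],
                       [0.0, 0.0]])  # second component ignores the past

X_regular, _, _ = var1_design(A_regular, T)
X_singular, _, _ = var1_design(A_singular, T)

rank_regular = np.linalg.matrix_rank(X_regular)    # 2r = 4 columns in total
rank_singular = np.linalg.matrix_rank(X_singular)
```

In this example the block $C^{-1}B$ stacks $A, A^2, \dots, A^T$, so any vector in the null space of $A$ gives a linear dependence among the last $r$ columns of $X$.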

The idea of estimating VARs using REML is well known in, for example, the predictive regressions literature (see e.g. Phillips and Chen and the references therein).

It may be worthwhile to clarify that the matrix $X$ is not a design matrix in the usual sense; it simply falls out of the model, and unless there is a priori knowledge about $A$ there is, as far as I can tell, no way to reparameterize it to be of full rank.

I have posted a question on math.stackexchange that is related to this one, in the sense that an answer to the math question may help in deriving a likelihood that would answer this question.

— ekvall

Maybe one way to address the question is to ask, what happens in linear mixed models when the model matrix is not full column rank?

— Greenparker

Thanks for the bounty, @Greenparker. And yes, if a restricted likelihood could be written down for a linear mixed model with a fixed-effects design matrix of less than full column rank, that would help.

— ekvall

"Deriving the exponential part is no problem for any $X(\alpha)$, and it can be written in terms of the Moore-Penrose inverse as above."

I doubt that this observation is correct. The generalized inverse actually puts an additional linear restriction on your estimators [Rao&Mitra], so we should consider the joint likelihood as a whole instead of guessing that "the Moore-Penrose inverse will work for the exponential part". This looks formally correct, but you probably do not understand the mixed model correctly.

$\blacksquare$ (1) How to think about mixed-effects models correctly?

You have to think about the mixed-effects model in a different way before you try to plug the g-inverse (or the Moore-Penrose inverse, which is a special kind of g-inverse [Rao&Mitra]) mechanically into the formula given by RMLE (restricted maximum likelihood estimator, same below).

$\boldsymbol{X}=\left(\begin{array}{cc} \text{fixed effect} & \\ & \text{random effect} \end{array}\right)$

A common way of thinking about mixed effects is that the random-effect part of the design matrix is introduced by measurement error, which goes by the name "stochastic predictor" if we care more about prediction than estimation. This is also one historical motivation for the study of stochastic matrices in statistics.

"My problem is that for some perfectly reasonable, and scientifically interesting, $\alpha$ the matrix $X(\alpha)$ is not of full column rank."

Given this way of thinking about the likelihood, the probability that $X(\alpha)$ is not of full rank is zero. This is because the determinant is continuous in the entries of the matrix, and the normal distribution is a continuous distribution that assigns zero probability to a single point. The probability of a rank-deficient $X(\alpha)$ is positive iff you parameterize it in a pathological way like $\left(\begin{array}{ccc} \alpha & \alpha\\ \alpha & \alpha\\ & & \text{random effect} \end{array}\right)$.

So the solution to your question is also rather straightforward: you simply perturb your design matrix, $X_\epsilon(\alpha)=X(\alpha)+\epsilon\left(\begin{array}{cc} I & 0\\ 0 & 0 \end{array}\right)$ (perturbing the fixed-effect part only), and use the perturbed matrix (which is of full rank) to carry out all derivations. Unless your model has complicated hierarchies or $X$ itself is near singular, I do not see a serious problem when you take $\epsilon\rightarrow 0$ in the final result, since the determinant is continuous and we can take the limit inside it: $\lim_{\epsilon\rightarrow 0}|X_\epsilon|=|\lim_{\epsilon\rightarrow 0}X_\epsilon|$. In perturbation form, the inverse of $X_\epsilon$ can be obtained from the Sherman-Morrison-Woodbury theorem, and the determinant of the matrix $I+X$ is given in standard linear algebra books like [Horn&Johnson]. Of course we can write the determinant in terms of each entry of the matrix, but perturbation is always preferred [Horn&Johnson].
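A toy illustration of the perturbation, as a minimal sketch assuming NumPy: the $3 \times 2$ all-ones matrix and the shape of the perturbation block are hypothetical choices made only to have a rank-deficient example. The code checks that the perturbed matrix has full column rank and tracks the Gram determinant $\vert X_\epsilon' X_\epsilon\vert$ as $\epsilon$ shrinks.

```python
import numpy as np

# Hypothetical rank-1 example: both columns of X are identical.
X = np.ones((3, 2))

# Perturbation pattern of the (I 0; 0 0) type, padded to the shape of X.
E = np.zeros((3, 2))
E[:2, :2] = np.eye(2)

def gram_det(eps):
    """Determinant of X_eps' X_eps for X_eps = X + eps * E."""
    X_eps = X + eps * E
    return np.linalg.det(X_eps.T @ X_eps)

rank_original = np.linalg.matrix_rank(X)
rank_perturbed = np.linalg.matrix_rank(X + 1e-3 * E)

# Gram determinants for a decreasing sequence of eps, ending at eps = 0.
dets = [gram_det(eps) for eps in (1e-1, 1e-2, 1e-3, 0.0)]
```

For this particular example the determinant works out to $\epsilon^2(\epsilon^2 + 4\epsilon + 6)$, so it is positive for every $\epsilon \neq 0$ and, by continuity, tends to $\vert X'X\vert = 0$ as $\epsilon \rightarrow 0$.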

$\blacksquare$ (2) How should we deal with nuisance parameters in a model?

As you see, to deal with the random-effect part of the model, we should regard it as a sort of "nuisance parameter". The problem is: is RMLE the most appropriate way of eliminating a nuisance parameter? Even in GLMs and mixed-effects models, RMLE is far from the only choice. [Basu] pointed out many other ways of eliminating parameters in the setting of estimation. Today people tend to choose between RMLE and Bayesian modeling because they correspond to two popular computer-based solutions: EM and MCMC, respectively.

In my opinion it is definitely more suitable to introduce a prior in the situation of defective rank in the fixed effect part. Or you can reparameterize your model in order to make it into a full rank one.

Further, in case your fixed effect is not of full rank, you might worry about a mis-specified covariance structure, because the degrees of freedom in the fixed effects should have gone into the error part. To see this point more clearly, you may want to consider the MLE (also the LSE) for GLS (generalized least squares), $\hat{\beta}=(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$, where $\Sigma$ is the covariance structure of the error term, for the case where $X(\alpha)$ is not of full rank.
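To see what happens to GLS when $X$ is rank deficient, here is a minimal sketch assuming NumPy. It uses the usual expression $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$ with the inverse replaced by a Moore-Penrose inverse, as in the residual $r$ of the question; the function name, the cutoff `rcond=1e-10`, and the test data are hypothetical. The coefficient vector is then one solution among many, while the fitted values $X\hat\beta$ depend only on $C(X)$.

```python
import numpy as np

def gls_fit(y, X, Sigma):
    """GLS coefficients via the Moore-Penrose inverse, plus fitted values."""
    Sinv = np.linalg.inv(Sigma)
    # Explicit cutoff so the numerically zero singular value is discarded.
    G_pinv = np.linalg.pinv(X.T @ Sinv @ X, rcond=1e-10)
    beta = G_pinv @ X.T @ Sinv @ y
    return beta, X @ beta

# Hypothetical data: third column of X_def is the sum of the first two.
rng = np.random.default_rng(2)
n = 10
X_full = rng.normal(size=(n, 2))
X_def = np.column_stack([X_full, X_full[:, 0] + X_full[:, 1]])
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)
y = rng.normal(size=n)

beta_def, fit_def = gls_fit(y, X_def, Sigma)
beta_full, fit_full = gls_fit(y, X_full, Sigma)

# (1, 1, -1) is in the null space of X_def, so adding it changes the
# coefficients but not the fitted values.
null_dir = np.array([1.0, 1.0, -1.0])
fit_shifted = X_def @ (beta_def + null_dir)
```

The Moore-Penrose inverse picks one particular member of the solution set, which is one way to read the remark that a generalized inverse amounts to an extra linear restriction on the estimator.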

$\blacksquare$ (3) Further comments

The problem is not how you modify the RMLE to make it work when the fixed-effect part of the matrix is not of full rank; the problem is that in that case your model itself may be problematic if the non-full-rank case has positive probability.

One relevant case I have encountered is the spatial setting, where people may want to reduce the rank of the fixed-effect part for computational reasons [Wikle].

I have not seen any "scientifically interesting" case in such a situation. Can you point to some literature where the non-full-rank case is of major concern? I would like to know and discuss further, thanks.

$\blacksquare$ References

[Rao&Mitra] Rao, Calyampudi Radhakrishna, and Sujit Kumar Mitra. Generalized Inverse of Matrices and Its Applications. Vol. 7. New York: Wiley, 1971.

[Basu] Basu, Debabrata. "On the elimination of nuisance parameters." Journal of the American Statistical Association 72.358 (1977): 355-366.

[Horn&Johnson] Horn, Roger A., and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[Wikle] Wikle, Christopher K. "Low-rank representations for spatial processes." Handbook of Spatial Statistics (2010): 107-118.

— Henry.L

Thanks for your interest and very thought-through answer, +1 for effort. I will read it in more detail and come back with some clarifications. I think a first thing that I will have to clarify is that there are no random effects in this model, and the matrix $X$ is not a design matrix at all, except perhaps by name for lack of a better word; it is a highly non-linear (deterministic) function of the parameter $\alpha$, which consists of (the vectorization of) the coefficient matrix in a vector autoregressive process, so the concept of probability of being low-rank is not meaningful.

— ekvall

@Student001 Yes, feel free to make any clarification, since I also feel it is more like a GLM than a mixed model. I will try to answer again if I can :)

— Henry.L

@Student001 If you can, do write the whole model and I would like to study such case, possibly AR(1) in spatial setting I guess.

— Henry.L

"Given this way of thinking the likelihood, the probability that
$X(\alpha)$ is not of full rank is zero." Right answer, wrong problem. The probability that it will be numerically not of full rank in finite precision is non-zero.

— Mark L. Stone

@MarkL.Stone I already provided perturbation as a solution, if you read the lines carefully, which is a standard solution to numerical singularity. And the OP said he will update the description, so I guess we will reach some consensus on the correctly formulated problem.

— Henry.L