I like to have control over the objects I create, even when they might be arbitrary.
Consider, then, that all possible $n\times n$ covariance matrices $\Sigma$ can be expressed in the form

$$\Sigma = P^\prime\, \text{Diagonal}(\sigma_1, \sigma_2, \ldots, \sigma_n)\, P$$

where $P$ is an orthogonal matrix and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.
Geometrically this describes a covariance structure with a range of principal components of sizes $\sigma_i$. These components point in the directions of the rows of $P$. See the figures at Making sense of principal component analysis, eigenvectors & eigenvalues for examples with $n=3$. Setting the $\sigma_i$ fixes the magnitudes of the covariances and their relative sizes, thereby determining any desired ellipsoidal shape. The rows of $P$ orient the axes of that shape as you prefer.
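As a small illustration (the rotation angle and the $\sigma$ values below are arbitrary choices for the sketch, not anything prescribed above), you can pick the directions yourself by writing down an explicit orthogonal matrix, such as a 2-D rotation:

theta <- pi/6                              # orient the major axis 30 degrees from the x-axis
P <- rbind(c( cos(theta), sin(theta)),     # row 1: direction of the largest component
           c(-sin(theta), cos(theta)))     # row 2: the orthogonal direction
Sigma.2d <- t(P) %*% diag(c(4, 1)) %*% P   # Sigma = P' Diagonal(4, 1) P
eigen(Sigma.2d)                            # eigenvalues 4 and 1; eigenvectors are the rows of P, up to sign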
One algebraic and computing benefit of this approach is that when $\sigma_n > 0$, $\Sigma$ is readily inverted (which is a common operation on covariance matrices):

$$\Sigma^{-1} = P^\prime\, \text{Diagonal}(1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_n)\, P.$$
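This follows directly from the orthogonality of $P$, which gives $P P^\prime = P^\prime P = \mathbb{I}$: writing $D = \text{Diagonal}(\sigma_1, \ldots, \sigma_n)$,

$$\Sigma\, \Sigma^{-1} = P^\prime D\, (P P^\prime)\, D^{-1} P = P^\prime D D^{-1} P = P^\prime P = \mathbb{I}.$$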
Don't care about the directions, but only about the range of sizes of the $\sigma_i$? That's fine: you can easily generate a random orthogonal matrix. Just wrap $n^2$ iid standard Normal values into a square matrix and then orthogonalize it. It will almost surely work (provided $n$ isn't huge). The QR decomposition will do that, as in this code:
n <- 5
p <- qr.Q(qr(matrix(rnorm(n^2), n)))
This works because the $n$-variate multinormal distribution so generated is spherically symmetric: it is invariant under all rotations and reflections (through the origin). Thus the orthogonal matrices it produces are uniformly distributed (with respect to the natural, Haar, measure on the orthogonal group), as argued at How to generate uniformly distributed points on the surface of the 3-d unit sphere?.
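If you want to reassure yourself that p really is orthogonal (a sanity check, not part of the construction), inspect

zapsmall(crossprod(p))   # t(p) %*% p: should be the n-by-n identity matrix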
A quick way to obtain $\Sigma$ from $P$ and the $\sigma_i$, once you have specified or created them, uses crossprod and exploits R's recycling of arrays in arithmetic operations, as in this example with $\sigma = (\sigma_1, \ldots, \sigma_5) = (5,4,3,2,1)$:
Sigma <- crossprod(p, p*(5:1))
As a check, the singular value decomposition should return both $\sigma$ and $P^\prime$. You may inspect it with the command
svd(Sigma)
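As a quick numerical confirmation (a sketch that assumes p and Sigma from the code above; the name s is just a local convenience), the singular values should match the chosen $\sigma_i$ and the singular vectors should match the rows of p up to sign:

s <- svd(Sigma)
zapsmall(s$d - 5:1)             # zero vector: the singular values equal the sigma_i
zapsmall(abs(s$u) - abs(t(p)))  # zero matrix: the columns of u are the columns of P', up to sign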
The inverse of Sigma of course is obtained merely by changing the multiplication by $\sigma$ into a division:
Tau <- crossprod(p, p/(5:1))
You may verify this by viewing zapsmall(Sigma %*% Tau), which should be the $n\times n$ identity matrix. A generalized inverse (essential for regression calculations) is obtained by replacing any $\sigma_i \ne 0$ by $1/\sigma_i$, exactly as above, but keeping any zeros among the $\sigma_i$ as they were.
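Here is a minimal sketch of that recipe (the names sigma, Sigma0, and Tau0 are illustrative, not from the text above); it reuses the orthogonal matrix p generated earlier and checks the defining property $\Sigma\, \Sigma^{+}\, \Sigma = \Sigma$ of a generalized inverse:

sigma <- c(5, 4, 3, 0, 0)                      # a rank-deficient spectrum: two zero components
Sigma0 <- crossprod(p, p * sigma)              # singular covariance matrix P' Diagonal(sigma) P
sigma.inv <- ifelse(sigma > 0, 1/sigma, 0)     # invert the nonzero sigma_i, keep the zeros
Tau0 <- crossprod(p, p * sigma.inv)            # generalized (Moore-Penrose) inverse
zapsmall(Sigma0 %*% Tau0 %*% Sigma0 - Sigma0)  # should be the zero matrix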