The theory behind the weights argument in R when using lm()

After one year of graduate school, my understanding of "weighted least squares" is the following: let $\mathbf{y} \in \mathbb{R}^n$, let $\mathbf{X}$ be an $n \times p$ design matrix, let $\boldsymbol\beta \in \mathbb{R}^p$ be a parameter vector, and let $\boldsymbol\epsilon \in \mathbb{R}^n$ be an error vector such that $\boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{V})$, where $\mathbf{V} = \text{diag}(v_1, v_2, \dots, v_n)$ and $\sigma^2 > 0$. Then the model

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$$

under these assumptions is called the "weighted least squares" model. The WLS problem ends up being to find

$$\arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)\text{.}$$

Suppose that

$\mathbf{y} = \begin{bmatrix} y_1 & \dots & y_n\end{bmatrix}^{T}$, $\boldsymbol\beta = \begin{bmatrix} \beta_1 & \dots & \beta_p\end{bmatrix}^{T}$, and

$$\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_{1}^{T} \\ \mathbf{x}_{2}^{T} \\ \vdots \\ \mathbf{x}_{n}^{T} \end{bmatrix}\text{.}$$

Since $\mathbf{x}_i^{T}\boldsymbol\beta\in \mathbb{R}^1$, we have

$$\mathbf{y}-\mathbf{X}\boldsymbol\beta = \begin{bmatrix} y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta \\ y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta \\ \vdots \\ y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta \end{bmatrix}\text{.}$$

This gives

$$\begin{aligned} (\mathbf{y}-\mathbf{X}\boldsymbol\beta)^{T}\mathbf{V}^{-1} &= \begin{bmatrix} y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta & y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta & \cdots & y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta \end{bmatrix}\text{diag}(v_1^{-1}, v_2^{-1}, \dots, v_n^{-1}) \\ &= \begin{bmatrix} v_1^{-1}(y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta) & v_2^{-1}(y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta) & \cdots & v_n^{-1}(y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta) \end{bmatrix}\text{,} \end{aligned}$$

thus giving

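$$\arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right) = \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}v_i^{-1}(y_i-\mathbf{x}_i^{T}\boldsymbol\beta)^2\text{.}$$

$\boldsymbol\beta$ is estimated using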

$$\hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}\text{.}$$

This is the extent of what I know. I was never taught how $v_1, v_2, \dots, v_n$ should be chosen, although, judging by this source, it seems that usually $\text{Var}(\boldsymbol\epsilon) = \text{diag}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_n)$, which makes intuitive sense. (Give highly variable observations less weight in the WLS problem, and give observations with less variability more weight.)
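The claim that the closed-form estimator solves the argmin above can be checked numerically. A minimal sketch, reusing the toy `x` and `y` values that appear later in this question together with hypothetical variances `v = c(2, 1, 4)` chosen only for illustration (they are not from the post): it computes $(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}$ directly and compares it with base R's general-purpose `optim()` minimizing $\sum_i v_i^{-1}(y_i-\mathbf{x}_i^T\boldsymbol\beta)^2$.

```r
# Toy data from the question; v holds HYPOTHETICAL variances
# (illustration only, not taken from the post).
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
v <- c(2, 1, 4)

X     <- cbind(1, x)        # design matrix with an intercept column
V_inv <- diag(1 / v)        # V^{-1} = diag(1/v_1, ..., 1/v_n)

# Closed-form estimate (X^T V^{-1} X)^{-1} X^T V^{-1} y,
# computed with solve(A, b) instead of forming the inverse explicitly
beta_closed <- drop(solve(t(X) %*% V_inv %*% X, t(X) %*% V_inv %*% y))

# Objective sum_i v_i^{-1} (y_i - x_i^T beta)^2 and its analytic gradient
wss    <- function(beta) sum((y - X %*% beta)^2 / v)
wss_gr <- function(beta) drop(-2 * t(X) %*% (V_inv %*% (y - X %*% beta)))

# Numerical minimization of the same objective, starting from zero
opt <- optim(c(0, 0), wss, wss_gr, method = "BFGS",
             control = list(reltol = 1e-12))

print(rbind(closed_form = beta_closed, numerical = opt$par))
```

Using `solve(A, b)` rather than `solve(A) %*% b` is a common choice because it avoids computing the inverse explicitly.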

What I am especially interested in is how R handles the weights in the `lm()` function when the weights are assigned as integers. From `?lm`:

"Non-`NULL` `weights` can be used to indicate that different observations have different variances (with the values in `weights` being inversely proportional to the variances); or equivalently, when the elements of `weights` are positive integers $w_i$, that each response $y_i$ is the mean of $w_i$ unit-weight observations (including the case that there are $w_i$ observations equal to $y_i$ and the data have been summarized)."
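The "equivalently" in the quoted passage rests on a one-line fact: if $y_i$ is the mean of $w_i$ independent unit-weight observations with common variance $\sigma^2$, then $\text{Var}(y_i) = \sigma^2/w_i$, so integer replication counts are weights inversely proportional to the variances. A minimal sketch of the parenthetical special case, using the data from this question (the object names are mine): each $y_i$ is repeated $w_i$ times and fitted without weights. Only the coefficients are expected to agree; the expanded fit has many more residual degrees of freedom, so its standard errors will differ.

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
w <- c(50, 85, 75)                 # positive integer weights from the question

# Fit 1: weighted regression on the summarized data
fit_weighted <- lm(y ~ x, weights = w)

# Fit 2: unweighted regression after repeating row i exactly w_i times,
# i.e. "w_i observations equal to y_i" before summarizing
x_long <- rep(x, times = w)
y_long <- rep(y, times = w)
fit_expanded <- lm(y_long ~ x_long)

print(rbind(weighted = coef(fit_weighted), expanded = coef(fit_expanded)))
```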

I have reread this paragraph several times, and it does not make sense to me. Using the framework I developed above, suppose I have the following simulated values:

x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)

lm(y~x, weights = weights)

Call:
lm(formula = y ~ x, weights = weights)

Coefficients:
(Intercept)            x  
     0.3495       0.2834

Using the framework I developed above, how are these parameters derived? Here is my attempt to do this by hand: assuming $\mathbf{V} = \text{diag}(50, 85, 75)$, we have

$$\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = \left(\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}^{T}\text{diag}(1/50, 1/85, 1/75)\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix} \right)^{-1}\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}^{T}\text{diag}(1/50, 1/85, 1/75)\begin{bmatrix} 0.25 \\ 0.75 \\ 0.85 \end{bmatrix}$$

and doing this in R gives the following (note that the matrix is not invertible in this case, so I used a generalized inverse):

X <- matrix(rep(1, times = 6), byrow = T, nrow = 3, ncol = 2)
V_inv <- diag(c(1/50, 1/85, 1/75))
y <- c(0.25, 0.75, 0.85)

library(MASS)
ginv(t(X) %*% V_inv %*% X) %*% t(X) %*% V_inv %*% y

         [,1]
[1,] 0.278913
[2,] 0.278913
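The remark about invertibility can be checked directly. A minimal sketch (the names `X_ones` and `X_right` are mine): the two columns of the all-ones matrix are identical, so it has rank 1 and $\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}$ is a singular $2 \times 2$ matrix. `MASS::ginv()` then yields the minimum-norm solution, which splits the weighted mean of $\mathbf{y}$ (weights $1/50, 1/85, 1/75$) equally between the two identical columns; that is why both entries above equal 0.278913.

```r
library(MASS)  # for ginv()

y     <- c(0.25, 0.75, 0.85)
V_inv <- diag(c(1/50, 1/85, 1/75))
X_ones  <- matrix(1, nrow = 3, ncol = 2)   # the all-ones matrix used above
X_right <- cbind(1, c(0, 1, 2))            # intercept column plus x

print(qr(X_ones)$rank)                           # 1
print(qr(t(X_ones) %*% V_inv %*% X_ones)$rank)   # 1, so the 2 x 2 matrix is singular
print(qr(X_right)$rank)                          # 2

# Minimum-norm solution: each entry is half the weighted mean of y
b_ginv <- ginv(t(X_ones) %*% V_inv %*% X_ones) %*% t(X_ones) %*% V_inv %*% y
print(drop(b_ginv))
print(weighted.mean(y, c(1/50, 1/85, 1/75)) / 2)
```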

These do not match the values in the `lm()` output. What am I doing wrong?

r linear-model weighted-regression

— Clarinetista

The matrix $\mathbf{X}$ should be

$$\begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 2 \end{bmatrix}, \quad \text{not} \quad \begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}.$$

Also, your `V_inv` should be `diag(weights)`, not `diag(1/weights)`.

x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)
X <- cbind(1, x)

> solve(t(X) %*% diag(weights) %*% X, t(X) %*% diag(weights) %*% y)
       [,1]
  0.3495122
x 0.2834146
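Another way to see why `diag(weights)` is the right matrix: minimizing $\sum_i w_i(y_i-\mathbf{x}_i^T\boldsymbol\beta)^2$ is the same as ordinary least squares after multiplying each row of $\mathbf{X}$ and each $y_i$ by $\sqrt{w_i}$. A sketch of that reduction on the same data (the names `sw`, `X_star`, `y_star` are mine). I believe `lm()` works along these lines internally via `lm.wfit()`, but treat that as an assumption, not a description of its source.

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)
X <- cbind(1, x)

sw     <- sqrt(weights)
X_star <- X * sw    # recycling down the columns scales row i of X by sqrt(w_i)
y_star <- y * sw

# Unweighted least squares on the transformed data (QR-based)
beta_ols_star <- qr.solve(X_star, y_star)

beta_lm <- coef(lm(y ~ x, weights = weights))
print(rbind(transformed = beta_ols_star, lm = beta_lm))
```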

— mark999

Thanks for clearing things up, especially the incorrect design matrix! I am pretty rusty with this material. So, as a final question, does this mean that under the WLS assumptions

$$\text{Var}(\boldsymbol\epsilon) = \text{diag}(1/\text{weights})?$$

— Clarinetista

Yes, although the weights only need to be proportional to 1/variance, not necessarily equal to it. For example, if you use `weights <- c(50, 85, 75)/2` in your example, you will get the same result.
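A minimal sketch of the proportionality point on the question's data: halving every weight leaves the coefficients and `vcov()` unchanged, while the residual standard error is divided by $\sqrt{2}$. The implied per-observation variance $\hat\sigma^2/w_i$ therefore stays the same, since the constant is absorbed into the estimate of $\sigma^2$.

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
w <- c(50, 85, 75)

fit_w      <- lm(y ~ x, weights = w)
fit_w_half <- lm(y ~ x, weights = w / 2)   # same weights rescaled by a constant

print(rbind(w = coef(fit_w), w_half = coef(fit_w_half)))
print(all.equal(vcov(fit_w), vcov(fit_w_half)))
print(c(sigma_w = sigma(fit_w), sigma_w_half_times_sqrt2 = sigma(fit_w_half) * sqrt(2)))
```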

— mark999

To answer this more concisely, weighted least squares regression using `weights` in R makes the following assumptions. Suppose we have `weights = c(w_1, w_2, ..., w_n)`. Let $\mathbf{y} \in \mathbb{R}^n$, let $\mathbf{X}$ be an $n \times p$ design matrix, let $\boldsymbol\beta\in\mathbb{R}^p$ be a parameter vector, and let $\boldsymbol\epsilon \in \mathbb{R}^n$ be an error vector with mean $\mathbf{0}$ and variance matrix $\sigma^2\mathbf{V}$, where $\sigma^2 > 0$ and

$$\mathbf{V} = \text{diag}(1/w_1, 1/w_2, \dots, 1/w_n)\text{.}$$

Then, following the same steps as the derivation in the original post, we have

$$\begin{aligned} \arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right) &= \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}(1/w_i)^{-1}(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2 \\ &= \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}w_i(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2 \end{aligned}$$

and $\boldsymbol\beta$ is estimated using

$$\hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}$$

from the GLS assumptions.
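If `lm()` really works under these assumptions, the covariance matrix it reports should be $\hat\sigma^2(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}$ with $\mathbf{V}^{-1} = \text{diag}(w_1, \dots, w_n)$. A minimal sketch on the question's data, assuming the usual unbiased estimator $\hat\sigma^2 = \sum_i w_i \hat e_i^2/(n-p)$, where $\hat e_i$ are the residuals (the helper names are mine):

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
w <- c(50, 85, 75)
fit <- lm(y ~ x, weights = w)

X     <- model.matrix(fit)   # n x p design matrix used by lm()
V_inv <- diag(w)             # V = diag(1/w_i), so V^{-1} = diag(w_i)
n <- nrow(X)
p <- ncol(X)

XtVX     <- t(X) %*% V_inv %*% X
beta_hat <- solve(XtVX, t(X) %*% V_inv %*% y)

# Assumed estimator of sigma^2: weighted residual sum of squares / (n - p)
s2 <- sum(w * (y - X %*% beta_hat)^2) / (n - p)
vcov_manual <- s2 * solve(XtVX)

print(vcov_manual)
print(vcov(fit))
```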

— Clarinetista