The theory behind the weights argument in R when using lm()

After a year in graduate school, my understanding of "weighted least squares" is the following: let $\mathbf{y} \in \mathbb{R}^n$, let $\mathbf{X}$ be an $n \times p$ design matrix, let $\boldsymbol\beta \in \mathbb{R}^p$ be a parameter vector, and let $\boldsymbol\epsilon \in \mathbb{R}^n$ be an error vector such that $\boldsymbol\epsilon \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{V})$, where $\mathbf{V} = \text{diag}(v_1, v_2, \dots, v_n)$ and $\sigma^2 > 0$. Then the model

$$\mathbf{y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\epsilon$$

under these assumptions is called the "weighted least squares" model. The WLS problem then amounts to finding

$$\arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)\text{.}$$

Suppose

$\mathbf{y} = \begin{bmatrix} y_1 & \dots & y_n\end{bmatrix}^{T}$, $\boldsymbol\beta = \begin{bmatrix} \beta_1 & \dots & \beta_p\end{bmatrix}^{T}$, and

$$\mathbf{X} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_{1}^{T} \\ \mathbf{x}_{2}^{T} \\ \vdots \\ \mathbf{x}_{n}^{T} \end{bmatrix}\text{.}$$

Then $\mathbf{x}_i^{T}\boldsymbol\beta\in \mathbb{R}^1$, so

$$\mathbf{y}-\mathbf{X}\boldsymbol\beta = \begin{bmatrix} y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta \\ y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta \\ \vdots \\ y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta \end{bmatrix}\text{.}$$

This gives

$$\begin{aligned} (\mathbf{y}-\mathbf{X}\boldsymbol\beta)^{T}\mathbf{V}^{-1} &= \begin{bmatrix} y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta & y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta & \cdots & y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta \end{bmatrix}\text{diag}(v_1^{-1}, v_2^{-1}, \dots, v_n^{-1}) \\ &= \begin{bmatrix} v_1^{-1}(y_1-\mathbf{x}_{1}^{T}\boldsymbol\beta) & v_2^{-1}(y_2-\mathbf{x}_{2}^{T}\boldsymbol\beta) & \cdots & v_n^{-1}(y_n-\mathbf{x}_{n}^{T}\boldsymbol\beta) \end{bmatrix} \end{aligned}$$

thus giving

$$\arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right) = \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}v_i^{-1}(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2\text{.}$$

$\boldsymbol\beta$ is estimated using

$$\hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}\text{.}$$

This is the extent of what I know. I was never taught how $v_1, v_2, \dots, v_n$ should be chosen, although it seems, judging by here, that usually

$$\text{Var}(\boldsymbol\epsilon) = \text{diag}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_n)\text{,}$$

which makes intuitive sense. (Give highly variable observations less weight in the WLS problem, and give observations with less variability more weight.)
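For completeness, here is a sketch of the step from the objective to the estimator $\hat{\boldsymbol\beta}$ given above. It assumes that $\mathbf{X}$ has full column rank and that every $v_i > 0$, neither of which is stated explicitly in the setup; $S(\boldsymbol\beta)$ is a label introduced only for this sketch.

$$\begin{aligned} S(\boldsymbol\beta) &= (\mathbf{y}-\mathbf{X}\boldsymbol\beta)^{T}\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\beta) \\ \frac{\partial S}{\partial \boldsymbol\beta} &= -2\,\mathbf{X}^{T}\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol\beta) = \mathbf{0} \\ \mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X}\,\hat{\boldsymbol\beta} &= \mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y} \end{aligned}$$

Under the full-rank assumption $\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X}$ is invertible, and multiplying both sides by its inverse recovers the expression for $\hat{\boldsymbol\beta}$.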

What I'm particularly curious about is how R handles weights in the lm() function when the weights are assigned to be integers. From ?lm:

Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers $w_i$, that each response $y_i$ is the mean of $w_i$ unit-weight observations (including the case that there are $w_i$ observations equal to $y_i$ and the data have been summarized).

I've re-read this paragraph several times, and it makes no sense to me. Using the framework I developed above, suppose I have the following simulated values:

x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)

lm(y~x, weights = weights)

Call:
lm(formula = y ~ x, weights = weights)

Coefficients:
(Intercept)            x  
     0.3495       0.2834

Using the framework I developed above, how are these parameters derived? Here is my attempt at doing this by hand: assuming $\mathbf{V} = \text{diag}(50, 85, 75)$, we have

$$\begin{aligned}&\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix} = \\ &\left(\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}^{T}\text{diag}(1/50, 1/85, 1/75)\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix} \right)^{-1}\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}^{T}\text{diag}(1/50, 1/85, 1/75)\begin{bmatrix} 0.25 \\ 0.75 \\ 0.85 \end{bmatrix} \end{aligned}$$

and doing this in R gives the following (note that the matrix is not invertible in this case, so I used a generalized inverse):

X <- matrix(rep(1, times = 6), byrow = T, nrow = 3, ncol = 2)
V_inv <- diag(c(1/50, 1/85, 1/75))
y <- c(0.25, 0.75, 0.85)

library(MASS)
ginv(t(X) %*% V_inv %*% X) %*% t(X) %*% V_inv %*% y

         [,1]
[1,] 0.278913
[2,] 0.278913

These do not match the values in the lm() output. What am I doing wrong?

r linear-model weighted-regression

— Clarinettiste

The matrix $X$ should be

$$\begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 2 \end{bmatrix},$$

not

$$\begin{bmatrix} 1 & 1\\ 1 & 1\\ 1 & 1 \end{bmatrix}.$$

Also, your V_inv should be diag(weights), not diag(1/weights).

x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)
X <- cbind(1, x)

> solve(t(X) %*% diag(weights) %*% X, t(X) %*% diag(weights) %*% y)
       [,1]
  0.3495122
x 0.2834146
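As a cross-check on the calculation above, the following sketch compares the closed-form estimate with the coefficients returned by lm(), and with an ordinary least squares fit on data whose rows are rescaled by sqrt(weights). The names W, beta_closed, fit and beta_rescaled are illustrative and do not appear in the original post.

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)

X <- cbind(1, x)      # design matrix: intercept column plus x
W <- diag(weights)    # W plays the role of V^{-1}, i.e. V = diag(1 / weights)

# Closed-form WLS estimate: solve (X' W X) beta = X' W y
beta_closed <- drop(solve(t(X) %*% W %*% X, t(X) %*% W %*% y))

# The same fit through lm()
fit <- lm(y ~ x, weights = weights)

# Equivalent unweighted problem: scale row i of X and y by sqrt(w_i)
beta_rescaled <- lm.fit(sqrt(weights) * X, sqrt(weights) * y)$coefficients

print(unname(beta_closed))
print(unname(coef(fit)))
print(unname(beta_rescaled))
```

All three approaches minimise the same weighted sum of squares, so the printed vectors should agree up to floating-point error.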

— mark999

Thank you, especially for clarifying the incorrect design matrix! I'm quite rusty on this material. So, as a last question, does this mean that

$$\text{Var}(\boldsymbol\epsilon) = \text{diag}(1/\text{weights})$$

in the WLS assumptions?

— Clarinettiste

Yes, although the weights only need to be proportional to 1/variance, not necessarily equal to it. For example, if you use weights <- c(50, 85, 75)/2 in your example, you get the same result.
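The proportionality point can be checked directly. The sketch below refits the example with the weights halved; fit_full and fit_half are illustrative names. The coefficients and their estimated covariance matrix are expected to coincide, while the reported residual standard error is on the scale of the weights and therefore changes.

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)

fit_full <- lm(y ~ x, weights = weights)
fit_half <- lm(y ~ x, weights = weights / 2)

# A constant factor in the weights cancels in (X' W X)^{-1} X' W y
print(all.equal(coef(fit_full), coef(fit_half)))

# The estimated covariance of the coefficients is unchanged as well
print(all.equal(vcov(fit_full), vcov(fit_half)))

# The residual standard error scales with the square root of the factor
print(c(sigma(fit_full), sigma(fit_half)))
```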

— mark999

To answer this question more concisely, weighted least squares regression using weights in R makes the following assumptions: suppose we have weights = c(w_1, w_2, ..., w_n). Let $\mathbf{y} \in \mathbb{R}^n$, let $\mathbf{X}$ be an $n \times p$ design matrix, let $\boldsymbol\beta\in\mathbb{R}^p$ be a parameter vector, and let $\boldsymbol\epsilon \in \mathbb{R}^n$ be an error vector with mean $\mathbf{0}$ and variance matrix $\sigma^2\mathbf{V}$, where $\sigma^2 > 0$ and

$$\mathbf{V} = \text{diag}(1/w_1, 1/w_2, \dots, 1/w_n)\text{.}$$

Then, following the same steps as the derivation in the original post, we have

$$\begin{aligned} \arg\min_{\boldsymbol \beta}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)^{T}\mathbf{V}^{-1}\left(\mathbf{y}-\mathbf{X}\boldsymbol\beta\right)&= \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}(1/w_i)^{-1}(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2 \\ &= \arg\min_{\boldsymbol \beta}\sum_{i=1}^{n}w_i(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2 \end{aligned}$$

and $\boldsymbol\beta$ is estimated using

$$\hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y}$$

from the GLS assumptions.
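The weighted sum $\sum_i w_i(y_i-\mathbf{x}^{T}_i\boldsymbol\beta)^2$ also gives one way to read the integer-weights sentence quoted from ?lm in the question: with positive integer weights, it is the ordinary sum of squares for a data set in which observation $i$ appears $w_i$ times. The sketch below illustrates this by expanding the example data; x_rep, y_rep, fit_weighted and fit_expanded are illustrative names, and the comparison is limited to the coefficients.

```r
x <- c(0, 1, 2)
y <- c(0.25, 0.75, 0.85)
weights <- c(50, 85, 75)

# Expand the summarised data: observation i appears weights[i] times
x_rep <- rep(x, times = weights)
y_rep <- rep(y, times = weights)

fit_weighted <- lm(y ~ x, weights = weights)
fit_expanded <- lm(y_rep ~ x_rep)   # unweighted fit on sum(weights) rows

print(unname(coef(fit_weighted)))
print(unname(coef(fit_expanded)))
```

The two fits have different residual degrees of freedom (1 versus 208 here), so their standard errors are not expected to match even though the coefficients do.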

— Clarinettiste