Logistic regression: Bernoulli vs. binomial response variables

I want to carry out a logistic regression with the following binomial response and with $X_1$ and $X_2$ as my predictors.

[Image: table of the binomial data]

I can present the same data as Bernoulli responses in the following format.

[Image: table of the same data as Bernoulli responses]

The logistic regression results for these 2 data sets are mostly the same. The deviance residuals and the AIC are different. (The difference between the null deviance and the residual deviance is the same in both cases: 0.228.)

Below are the regression outputs from R. The data sets are called binom.data and bern.data.

Here is the binomial output.

Call:
glm(formula = cbind(Successes, Trials - Successes) ~ X1 + X2, 
    family = binomial, data = binom.data)

Deviance Residuals: 
[1]  0  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.9649    21.6072  -0.137    0.891
X1Yes        -0.1897     2.5290  -0.075    0.940
X2            0.3596     1.9094   0.188    0.851

(Dispersion parameter for binomial family taken to be 1)

Null deviance:  2.2846e-01  on 2  degrees of freedom
Residual deviance: -4.9328e-32  on 0  degrees of freedom
AIC: 11.473

Number of Fisher Scoring iterations: 4

Here is the Bernoulli output.

Call:
glm(formula = Success ~ X1 + X2, family = binomial, data = bern.data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6651  -1.3537   0.7585   0.9281   1.0108  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.9649    21.6072  -0.137    0.891
X1Yes        -0.1897     2.5290  -0.075    0.940
X2            0.3596     1.9094   0.188    0.851

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 15.276  on 11  degrees of freedom
Residual deviance: 15.048  on  9  degrees of freedom
AIC: 21.048

Number of Fisher Scoring iterations: 4

My questions:

1) I can see that the point estimates and standard errors from the 2 approaches are equivalent in this particular case. Is this equivalence true in general?

2) How can the answer to question 1 be justified mathematically?

3) Why are the deviance residuals and the AIC different?

— A Scientist

Answers:

1) Yes. You can aggregate or disaggregate binomial data from individuals with the same covariates. This follows from the fact that the sufficient statistic for a binomial model is the total number of events for each covariate vector, and the Bernoulli is just a special case of the binomial. Intuitively, each Bernoulli trial that makes up a binomial outcome is independent, so it should make no difference whether you count them as a single outcome or as separate individual trials.
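To make the aggregate/disaggregate step concrete, here is a minimal Python sketch. The function names `aggregate` and `disaggregate` and the covariate values are hypothetical (they are not the asker's data); the only point is that rows sharing a covariate vector collapse to a (successes, trials) pair and can be expanded back without losing anything the model uses.

```python
from collections import defaultdict

def aggregate(rows):
    """Collapse Bernoulli rows (covariates, z) into binomial counts.

    Returns {covariates: (successes, trials)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for x, z in rows:
        counts[x][0] += z   # number of successes for this covariate vector
        counts[x][1] += 1   # number of trials for this covariate vector
    return {x: tuple(c) for x, c in counts.items()}

def disaggregate(table):
    """Expand binomial counts into one Bernoulli row per trial."""
    rows = []
    for x, (y, n) in table.items():
        rows += [(x, 1)] * y + [(x, 0)] * (n - y)
    return rows

# Hypothetical (X1, X2) covariate vectors with 0/1 outcomes.
bern = [(("Yes", 1), 1), (("Yes", 1), 0), (("No", 2), 1),
        (("Yes", 1), 1), (("No", 2), 1)]

binom = aggregate(bern)
print(binom)
```

The order of the expanded rows differs from the input, which is harmless: the likelihood depends only on the counts per covariate vector.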

2) Say we have $n$ unique covariate vectors $x_1, x_2, \ldots, x_n$, each with a binomial outcome on $N_i$ trials, i.e.

$Y_i \sim \mathrm{Bin}(N_i, p_i)$

You have specified a logistic regression model, so

$\mathrm{logit}(p_i) = \sum_{k=1}^K \beta_k x_{ik}$

although we will see later that this is not important.

The log-likelihood for this model is

$\ell(\beta; Y) = \sum_{i=1}^n \left[ \log {N_i \choose Y_i} + Y_i \log(p_i) + (N_i - Y_i) \log(1-p_i) \right]$

and we maximise it with respect to $\beta$ (which enters through the $p_i$ terms) to obtain our parameter estimates.

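Now suppose that, for each $i = 1, \ldots, n$, we split the binomial outcome into $N_i$ individual Bernoulli outcomes, as you have done. Specifically, define

$Z_{i1}, \ldots, Z_{iY_i} = 1$

$Z_{i(Y_i+1)}, \ldots, Z_{iN_i} = 0$

That is, the first $Y_i$ are 1s and the remaining $(N_i - Y_i)$ are 0s. The ordering is arbitrary; any other allocation would do equally well.

Your second model says that

$Z_{ij} \sim \mathrm{Bernoulli}(p_i)$

with the same regression model for $p_i$ as above. The log-likelihood for this model is

$\ell(\beta; Z) = \sum_{i=1}^n \sum_{j=1}^{N_i} \left[ Z_{ij}\log(p_i) + (1-Z_{ij})\log(1-p_i) \right]$

and, because of the way the $Z_{ij}$ were defined, this simplifies to

$\ell(\beta; Z) = \sum_{i=1}^n \left[ Y_i \log(p_i) + (N_i - Y_i)\log(1-p_i) \right]$

which should look familiar.

To get the estimates in the second model, we maximise this with respect to $\beta$. The only difference between this and the first log-likelihood is the term $\log {N_i \choose Y_i}$, which is constant with respect to $\beta$. It therefore affects neither the maximiser nor the curvature at the maximum, so the estimates and their standard errors are the same.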
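As a quick numerical check of the argument above, the sketch below evaluates both log-likelihoods at a few arbitrary probability vectors and confirms that the gap is always the same constant. The counts `(N_i, Y_i) = (5, 3), (4, 3), (3, 2)` are an assumption: the asker's tables were images that did not survive, and these values were inferred because they reproduce the reported deviances and AICs.

```python
from math import comb, log

# ASSUMPTION: (N_i, Y_i) per covariate vector, inferred from the reported
# R output; the original data tables are not available.
groups = [(5, 3), (4, 3), (3, 2)]

def loglik_binomial(ps):
    """Binomial log-likelihood, including the log binomial coefficient."""
    return sum(log(comb(n, y)) + y * log(p) + (n - y) * log(1 - p)
               for (n, y), p in zip(groups, ps))

def loglik_bernoulli(ps):
    """Bernoulli log-likelihood over the disaggregated 0/1 outcomes."""
    total = 0.0
    for (n, y), p in zip(groups, ps):
        z = [1] * y + [0] * (n - y)   # Y_i ones followed by N_i - Y_i zeros
        total += sum(zj * log(p) + (1 - zj) * log(1 - p) for zj in z)
    return total

# The term that does not depend on beta (or on the p_i at all).
constant = sum(log(comb(n, y)) for n, y in groups)

# Arbitrary probability vectors: the gap should not depend on them.
trial_ps = [(0.6, 0.75, 2 / 3), (0.5, 0.5, 0.5), (0.2, 0.9, 0.4)]
gaps = [loglik_binomial(ps) - loglik_bernoulli(ps) for ps in trial_ps]

for gap in gaps:
    print(round(gap, 6), round(constant, 6))
```

Because the gap does not move with the $p_i$, it cannot move with $\beta$ either, whatever link function connects the two.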

3) Each observation has a deviance residual. In the binomial model, they are

$D_i = 2\left[Y_i \log \left( \frac{Y_i/N_i}{\hat{p}_i} \right) + (N_i-Y_i) \log \left( \frac{1-Y_i/N_i}{1-\hat{p}_i} \right)\right]$

where $\hat{p}_i$ is the estimated probability from your model. (Strictly, $D_i$ is the observation's contribution to the deviance; the residual that R prints is its signed square root, which does not change the argument.) Note that your binomial model is saturated (0 residual degrees of freedom) and fits perfectly: $\hat{p}_i = Y_i/N_i$ for all observations, so $D_i = 0$ for all $i$.

In the Bernoulli model,

$D_{ij} = 2\left[Z_{ij} \log \left( \frac{Z_{ij}}{\hat{p}_i} \right) + (1-Z_{ij}) \log \left(\frac{1-Z_{ij}}{1-\hat{p}_i} \right)\right]$

with the convention $0 \log 0 = 0$. Apart from the fact that you will now have $\sum_{i=1}^n N_i$ deviance residuals (instead of $n$ as with the binomial data), these will each be either

$D_{ij} = -2\log(\hat{p}_i)$

or

$D_{ij} = -2\log(1-\hat{p}_i)$

depending on whether $Z_{ij} = 1$ or $0$, and are obviously not the same as the above. Even if you sum these over $j$ to get a sum of deviance residuals for each $i$, you don't get the same:

$D_i = \sum_{j=1}^{N_i} D_{ij} = 2\left[Y_i \log \left( \frac{1}{\hat{p}_i} \right) + (N_i-Y_i) \log \left( \frac{1}{1-\hat{p}_i} \right)\right]$
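The two sets of residuals can be computed directly from these formulas. The sketch below again assumes the counts `(5, 3), (4, 3), (3, 2)`, which are inferred from the reported output rather than taken from the lost data tables, and uses the perfect-fit probabilities $\hat{p}_i = Y_i/N_i$. Signed square roots are taken for the Bernoulli rows so the numbers are comparable to R's "Deviance Residuals" summary.

```python
from math import log, sqrt

# ASSUMPTION: counts inferred from the reported deviances and AICs.
groups = [(5, 3), (4, 3), (3, 2)]

def dev_binomial(n, y, p):
    """Deviance contribution D_i of one binomial observation."""
    obs = y / n
    a = y * log(obs / p) if y > 0 else 0.0                  # 0*log(0) = 0
    b = (n - y) * log((1 - obs) / (1 - p)) if y < n else 0.0
    return 2 * (a + b)

def dev_bernoulli(z, p):
    """Deviance contribution D_ij of one Bernoulli observation."""
    return -2 * log(p) if z == 1 else -2 * log(1 - p)

binom_dev, bern_res = [], []
for n, y in groups:
    p_hat = y / n   # saturated binomial fit
    binom_dev.append(dev_binomial(n, y, p_hat))
    for z in [1] * y + [0] * (n - y):
        d = dev_bernoulli(z, p_hat)
        bern_res.append(sqrt(d) if z == 1 else -sqrt(d))   # signed residual

print(binom_dev)                                          # all zero
print(round(min(bern_res), 4), round(max(bern_res), 4))   # extremes
print(round(sum(r * r for r in bern_res), 3))             # residual deviance
```

Under the assumed counts, the extremes and the sum of squares agree with the Min, Max and residual deviance in the Bernoulli output above, while the binomial contributions are all zero.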

The fact that the AIC is different (but the change in deviance is not) comes back to the constant term that was the difference between the log-likelihoods of the two models. When calculating the deviance, this is cancelled out because it is the same in all models based on the same data. The AIC is defined as

$AIC = 2K - 2\ell$

and that combinatorial term is the difference between the $\ell$s:

$AIC_{\mathrm{Bernoulli}} - AIC_{\mathrm{Binomial}} = 2\sum_{i=1}^n \log {N_i \choose Y_i} = 9.575$
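The arithmetic behind that 9.575 can be sketched as follows. The counts are an assumption (inferred from the reported output, since the data tables were not preserved), and `K = 3` is the number of fitted coefficients shown in both summaries.

```python
from math import comb, log

# ASSUMPTION: (N_i, Y_i) inferred from the reported R output.
groups = [(5, 3), (4, 3), (3, 2)]
K = 3   # intercept, X1Yes, X2

# Maximised Bernoulli log-likelihood: the fit is perfect, p_hat_i = Y_i / N_i.
ll_bern = sum(y * log(y / n) + (n - y) * log(1 - y / n) for n, y in groups)

# The binomial log-likelihood adds the combinatorial constant.
const = sum(log(comb(n, y)) for n, y in groups)
ll_binom = ll_bern + const

aic_bern = 2 * K - 2 * ll_bern
aic_binom = 2 * K - 2 * ll_binom

print(round(aic_bern, 3), round(aic_binom, 3), round(aic_bern - aic_binom, 3))
```

With these counts the two values match the AICs in the Bernoulli and binomial outputs, and their difference is $2\log(10 \cdot 4 \cdot 3) = 2\log 120$.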

— Mark

Thanks for your very detailed reply, Mark! Sorry for the delay in my response - I was on vacation. 3) Given that the 2 models give different results for deviance residuals and AIC, which one is correct or better? a) As I understand, observations with a deviance residual in excess of two may indicate lack of fit, so the absolute values of the deviance residuals matter. b) Since AIC is used to compare the fit between different models, perhaps there is no "correct" AIC. I would just compare the AICs of 2 binomial models or 2 Bernoulli models.

— A Scientist

a) For the binary data, the $D_{ij}$ will be > 2 if either ($Z_{ij} = 1$ and $\hat{p}_i < e^{-1} = 0.368$) or ($Z_{ij} = 0$ and $\hat{p}_i > 1 - e^{-1} = 0.632$). So even if your model fits the binomial data perfectly for the $i$th covariate vector (i.e. $Y_i / N_i = \hat{p}_i < 0.368$, say), then the $Y_i$ $Z_{ij}$s that you've arbitrarily allocated as being 1 will have $D_{ij} > 2$. For this reason, I think the deviance residuals make more sense with the binomial data. Furthermore, the deviance itself for binary data does not have its usual properties...
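A tiny illustration of that threshold, using hypothetical counts (10 trials, 2 successes) chosen only so that $\hat{p}_i$ falls below $e^{-1}$:

```python
from math import exp, log

# Hypothetical group: the binomial fit is perfect, p_hat = Y / N.
n, y = 10, 2
p_hat = y / n

d_success = -2 * log(p_hat)       # D_ij for each of the Y rows with Z_ij = 1
d_failure = -2 * log(1 - p_hat)   # D_ij for each of the N - Y rows with Z_ij = 0

print(p_hat < exp(-1), d_success > 2, d_failure > 2)   # True True False
```

The binomial contribution for this group is exactly zero, yet each of its disaggregated successes is flagged, purely because successes are the rarer outcome.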

— Mark

...Link to further info about that last statement

— Mark

b) Yes, comparing $AIC$s between models only makes sense when the data used to fit each model are exactly the same. So compare Bernoulli with Bernoulli or binomial with binomial.

— Mark

Thanks, Mark! Your thoughtful and detailed replies are much appreciated!

— A Scientist

I just want to comment on the last paragraph: "The fact that the AIC is different (but the change in deviance is not) comes back to the constant term that was the difference between the log-likelihoods of the two models. When calculating the change in deviance, this is cancelled out because it is the same in all models based on the same data." Unfortunately, this is not correct for the change in deviance. The deviance does not include the constant term Ex (the extra constant term in the log-likelihood for the binomial data). Therefore, the change in deviance has nothing to do with the constant term Ex. The deviance compares a given model to the full model. The fact that the deviances differ between Bernoulli/binary and binomial modelling, while the change in deviance does not, is due to the difference in the full-model log-likelihood values. These values cancel out when calculating the change in deviance. Therefore, Bernoulli and binomial logistic regression models yield identical changes in deviance provided the predicted probabilities pij and pi are the same. In fact, that is also true for the probit and other link functions.

Let lBm and lBf denote the log-likelihood values from fitting model m and the full model f to the Bernoulli data. The deviance is then

    DB = 2(lBf - lBm) = -2(lBm - lBf).

Although lBf is zero for binary data, we have not simplified DB and keep it as is. The deviance from binomial modelling with the same covariates is

    Db = 2(lbf + Ex - (lbm + Ex)) = 2(lbf - lbm) = -2(lbm - lbf)

where lbf + Ex and lbm + Ex are the log-likelihood values of the full model and model m fitted to the binomial data. The extra constant term (Ex) disappears from the right-hand side of Db. Now look at the change in deviance from Model 1 to Model 2. From Bernoulli modelling, the change in deviance is

    DBC = DB2 - DB1 = 2(lBf - lBm2) - 2(lBf - lBm1) = 2(lBm1 - lBm2).

Similarly, the change in deviance from binomial fitting is

    DbC = Db2 - Db1 = 2(lbf - lbm2) - 2(lbf - lbm1) = 2(lbm1 - lbm2).

It immediately follows that the changes in deviance are free of the log-likelihood contributions from the full models, lBf and lbf. Therefore, we get the same change in deviance, DBC = DbC, if lBm1 = lbm1 and lBm2 = lbm2. We know that is the case here, and that is why we get the same change in deviance from Bernoulli and binomial modelling. The difference between lbf and lBf leads to the different deviances.
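These quantities can be checked numerically. In the sketch below the variable names follow the notation of this answer; the counts are an assumption (inferred from the question's R output, since the data tables were not preserved), Model 1 is taken to be the intercept-only model and Model 2 the fitted X1 + X2 model, which reproduces the grouped proportions exactly.

```python
from math import log

# ASSUMPTION: (N_i, Y_i) inferred from the reported deviances and AICs.
groups = [(5, 3), (4, 3), (3, 2)]

def ll_kernel(ps):
    """Log-likelihood without the constant Ex; equal for both data layouts."""
    return sum(y * log(p) + (n - y) * log(1 - p)
               for (n, y), p in zip(groups, ps))

p_null = sum(y for _, y in groups) / sum(n for n, _ in groups)
l_m1 = ll_kernel([p_null] * len(groups))       # Model 1: lBm1 = lbm1
l_m2 = ll_kernel([y / n for n, y in groups])   # Model 2: lBm2 = lbm2

lBf = 0.0    # full model for binary data: every observation fitted exactly
lbf = l_m2   # full model for binomial data (Ex omitted): p_i = Y_i / N_i

DB1, DB2 = 2 * (lBf - l_m1), 2 * (lBf - l_m2)   # Bernoulli deviances
Db1, Db2 = 2 * (lbf - l_m1), 2 * (lbf - l_m2)   # binomial deviances

DBC = DB2 - DB1   # change in deviance, Bernoulli
DbC = Db2 - Db1   # change in deviance, binomial

print(round(DB1, 3), round(DB2, 3))
print(round(Db1, 4), round(Db2, 4))
print(round(DBC, 4), round(DbC, 4))
```

The deviances themselves differ between the two layouts, but DBC and DbC coincide, and their magnitude is the 0.228 quoted in the question.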

— Saei

Would it be possible for you to edit the formatting of your answer? Unfortunately, in this form it is not very readable. I would encourage you to break the text into paragraphs and add $\TeX$ formatting to the formulas. It is also not always clear what the abbreviations you use mean.

— Tim

Many thanks, Tim. I am not familiar with TeX formatting. I originally typed this in Word, but I was unable to copy and paste. I have separated the equations from the text.

— Saei

I'm not sure if you misread that paragraph: I said "the AIC is different (but the change in deviance is not)", and the remainder of the paragraph explains why the AIC is different between the two models. I didn't claim that the change in deviance depended on the constant term. In fact, I said "When calculating the change in deviance, this [the constant term] is cancelled out because it is the same in all models based on the same data"

— Mark

The problem is that there is only one "constant term" in the text, and it is the combinatorial term (the binomial coefficient). When you say "this" is cancelled out, it implies that the constant term is included in the deviance. The difference between the deviances from the Bernoulli and binomial models is the contribution of the log-likelihood value lbf from the full model. The lbf does not vary across different binomial models fitted to the same data, and it cancels out when calculating the change in deviance.

— Saei

Ah ok I see what you mean. I have edited my answer accordingly, leaving in the reference to the change in deviance because the asker specifically mentioned it. The change in deviance is the same because the deviance doesn't depend on the constant term.

— Mark