Effect of switching the response and explanatory variable in simple linear regression


48

Say there is some "true" relationship between y and x such that y = ax + b + ϵ, where a and b are constants and ϵ is normal noise. When I randomly generate data from this R code: x <- 1:100; y <- a*x + b + rnorm(length(x)) and then fit a model y ~ x, I obviously get reasonably good estimates for a and b.

However, if I switch the roles of the variables as in x ~ y, and then rewrite the result so that y is expressed as a function of x, the resulting slope is always steeper (more negative or more positive) than the one estimated by the y ~ x regression. I'm trying to understand exactly why that is, and I would appreciate it if someone could give me an intuition for what is going on there.
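Here is a runnable sketch of the setup described above (the values of a and b and the noise standard deviation are arbitrary choices, with the noise inflated a little so that the effect is easy to see):

set.seed(1)
a <- 2; b <- 5
x <- 1:100
y <- a * x + b + rnorm(length(x), sd = 30)   # larger noise than sd = 1, so the gap is visible

coef(lm(y ~ x))["x"]        # slope of y ~ x, close to a = 2
1 / coef(lm(x ~ y))["y"]    # inverted x ~ y slope: steeper than the y ~ x slope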


1
That's not true in general. Maybe you're just seeing it in your data. Paste this code: y = rnorm(10); x = rnorm(10); lm(y ~ x); lm(x ~ y) into R several times and you'll see that it goes both ways.
Macro

That's a bit different from what I was describing. In your example, y wasn't a function of x at all, so there isn't really a "slope" (the "a" in my example).
Greg Aponte

lm(y ~ x) fits the model $y = \beta_0 + \beta_1 x + \varepsilon$ by the method of least squares (equivalent to ML estimation when the errors are normal). There is a slope.
Macro

2
Your question is asked and answered (sort of) at stats.stackexchange.com/questions/13126 and stats.stackexchange.com/questions/18434. However, I believe no one has yet provided a simple, clear explanation of the relationships among (a) regression of Y against X, (b) regression of X against Y, (c) analysis of the correlation between X and Y, (d) errors-in-variables regression of X and Y, and (e) fitting a bivariate normal distribution to (X, Y). This would be a good place for such an exposition :-).
whuber

2
Of course, Macro is correct: because x and y play equivalent roles in the question, which slope is more extreme is a matter of chance. However, geometry (misleadingly) suggests that when we swap x and y in the regression, we should obtain the reciprocal of the original slope. That only happens when x and y are linearly dependent. This question can be read as asking why not.
whuber

Answers:


23

Given $n$ data points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, in the plane, let us draw the straight line $y = ax + b$. If we predict $ax_i + b$ as the value $\hat{y}_i$ of $y_i$, then the error is $(y_i - \hat{y}_i) = (y_i - ax_i - b)$, the squared error is $(y_i - ax_i - b)^2$, and the total squared error is $\sum_{i=1}^n (y_i - ax_i - b)^2$. We ask:

What choice of $a$ and $b$ minimizes $S = \sum_{i=1}^n (y_i - ax_i - b)^2$?

Since $(y_i - ax_i - b)$ is the vertical distance of $(x_i, y_i)$ from the line, we are asking for the line such that the sum of the squares of the vertical distances of the points from the line is as small as possible. Now, $S$ is a quadratic function of $a$ and $b$, and it attains its minimum value when $a$ and $b$ are such that

$$\frac{\partial S}{\partial a} = 2\sum_{i=1}^n (y_i - ax_i - b)(-x_i) = 0, \qquad \frac{\partial S}{\partial b} = 2\sum_{i=1}^n (y_i - ax_i - b)(-1) = 0.$$
From the second equation, we get
$$b = \frac{1}{n}\sum_{i=1}^n (y_i - ax_i) = \mu_y - a\mu_x,$$
where
$\mu_y = \frac{1}{n}\sum_{i=1}^n y_i$ and $\mu_x = \frac{1}{n}\sum_{i=1}^n x_i$ are the arithmetic averages of the $y_i$'s and the $x_i$'s respectively. Substituting into the first equation, we get
$$a = \frac{\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y}{\left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \mu_x^2}.$$
Thus, the line that minimizes $S$ can be expressed as
$$y = ax + b = \mu_y + \left(\frac{\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y}{\left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \mu_x^2}\right)(x - \mu_x),$$
and the minimum value of $S$ is
$$S_{\min} = \frac{\left[\left(\frac{1}{n}\sum_{i=1}^n y_i^2\right) - \mu_y^2\right]\left[\left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \mu_x^2\right] - \left[\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y\right]^2}{\left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \mu_x^2}.$$

If we interchange the roles of $x$ and $y$, draw a line $x = \hat{a}y + \hat{b}$, and ask for the values of $\hat{a}$ and $\hat{b}$ that minimize

$$T = \sum_{i=1}^n (x_i - \hat{a}y_i - \hat{b})^2,$$
that is, we want the line such that the sum of the squares of the horizontal distances of the points from the line is as small as possible, then we get

$$x = \hat{a}y + \hat{b} = \mu_x + \left(\frac{\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y}{\left(\frac{1}{n}\sum_{i=1}^n y_i^2\right) - \mu_y^2}\right)(y - \mu_y),$$
and the minimum value of $T$ is
$$T_{\min} = \frac{\left[\left(\frac{1}{n}\sum_{i=1}^n y_i^2\right) - \mu_y^2\right]\left[\left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \mu_x^2\right] - \left[\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y\right]^2}{\left(\frac{1}{n}\sum_{i=1}^n y_i^2\right) - \mu_y^2}.$$

Note that both lines pass through the point $(\mu_x, \mu_y)$, but the slopes

$$a = \frac{\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y}{\left(\frac{1}{n}\sum_{i=1}^n x_i^2\right) - \mu_x^2}, \qquad \hat{a}^{-1} = \frac{\left(\frac{1}{n}\sum_{i=1}^n y_i^2\right) - \mu_y^2}{\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y}$$
are different in general. Indeed, as @whuber points out in a comment, the slopes are the same when all the points $(x_i, y_i)$ lie on the same straight line. To see this, note that
$$\hat{a}^{-1} - a = \frac{S_{\min}}{\left(\frac{1}{n}\sum_{i=1}^n x_iy_i\right) - \mu_x\mu_y} = 0 \iff S_{\min} = 0 \iff y_i = ax_i + b, \quad i = 1, 2, \ldots, n.$$
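As a quick numerical sanity check, here is a small R sketch (simulated data with arbitrary constants) verifying that the closed-form slope and intercept derived above match what lm(y ~ x) reports:

set.seed(42)                                   # arbitrary seed
x <- 1:50
y <- 3 * x + 10 + rnorm(length(x), sd = 15)

a_closed <- (mean(x * y) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
b_closed <- mean(y) - a_closed * mean(x)

c(b_closed, a_closed)      # intercept and slope from the formulas above
coef(lm(y ~ x))            # same values from R's least-squares fit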

Thanks! abs(correlation) < 1 accounts for why the slope was systematically steeper in the reversed case.
Greg Aponte

(+1) but I added an answer with just an illustration of what you just said, as I have a geometric mind :)
Elvis

Classy reply (+1)
Digio

39

Just to illustrate Dilip’s answer: on the following pictures,

  • the black dots are data points;
  • on the left, the black line is the regression line obtained by y ~ x, which minimizes the sum of the squares of the lengths of the red (vertical) segments;
  • on the right, the black line is the regression line obtained by x ~ y, which minimizes the sum of the squares of the lengths of the red (horizontal) segments.

regression lines

Edit (least rectangles regression)

If there is no natural way to choose a "response" and a "covariate", but rather the two variables are interdependent, you may wish to preserve a symmetric role for y and x; in this case you can use "least rectangles regression":

  • write $Y = aX + b + \epsilon$, as usual;
  • denote $\hat{y}_i = ax_i + b$ and $\hat{x}_i = \frac{1}{a}(y_i - b)$ the estimates of $Y_i$ conditional on $X = x_i$ and of $X_i$ conditional on $Y = y_i$;
  • minimize $\sum_i |x_i - \hat{x}_i| \cdot |y_i - \hat{y}_i|$, which leads to
    $$\hat{y} = \mathrm{sign}\!\left(\mathrm{cov}(x,y)\right)\frac{\hat\sigma_y}{\hat\sigma_x}(x - \bar{x}) + \bar{y}.$$

Here is an illustration with the same data points: for each point, a "rectangle" is computed as the product of the lengths of two red segments, and the sum of the rectangles' areas is minimized. I don't know much about the properties of this regression and I can't find much with Google.

least rectangles
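For the curious, here is a minimal R sketch of the least rectangles fit described above (the function name least_rectangles is just an illustrative choice; the slope formula is the one stated in the bullet list):

least_rectangles <- function(x, y) {
  a <- sign(cov(x, y)) * sd(y) / sd(x)   # symmetric slope: sign(cov) * sd(y)/sd(x)
  b <- mean(y) - a * mean(x)             # line passes through (mean(x), mean(y))
  c(intercept = b, slope = a)
}

# Example usage on simulated data:
set.seed(3)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
least_rectangles(x, y)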


14
Some notes: (1) Unless I am mistaken, it seems that the "least rectangles regression" is equivalent to the solution obtained from taking the first principal component on the matrix X=(y,x) after centering and rescaling to have unit variance and then backsubstituting. (cont.)
cardinal

14
(cont.) (2) Viewed this way, it is easy to see that this "least rectangles regression" is equivalent to a form of orthogonal (or total) least squares and, thus, (3) a special case of Deming regression on the centered, rescaled vectors taking $\delta = 1$. Orthogonal least squares can be considered as "least-circles regression".
cardinal

2
@cardinal Very interesting comments! (+1) I believe major axis regression (minimizing perpendicular distances between the regression line and all the points, à la PCA), reduced major axis regression, or type II regression as implemented in the lmodel2 R package by P. Legendre are also relevant here, since those techniques are used when it's hard to tell which role (response or predictor) each variable plays, or when we want to account for measurement errors.
chl

1
@chl: (+1) Yes, I believe you are right and the Wikipedia page on total least squares lists several other names for the same procedure, not all of which I am familiar with. It appears to go back to at least R. Frisch, Statistical confluence analysis by means of complete regression systems, Universitetets Økonomiske Instituut, 1934 where it was called diagonal regression.
cardinal

3
@cardinal I should have been more careful when reading the Wikipedia entry... For future reference, here is a picture taken from Biostatistical Design and Analysis Using R, by M. Logan (Wiley, 2010; Fig. 8.4, p. 174), which summarizes the different approaches, much like Elvis's nice illustrations.
chl

13

Just a brief note on why you see a smaller slope for one of the regressions. Both slopes depend on three numbers: the standard deviations of $x$ and $y$ ($s_x$ and $s_y$), and the correlation between $x$ and $y$ ($r$). The regression with $y$ as the response has slope $r\frac{s_y}{s_x}$ and the regression with $x$ as the response has slope $r\frac{s_x}{s_y}$, hence the ratio of the first slope to the reciprocal of the second is equal to $r^2 \leq 1$.

So the greater the proportion of variance explained, the closer the slope from one regression is to the reciprocal of the slope from the other. Note that the proportion of variance explained is symmetric and equal to the squared correlation in simple linear regression.
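A quick numerical check in R (simulated data, arbitrary constants) of the relationship just described: the product of the two fitted slopes equals $r^2$, so the y ~ x slope and the algebraically inverted x ~ y slope differ by exactly that factor.

set.seed(10)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200, sd = 2)

b_yx <- coef(lm(y ~ x))["x"]   # slope with y as the response
b_xy <- coef(lm(x ~ y))["y"]   # slope with x as the response

b_yx * b_xy                    # equals cor(x, y)^2
cor(x, y)^2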


1

A simple way to look at this is to note that, if for the true model $y = \alpha + \beta x + \epsilon$ you run the two regressions:

  • $y = a_{y\sim x} + b_{y\sim x}\, x$
  • $x = a_{x\sim y} + b_{x\sim y}\, y$

then, using $b_{y\sim x} = \frac{\mathrm{cov}(x,y)}{\mathrm{var}(x)} = \frac{\mathrm{cov}(x,y)}{\mathrm{var}(y)} \cdot \frac{\mathrm{var}(y)}{\mathrm{var}(x)}$, we have:

$$b_{y\sim x} = b_{x\sim y}\,\frac{\mathrm{var}(y)}{\mathrm{var}(x)}.$$

So whether you get a steeper slope or not just depends on the ratio $\frac{\mathrm{var}(y)}{\mathrm{var}(x)}$. Based on the assumed true model, this ratio is equal to:

$$\frac{\mathrm{var}(y)}{\mathrm{var}(x)} = \frac{\beta^2\,\mathrm{var}(x) + \mathrm{var}(\epsilon)}{\mathrm{var}(x)}.$$

Link with other answers

You can connect this result with the other answers, which said that when $R^2 = 1$ the slope should be the reciprocal. Indeed, $R^2 = 1 \Rightarrow \mathrm{var}(\epsilon) = 0$, and also $b_{y\sim x} = \beta$ (no estimation error). Hence:

$$R^2 = 1 \Rightarrow b_{y\sim x} = b_{x\sim y}\,\frac{\beta^2\,\mathrm{var}(x) + 0}{\mathrm{var}(x)} = b_{x\sim y}\,\beta^2,$$

so $b_{x\sim y} = 1/\beta$.
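Here is a small R sketch (simulated data, arbitrary constants) of the identity above; it holds exactly for the fitted OLS slopes, whatever the noise level:

set.seed(7)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200, sd = 3)

b_yx <- coef(lm(y ~ x))["x"]
b_xy <- coef(lm(x ~ y))["y"]

b_yx
b_xy * var(y) / var(x)    # identical to b_yx, as derived above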


0

It becomes interesting when there is also noise on your inputs (which we could argue is always the case: no command or observation is ever perfect).

I have built some simulations to observe the phenomenon, based on a simple linear relationship x=y, with Gaussian noise on both x and y. I generated the observations as follows (python code):

import numpy as np

n = 100                                 # number of points (value assumed; not given in the original)
x = np.linspace(0, 1, n)                # noise-free inputs
y = x                                   # true relationship: y = x

x_o = x + np.random.normal(0, 0.2, n)   # observed x, with Gaussian noise
y_o = y + np.random.normal(0, 0.2, n)   # observed y, with Gaussian noise

See the different results below (odr here is orthogonal distance regression, closely related to the least rectangles regression above):

[figure: comparison of the different regression fits (y ~ x, x ~ y, ODR) on the noisy data]

All the code is here:

https://gist.github.com/jclevesque/5273ad9077d9ea93994f6d96c20b0ddd


0

The regression line is not (always) the same as the true relationship

You may have some 'true' causal relationship like

$$y = a + bx + \epsilon$$

but the fitted regression lines y ~ x and x ~ y do not mean the same thing as that causal relationship (even though in practice the expression for one of the regression lines may coincide with the expression for the causal 'true' relationship).


More precise relationship between slopes

For two switched simple linear regressions:

$$Y = a_1 + b_1 X, \qquad X = a_2 + b_2 Y$$

you can relate the slopes as follows:

$$b_1 = \rho^2\,\frac{1}{b_2} \neq \frac{1}{b_2}.$$

So the slopes are not each other's inverses.


Intuition

The reason is that

  • Regression lines and correlations do not necessarily correspond one-to-one to a causal relationship.
  • Regression lines relate more directly to a conditional probability or best prediction.

You can imagine that the conditional probability relates to the strength of the relationship. Regression lines reflect this, and the slopes of the two lines may both be shallow when the relationship is weak, or both steep when it is strong. The slopes are not simply each other's inverses.

Example

If two variables X and Y relate to each other by some (causal) linear relationship

Y = (a little bit of X) + (a lot of error),

then you can imagine that it would not be good to entirely reverse that relationship if you wish to express X based on a given value of Y.

Instead of

X = (a lot of Y) + (a little bit of error)

it would be better to also use

X = (a little bit of Y) + (a lot of error)

See the following example distributions with their respective regression lines. The distributions are multivariate normal with $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = \rho$.

example

The conditional expected values (what you would get in a linear regression) are

$$E(Y \mid X) = \rho X, \qquad E(X \mid Y) = \rho Y$$

and in this case, with $(X, Y)$ bivariate normal, the conditional distributions are

$$Y \mid X \sim N(\rho X,\, 1 - \rho^2), \qquad X \mid Y \sim N(\rho Y,\, 1 - \rho^2)$$

So you can see the variable $Y$ as being a part $\rho X$ plus a part noise with variance $1 - \rho^2$. The same is true the other way around.

The larger the correlation coefficient $\rho$, the closer the two lines will be to each other. But the lower the correlation, the weaker the relationship, and the shallower both lines will be (this is true for both Y ~ X and X ~ Y).
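A small simulation sketch in R of the standardized bivariate normal case above (it uses MASS::mvrnorm; the sample size and seed are arbitrary): both fitted slopes come out near $\rho$, not near each other's reciprocals.

library(MASS)                 # for mvrnorm
set.seed(11)
rho <- 0.5
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
xy <- mvrnorm(n = 10000, mu = c(0, 0), Sigma = Sigma)
x <- xy[, 1]; y <- xy[, 2]

coef(lm(y ~ x))["x"]          # approximately rho
coef(lm(x ~ y))["y"]          # also approximately rho, not 1/rho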


0

The short answer

The goal of a simple linear regression is to come up with the best predictions of the y variable, given values of the x variable. This is a different goal than trying to come up with the best prediction of the x variable, given values of the y variable.

Simple linear regression of y ~ x gives you the 'best' possible model for predicting y given x. Hence, if you fit a model for x ~ y and algebraically invert it, that inverted model could at its very best only do as well as the y ~ x model. But inverting a model fit for x ~ y will usually do worse at predicting y given x than the 'optimal' y ~ x model, because the "inverted x ~ y model" was created to fulfill a different objective.

Illustration

Imagine you have the following dataset:

[figure: scatterplot of the example dataset]

When you run an OLS regression of y ~ x, you come up with the following model

y = 0.167 + 1.5*x

This optimizes predictions of y by making the following predictions, which have associated errors:

[table: observed y, predicted y from the y ~ x model, and squared errors]

The OLS regression's predictions are optimal in the sense that the sum of the values in the rightmost column (i.e. the sum of squares) is as small as can be.

When you run an OLS regression of x ~ y, you come up with a different model:

x = -0.07 + 0.64*y

This optimizes predictions of x by making the following predictions, with associated errors.

[table: observed x, predicted x from the x ~ y model, and squared errors]

Again, this is optimal in the sense that the sum of the values in the rightmost column is as small as possible (equal to 0.071).

Now, imagine you tried to just invert the first model, y = 0.167 + 1.5*x, using algebra, giving you the model x = -0.11 + 0.67*y.

This would give you the following predictions and associated errors:

[table: observed x, predicted x from the inverted y ~ x model, and squared errors]

The sum of the values in the rightmost column is 0.074, which is larger than the corresponding sum from the model you get from regressing x on y, i.e. the x ~ y model. In other words, the "inverted y ~ x model" is doing a worse job at predicting x than the OLS model of x ~ y.
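The same comparison can be reproduced on any simulated dataset. Here is a hedged R sketch (the data below are made up and are not the small dataset shown in the figures):

set.seed(123)
x <- runif(30)
y <- 0.2 + 1.5 * x + rnorm(30, sd = 0.3)

fit_yx <- lm(y ~ x)                       # optimized for predicting y from x
fit_xy <- lm(x ~ y)                       # optimized for predicting x from y

# Algebraically invert the y ~ x model to "predict" x from y:
a <- coef(fit_yx)[1]; b <- coef(fit_yx)[2]
x_hat_inverted <- (y - a) / b

sum((x - fitted(fit_xy))^2)               # smaller: x ~ y is built for this objective
sum((x - x_hat_inverted)^2)               # larger: the inverted y ~ x model does worse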
