What is an instrumental variable?

36

Instrumental variables are becoming increasingly common in applied economics and statistics. For the uninitiated, can we have some non-technical answers to the following questions:

What is an instrumental variable?
When would one want to employ an instrumental variable?
How does one find or choose an instrumental variable?

regression econometrics instrumental-variables

— Graham Cookson

4

Don't you think the Wikipedia article on this topic is sufficient?

1

Questions like this one call for a wiki/blog-style answer. I think questions should not require such long answers.

I'm not sure that the right thing to do is simply to ignore this question and refer the asker to the wiki, especially during the beta, when we are trying to build up the site's content. Perhaps the asker should submit each of these questions individually so that they can be addressed better.

— russellpierce

3

@mbq - the Wikipedia example can hardly be called non-technical. It relies heavily on jargon and equations.

— rolando2

1

It became common in economics at some point in the 1980s. Some biostatisticians have also heard of it and apply it in the context of measurement error models, where instruments are narrowly understood as additional available measurements. These do qualify as instruments in the broader econometric sense: they are correlated with the variable of interest and uncorrelated with its measurement error.

— StasK

41

[The following may seem a little technical because of the use of equations, but it builds mainly on the arrow charts to provide the intuition, which only requires a very basic understanding of OLS - so don't be put off.]

Suppose you want to estimate the causal effect of $x_i$ on $y_i$, given by the estimated coefficient $\beta$, but for some reason there is a correlation between your explanatory variable and the error term:

$\begin{matrix}y_i &=& \alpha &+& \beta x_i &+& \epsilon_i & \\ & && & & \hspace{-1cm}\nwarrow & \hspace{-0.8cm} \nearrow \\ & & & & & corr & \end{matrix}$

This might have happened because we forgot to include an important variable that also correlates with $x_i$. This problem is known as omitted variable bias, and in that case your $\widehat{\beta}$ will not give you the causal effect (see here for the details). This is a case where you would want to use an instrument, because only then can you find the true causal effect.
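To make the bias concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption chosen for illustration and is not part of the original answer: the true effect of 2, the hypothetical omitted variable `w`, and the coefficients 0.8 and 1.5.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0  # true causal effect (assumed value for this illustration)

w = rng.normal(size=n)                # hypothetical omitted variable
x = 0.8 * w + rng.normal(size=n)      # x is correlated with w
eps = 1.5 * w + rng.normal(size=n)    # w ends up in the error term
y = 1.0 + beta * x + eps


def ols_slope(regressor, outcome):
    """Slope of a simple OLS regression: cov(x, y) / var(x)."""
    c = np.cov(regressor, outcome)
    return c[0, 1] / c[0, 0]


beta_ols = ols_slope(x, y)
print(f"true beta = {beta:.2f}, OLS estimate = {beta_ols:.2f}")
```

Because `x` and `eps` share the component `w`, the OLS slope settles well above the true value even with a very large sample, so collecting more data does not fix the problem.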

An instrument is a new variable $z_i$ that is uncorrelated with $\epsilon_i$, but that correlates well with $x_i$ and that influences $y_i$ only through $x_i$ - so our instrument is what is called "exogenous". It looks like this chart:

$\begin{matrix} z_i & \rightarrow & x_i & \rightarrow & y_i \newline & & \uparrow & \nearrow & \newline & & \epsilon_i & \end{matrix}$

If you regress $x_i$ on the instrument $z_i$,

$\underbrace{x_i}_{\text{total variation}} = \underbrace{a \quad + \quad \pi z_i}_{\text{explained variation}} \quad + \underbrace{\eta_i}_{\text{unexplained variation}}$

then you know that the explained variation here is exogenous to our original equation because it depends only on the exogenous variable $z_i$. In this sense, we split $x_i$ into a part that we can claim is certainly exogenous (the part that depends on $z_i$) and an unexplained part $\eta_i$ that keeps all the bad variation which correlates with $\epsilon_i$. Now we take the exogenous part of this regression, call it $\widehat{x}_i$,

$x_i \quad = \underbrace{a \quad + \quad \pi z_i}_{\text{good variation} \: = \: \widehat{x}_i } \quad + \underbrace{\eta_i}_{\text{bad variation}}$

and put this into our original regression:

$y_i = \alpha + \beta \widehat{x}_i + \epsilon_i$

Now, since $\widehat{x}_i$ is no longer correlated with $\epsilon_i$ (remember, we "filtered out" this part from $x_i$ and left it in $\eta_i$), we can consistently estimate $\beta$, because the instrument has helped us break the correlation between the explanatory variable and the error. This is one way to apply instrumental variables. The method is called two-stage least squares (2SLS): the regression of $x_i$ on $z_i$ is the "first stage" and the last equation here is the "second stage".
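The two stages can be sketched by hand with NumPy. This is only an illustration on simulated data: the parameter values, the data-generating process, and the helper `fit_line` are assumptions made for the example, not something taken from the answer.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
alpha, beta = 1.0, 2.0  # assumed true parameters

z = rng.normal(size=n)                  # instrument, independent of eps
eps = rng.normal(size=n)                # structural error
eta = 0.8 * eps + rng.normal(size=n)    # "bad variation", correlated with eps
x = 0.5 + 1.0 * z + eta                 # first-stage relationship (a = 0.5, pi = 1)
y = alpha + beta * x + eps              # equation of interest


def fit_line(regressor, outcome):
    """Return (intercept, slope) of a simple OLS regression."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef


# First stage: regress x on z and keep the fitted ("good") variation.
a_hat, pi_hat = fit_line(z, x)
x_hat = a_hat + pi_hat * z

# Second stage: regress y on the fitted values instead of x itself.
_, beta_2sls = fit_line(x_hat, y)

# Naive OLS of y on x, for comparison.
_, beta_ols = fit_line(x, y)

print(f"OLS: {beta_ols:.3f}   2SLS: {beta_2sls:.3f}   true: {beta:.3f}")
```

In this setup the 2SLS slope lands close to the true value while the naive OLS slope does not. Note that the standard errors reported by a plain second-stage regression are not valid, because they ignore that $\widehat{x}_i$ was itself estimated; dedicated IV routines correct for this.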

In terms of our original picture (I leave out $\epsilon_i$ so as not to make a mess, but remember that it is there!), instead of taking the direct but flawed route from $x_i$ to $y_i$ we took an intermediate step via $\widehat{x}_i$:

$\begin{matrix} & & & & & \widehat{x}_i \newline & & & & \nearrow & \downarrow \newline & z_i & \rightarrow & x_i & \rightarrow & y_i \end{matrix}$

Thanks to this slight diversion of our road to the causal effect we were able to consistently estimate $\beta$ by using the instrument. The cost of this diversion is that instrumental variables models are generally less precise, meaning that they tend to have larger standard errors.
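Both the consistency and the loss of precision can be read off a one-line sketch for the case of a single instrument, where the 2SLS slope reduces to a ratio of sample covariances. The label $\widehat{\beta}_{IV}$ and the notation $\widehat{Cov}$ for a sample covariance are introduced here only for illustration. Substituting $y_i = \alpha + \beta x_i + \epsilon_i$ into the numerator gives

$\widehat{\beta}_{IV} = \frac{\widehat{Cov}(z_i, y_i)}{\widehat{Cov}(z_i, x_i)} = \beta + \frac{\widehat{Cov}(z_i, \epsilon_i)}{\widehat{Cov}(z_i, x_i)}$

If the instrument is exogenous, $\widehat{Cov}(z_i, \epsilon_i)$ shrinks towards zero as the sample grows, so the second term vanishes as long as $z_i$ really is correlated with $x_i$. The same expression shows where the imprecision comes from: when $z_i$ is only weakly related to $x_i$, the denominator is close to zero and small sampling noise in the numerator is magnified.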

How do we find instruments?
That's not an easy question because you need to make a good case as to why your $z_i$ would not be correlated with $\epsilon_i$ - this cannot be tested formally because the true error is unobserved. The main challenge is therefore to come up with something that can be plausibly seen as exogenous such as natural disasters, policy changes, or sometimes you can even run a randomized experiment. The other answers had some very good examples for this so I won't repeat this part.
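While exogeneity cannot be tested, the other requirement (that $z_i$ correlates well with $x_i$) can be checked in the data through the first-stage regression. The sketch below computes the first-stage F-statistic for a single instrument. The function name `first_stage_f` and the simulated data are assumptions made for illustration.

```python
import numpy as np


def first_stage_f(z, x):
    """F-statistic for H0: pi = 0 in the first stage x = a + pi * z + eta.

    With a single instrument this equals the squared t-statistic of pi.
    """
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)                  # residual variance
    var_pi = sigma2 / np.sum((z - z.mean()) ** 2)     # variance of the slope
    return coef[1] ** 2 / var_pi


rng = np.random.default_rng(2)
n = 1_000
z_relevant = rng.normal(size=n)      # drives x
z_irrelevant = rng.normal(size=n)    # unrelated to x
x = 1.0 * z_relevant + rng.normal(size=n)

f_relevant = first_stage_f(z_relevant, x)
f_irrelevant = first_stage_f(z_irrelevant, x)
print(f"F (relevant instrument):   {f_relevant:.1f}")
print(f"F (irrelevant instrument): {f_irrelevant:.1f}")
```

A commonly cited rule of thumb treats a first-stage F below roughly 10 as a warning sign of a weak instrument. Passing this check says nothing about exogeneity, which still has to be argued on substantive grounds.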

— Andy

10

+1 I am grateful finally to read a detailed answer instead of a list of references or links.

— whuber

1

Excellent! I explain this to my students more "mnemonically" as: $x$ is poisoned/tainted by unobserved factors in $\epsilon$. The first-stage regression "cleans"/sucks out the venom from $x$. We can use the "cleaned" version of $x$ to find the causal coefficient, $\beta$.

— MichaelChirico

Is there an intuitive argument why the 2SLS estimate for $\beta$ is consistent? When we calculate $\widehat{x}_i$, we are "filtering out" the part of $x_i$ that is correlated with the error, but why should it be that the filtering out doesn't change $x_i$ in a way that changes our estimate for $\beta$?

— user35734

See here: stats.stackexchange.com/questions/64279/… or you may want to ask a new question. Hope this helps.

— Andy

@user35734 it's not consistent but asymptotically consistent.

— Vim

17

As a medical statistician with no previous knowledge of econom(etr)ics, I struggled to get to grips with instrumental variables as I often struggled to follow their examples and didn't understand their rather different terminology (e.g. 'endogeneity', 'reduced form', 'structural equation', 'omitted variables'). Here's a few references I found useful (the first should be freely available, but I'm afraid the others probably require a subscription):

Staiger D. Instrumental Variables. AcademyHealth Cyber Seminar in Health Services Research Methods, March 2002. http://www.dartmouth.edu/~dstaiger/wpapers-Econ.htm
Newhouse JP, McClellan M. Econometrics in Outcomes Research: The Use of Instrumental Variables. Annual Review of Public Health 1998;19:17-34. http://dx.doi.org/10.1146/annurev.publhealth.19.1.17
Greenland S. An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology 2000;29:722-729. http://dx.doi.org/10.1093/ije/29.4.722
Zohoori N, Savitz DA. Econometric approaches to epidemiologic data: Relating endogeneity and unobserved heterogeneity to confounding. Annals of Epidemiology 1997;7:251-257. http://dx.doi.org/10.1016/S1047-2797(97)00023-9

I'd also recommend chapter 4 of:

Angrist JD, Pischke JS. Mostly harmless econometrics: an empiricist's companion. Princeton, N.J: Princeton University Press, 2009. http://www.mostlyharmlesseconometrics.com/

— onestop

11

Here are some slides that I prepared for an econometrics course at UC Berkeley. I hope that you find them useful---I believe that they answer your questions and provide some examples.

There are also more advanced treatments on the course pages for PS 236 and PS 239 (graduate-level political science methods courses) at my website: http://gibbons.bio/teaching.html.

Charlie

— Charlie

Link to Berkeley slides is no longer valid.

— rolando2

7

Non-technical (usually that's all I'm good for anyway): There are times when not only does X cause Y, but Y causes X as well. An instrumental variable is a device that can "clean up" this messy, inconvenient relationship so that the best estimates can be made of X's effect on Y.

The instrumental variable is chosen by virtue of its relationships: it is a cause of X, but, other than acting through X, it has no effect on Y. The instrument (or instruments) is used in Stage One to compute a new "version" of X, one that is in no way a function of Y. This new "predicted" X is then used in a second stage, in a more standard regression, to explain/predict Y. Hence the term Two-Stage Least Squares regression.

One typically finds the IV in processes that are overriding or beyond the control of X OR Y, such as variables that depend on laws, policies, acts of nature, etc.
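The two-way causation case described above can be simulated in a few lines. This is a sketch under assumed values: the effect of X on Y (`beta`), the feedback of Y on X (`gamma`), the instrument strength (`pi`), and the helper `slope` are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta = 0.5    # effect of X on Y (the quantity we want)
gamma = 0.4   # feedback effect of Y on X
pi = 1.0      # effect of the instrument Z on X

z = rng.normal(size=n)   # instrument: affects Y only through X
u = rng.normal(size=n)   # other causes of Y
v = rng.normal(size=n)   # other causes of X

# Solve the simultaneous system
#   y = beta * x + u
#   x = gamma * y + pi * z + v
# for x, then compute y.
x = (pi * z + gamma * u + v) / (1 - gamma * beta)
y = beta * x + u


def slope(regressor, outcome):
    """Slope of a simple OLS regression (an intercept is implied)."""
    c = np.cov(regressor, outcome)
    return c[0, 1] / c[0, 0]


beta_ols = slope(x, y)       # contaminated by the feedback from Y to X

x_hat = slope(z, x) * z      # stage one: the part of X predicted by Z
                             # (the constant is dropped; it does not affect the slope)
beta_2sls = slope(x_hat, y)  # stage two: regress Y on the predicted X

print(f"OLS: {beta_ols:.3f}   2SLS: {beta_2sls:.3f}   true: {beta:.3f}")
```

Because Y feeds back into X, the plain regression overstates the effect here, while the two-stage estimate recovers a value close to `beta`.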

— rolando2