[The following may seem a little technical because of the equations, but it relies mainly on the arrow charts to provide the intuition, which requires only a very basic understanding of OLS, so don't be put off.]
Suppose you want to estimate the causal effect of $x_i$ on $y_i$, given by the estimated coefficient for $\beta$, but for some reason there is a correlation between your explanatory variable and the error term:
$$y_i = \alpha + \beta x_i + \epsilon_i, \qquad \operatorname{corr}(x_i, \epsilon_i) \neq 0$$
This might have happened because we forgot to include an important variable that also correlates with $x_i$. This problem is known as omitted variable bias, and in that case your $\widehat{\beta}$ will not give you the causal effect (see here for the details). This is a case where you would want to use an instrument, because only then can you find the true causal effect.
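To make the problem concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the omitted variable `w`, the coefficients 0.8 and 1.5, the true effect of 2.0, and the helper name `ols_slope`. It only shows that once an omitted variable sits in both $x_i$ and $\epsilon_i$, the OLS slope drifts away from the true $\beta$.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

rng = np.random.default_rng(0)
n = 100_000
true_beta = 2.0                       # assumed causal effect of x on y

w = rng.normal(size=n)                # hypothetical omitted variable
x = 0.8 * w + rng.normal(size=n)      # x correlates with w ...
eps = 1.5 * w + rng.normal(size=n)    # ... and w ends up in the error term
y = 1.0 + true_beta * x + eps

beta_ols = ols_slope(x, y)
corr_x_eps = float(np.corrcoef(x, eps)[0, 1])

print(f"corr(x, eps) = {corr_x_eps:.3f}")
print(f"OLS slope    = {beta_ols:.3f}  (true beta = {true_beta})")
```

With these made-up numbers the population bias is $\operatorname{Cov}(x_i, \epsilon_i) / \operatorname{Var}(x_i) = 1.2 / 1.64 \approx 0.73$, so the OLS slope should land well above 2 no matter how large the sample is.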
An instrument is a new variable $z_i$ that is uncorrelated with $\epsilon_i$ but correlates well with $x_i$, and that influences $y_i$ only through $x_i$, so our instrument is what is called "exogenous". It's like in this chart here:
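$$\begin{matrix} z_i & \rightarrow & x_i & \rightarrow & y_i \\ & & \uparrow & \nearrow & \\ & & \epsilon_i & & \end{matrix}$$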
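As a side sketch (not part of the chart itself), the two arrows that are missing from the picture, namely no link between $z_i$ and $\epsilon_i$, can be turned into one line of algebra. Taking the covariance of both sides of $y_i = \alpha + \beta x_i + \epsilon_i$ with $z_i$ gives

$$\operatorname{Cov}(z_i, y_i) = \beta \operatorname{Cov}(z_i, x_i) + \underbrace{\operatorname{Cov}(z_i, \epsilon_i)}_{= \, 0} \quad \Longrightarrow \quad \beta = \frac{\operatorname{Cov}(z_i, y_i)}{\operatorname{Cov}(z_i, x_i)}$$

The zero comes from the instrument being uncorrelated with the error, and the division is only allowed because the instrument correlates with $x_i$, so $\operatorname{Cov}(z_i, x_i) \neq 0$.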
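So how do we use this new variable? If you regress $x_i$ on the instrument $z_i$ and split the total variation of $x_i$ into an explained and an unexplained component,
$$\underbrace{x_i}_{\text{total variation}} = \underbrace{a + \pi z_i}_{\text{explained variation}} + \underbrace{\eta_i}_{\text{unexplained variation}}$$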
then you know that the explained variation here is exogenous to our original equation because it depends on the exogenous variable $z_i$ only. So in this sense, we split our $x_i$ up into a part that we can claim is certainly exogenous (that's the part that depends on $z_i$) and some unexplained part $\eta_i$ that keeps all the bad variation which correlates with $\epsilon_i$. Now we take the exogenous part of this regression, call it $\widehat{x}_i$,
$$x_i = \underbrace{a + \pi z_i}_{\text{good variation} \, = \, \widehat{x}_i} + \underbrace{\eta_i}_{\text{bad variation}}$$
and put this into our original regression:
$$y_i = \alpha + \beta \widehat{x}_i + \epsilon_i$$
Now since $\widehat{x}_i$ is no longer correlated with $\epsilon_i$ (remember, we "filtered out" this part from $x_i$ and left it in $\eta_i$), we can consistently estimate our $\beta$ because the instrument has helped us to break the correlation between the explanatory variable and the error. This is one way to apply instrumental variables. The method is called two-stage least squares (2SLS), where our regression of $x_i$ on $z_i$ is called the "first stage" and the last equation here is called the "second stage".
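The two stages can be sketched by hand in a few lines of NumPy. This is only an illustration on simulated data: the confounder `w`, all coefficients, the true effect of 2.0, and the helper name `ols_fit` are assumptions of mine, not something taken from a particular data set or library.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    x_c = x - x.mean()
    slope = float(x_c @ (y - y.mean()) / (x_c @ x_c))
    intercept = float(y.mean() - slope * x.mean())
    return intercept, slope

rng = np.random.default_rng(1)
n = 100_000
true_beta = 2.0                                    # assumed causal effect

z = rng.normal(size=n)                             # instrument, independent of eps
w = rng.normal(size=n)                             # hypothetical unobserved confounder
x = 0.5 + 1.0 * z + 0.8 * w + rng.normal(size=n)   # x depends on z and on w
eps = 1.5 * w + rng.normal(size=n)                 # error term contains w
y = 1.0 + true_beta * x + eps

# First stage: regress x on z and keep the fitted ("good") variation.
a_hat, pi_hat = ols_fit(z, x)
x_hat = a_hat + pi_hat * z

# Second stage: regress y on the fitted values instead of x itself.
_, beta_2sls = ols_fit(x_hat, y)

# Naive OLS of y on x, for comparison.
_, beta_ols = ols_fit(x, y)

print(f"naive OLS slope: {beta_ols:.3f}")
print(f"2SLS slope:      {beta_2sls:.3f}  (true beta = {true_beta})")
```

One caveat on doing it by hand: the slope from the second stage is the 2SLS estimate, but the standard errors that a plain OLS routine would report for that second stage are not valid, so in practice a dedicated IV routine is the safer choice for inference.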
In terms of our original picture (I leave out the $\epsilon_i$ to not make a mess, but remember that it is there!), instead of taking the direct but flawed route from $x_i$ to $y_i$, we took an intermediate step via $\widehat{x}_i$:
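$$\begin{matrix} & & & & \widehat{x}_i \\ & & & \nearrow & \downarrow \\ z_i & \rightarrow & x_i & \rightarrow & y_i \end{matrix}$$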
Thanks to this slight diversion of our road to the causal effect we were able to consistently estimate β by using the instrument. The cost of this diversion is that instrumental variables models are generally less precise, meaning that they tend to have larger standard errors.
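The precision cost can be seen in a small Monte Carlo sketch. The setup is again made up for illustration (the confounder `w`, the coefficients, 2000 replications of 500 observations each). With a single instrument, the IV estimate used here, sample $\operatorname{Cov}(z, y)$ divided by sample $\operatorname{Cov}(z, x)$, coincides with the 2SLS slope.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n = 2000, 500            # 2000 simulated samples of 500 observations each
true_beta = 2.0                # assumed causal effect

# Each row is one simulated sample.
z = rng.normal(size=(reps, n))                   # instrument
w = rng.normal(size=(reps, n))                   # hypothetical confounder
x = z + 0.8 * w + rng.normal(size=(reps, n))
eps = 1.5 * w + rng.normal(size=(reps, n))
y = 1.0 + true_beta * x + eps

def demean(a):
    """Subtract each row's mean so that sums of products act as covariances."""
    return a - a.mean(axis=1, keepdims=True)

zc, xc, yc = demean(z), demean(x), demean(y)

# One slope estimate per simulated sample.
beta_ols = (xc * yc).sum(axis=1) / (xc * xc).sum(axis=1)
beta_iv = (zc * yc).sum(axis=1) / (zc * xc).sum(axis=1)

mean_ols, sd_ols = float(beta_ols.mean()), float(beta_ols.std(ddof=1))
mean_iv, sd_iv = float(beta_iv.mean()), float(beta_iv.std(ddof=1))

print(f"OLS: mean = {mean_ols:.3f}, sd = {sd_ols:.3f}")
print(f"IV:  mean = {mean_iv:.3f}, sd = {sd_iv:.3f}")
```

In this setup the OLS estimates should cluster tightly around the wrong value, while the IV estimates should centre near the true $\beta$ but spread out more across samples, which is the larger standard error mentioned above.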
How do we find instruments?
That's not an easy question, because you need to make a good case as to why your $z_i$ would not be correlated with $\epsilon_i$. This cannot be tested formally because the true error is unobserved. The main challenge is therefore to come up with something that can plausibly be seen as exogenous, such as natural disasters or policy changes, and sometimes you can even run a randomized experiment. The other answers had some very good examples for this, so I won't repeat this part.