What is the connection between credible regions and Bayesian hypothesis tests?


38

In frequentist statistics, there is a close connection between confidence intervals and tests. Using inference about $\mu$ in the $N(\mu,\sigma^2)$ distribution as an example, the $1-\alpha$ confidence interval

$$\bar{x}\pm t_{\alpha/2}(n-1)\cdot s/\sqrt{n}$$

contains all values of $\mu$ that are not rejected by the $t$-test at significance level $\alpha$.

Frequentist confidence intervals are, in this sense, inverted tests. (Incidentally, this means that we can interpret the $p$-value as the smallest value of $\alpha$ for which the null value of the parameter would be included in the $1-\alpha$ confidence interval. I find that this can be a useful way of explaining what $p$-values really are to people who know a little statistics.)
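As a quick numerical illustration of this inversion, here is a minimal Python sketch (with simulated, made-up data): a value $\mu_0$ lies inside the $1-\alpha$ $t$-interval exactly when the two-sided one-sample $t$-test fails to reject it at level $\alpha$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=0.3, scale=1.0, size=20)  # hypothetical sample
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

# 1 - alpha confidence interval: xbar +/- t_{alpha/2}(n - 1) * s / sqrt(n)
tq = stats.t.ppf(1 - alpha / 2, df=n - 1)
lo, hi = xbar - tq * s / np.sqrt(n), xbar + tq * s / np.sqrt(n)

for mu0 in (-0.5, 0.0, 0.3, 1.0):
    p = stats.ttest_1samp(x, popmean=mu0).pvalue
    # the t-test fails to reject mu0 at level alpha iff mu0 is in the interval
    assert (p >= alpha) == (lo <= mu0 <= hi)
    print(f"mu0 = {mu0:5.2f}: p = {p:.3f}, in interval: {lo <= mu0 <= hi}")
```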

While reading about the decision-theoretic foundations of Bayesian credible regions, I started to wonder whether there is a similar connection/equivalence between credible regions and Bayesian tests.

  • Is there a general connection?
  • If there is no general connection, are there examples where a connection exists?
  • If there is no general connection, how can we see that there is none?

A related question I was wondering about: could someone point me to a paper they consider the "gold standard" or "canonical example" of Bayesian hypothesis testing applied to a real problem, rather than to a simplistic example? I have never really understood Bayesian hypothesis testing, and I think I would find a good example of its use instructive.
Patrick Caldon

2
@PatrickCaldon I doubt that there is a "gold-standard paper" on this, since Bayesian hypothesis testing is formulated within a decision-theoretic framework (and is therefore too broad to be captured in a single paper). The book mentioned in MånsT's answer provides good material. Berger's books and talks may also be of interest.

I believe the paper ba.stat.cmu.edu/vol03is01.php may clarify the essence of our discussion here.
Carlos AB Pereira

Thanks, Carlos! The link doesn't seem to be working at the moment, but I'm guessing it leads to your 2008 Bayesian Analysis paper with Stern and Wechsler. I found it a very interesting read!
MånsT

Dear Sir, Bayesian Analysis has moved to Project Euclid. Professor Carlos's paper is here: projecteuclid.org/…
Zen

Answers:


19

I managed to find an example where a connection exists. It seems, however, to depend heavily on my choice of loss function and on the use of composite hypotheses.

I start with a general example, which is then followed by a simpler special case involving the normal distribution.

A general example

For an unknown parameter $\theta$, let $\Theta$ be the parameter space and consider the hypothesis $\theta\in\Theta_0$ versus the alternative $\theta\in\Theta_1=\Theta\setminus\Theta_0$.

Let $\varphi$ be a test function, using the notation in Xi'an's *The Bayesian Choice* (which is a bit backwards from what I, at least, am used to), so that we reject $\Theta_0$ if $\varphi=0$ and accept $\Theta_0$ if $\varphi=1$. Consider the loss function

$$L(\theta,\varphi)=\begin{cases}0, & \text{if }\varphi=\mathbb{I}_{\Theta_0}(\theta),\\ a_0, & \text{if }\theta\in\Theta_0\text{ and }\varphi=0,\\ a_1, & \text{if }\theta\in\Theta_1\text{ and }\varphi=1.\end{cases}$$

The Bayes test is then

$$\varphi^\pi(x)=1\quad\text{if }P(\theta\in\Theta_0|x)\geq a_1(a_0+a_1)^{-1}.$$

Take $a_0=\alpha\leq 0.5$ and $a_1=1-\alpha$. The null hypothesis $\Theta_0$ is then accepted if $P(\theta\in\Theta_0|x)\geq 1-\alpha$.

Now, a credible region $\Theta_c$ is a region such that $P(\Theta_c|x)\geq 1-\alpha$. Thus, by definition, if $\Theta_0$ is such that $P(\theta\in\Theta_0|x)\geq 1-\alpha$, then $\Theta_c$ can be a credible region only if $P(\Theta_0\cap\Theta_c|x)>0$.

We accept the null hypothesis if and only if each $1-\alpha$-credible region contains a non-null subset of $\Theta_0$.
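The resulting decision rule is simply a threshold on the posterior probability of the null. A minimal sketch in Python (my own illustration; the function name and inputs are hypothetical, and computing $P(\theta\in\Theta_0|x)$ is assumed to be done elsewhere):

```python
def bayes_test(post_prob_null, a0, a1):
    """Bayes test for the loss above: accept Theta_0 (phi = 1) iff
    P(theta in Theta_0 | x) >= a1 / (a0 + a1)."""
    return 1 if post_prob_null >= a1 / (a0 + a1) else 0

# With a0 = alpha and a1 = 1 - alpha, the threshold is 1 - alpha:
alpha = 0.05
print(bayes_test(post_prob_null=0.97, a0=alpha, a1=1 - alpha))  # 1: accept
print(bayes_test(post_prob_null=0.90, a0=alpha, a1=1 - alpha))  # 0: reject
```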

A simpler special case

To better illustrate the type of test used in the example above, consider the following special case.

Let $x\sim N(\theta,1)$ with $\theta\sim N(0,1)$. Set $\Theta=\mathbb{R}$, $\Theta_0=(-\infty,0]$ and $\Theta_1=(0,\infty)$, so that we want to test whether $\theta\leq 0$.

Standard calculations give

$$P(\theta\leq 0|x)=\Phi(-x/\sqrt{2}),$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function.

Let $z_{1-\alpha}$ be such that $\Phi(z_{1-\alpha})=1-\alpha$. $\Theta_0$ is accepted when $-x/\sqrt{2}>z_{1-\alpha}$.

This is equivalent to accepting when $x\leq\sqrt{2}z_\alpha$. For $\alpha=0.05$, $\Theta_0$ is thus rejected when $x>-2.33$.

If we instead use the prior $\theta\sim N(\nu,1)$, $\Theta_0$ is rejected when $x>-2.33-\nu$.
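As a numerical check of this special case, here is a minimal Python sketch (my own illustration; `posterior_prob_null` is a hypothetical helper, using the standard conjugate result that the posterior is $N((x+\nu)/2,1/2)$):

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05

def posterior_prob_null(x, nu=0.0):
    """P(theta <= 0 | x) for x ~ N(theta, 1) and theta ~ N(nu, 1);
    the posterior is N((x + nu) / 2, 1/2)."""
    return norm.cdf(-(x + nu) / np.sqrt(2))

# Accept Theta_0 iff P(theta <= 0 | x) >= 1 - alpha, i.e. reject iff
# x > -sqrt(2) * z_{1 - alpha} - nu = sqrt(2) * z_alpha - nu.
print(np.sqrt(2) * norm.ppf(alpha))  # approx. -2.33
for x in (-3.0, -2.0):
    print(x, posterior_prob_null(x) >= 1 - alpha)  # accept at -3.0, reject at -2.0
```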

Comments

The above loss function, where we think that falsely accepting the null hypothesis is worse than falsely rejecting it, may at first glance seem slightly artificial. It can, however, be of considerable use in situations where "false negatives" can be costly, for instance when screening for dangerous contagious diseases or terrorists.

The condition that all credible regions must contain a part of $\Theta_0$ is actually a bit stronger than what I was hoping for: in the frequentist case the correspondence is between a single test and a single $1-\alpha$ confidence interval and not between a single test and all $1-\alpha$ intervals.


2
+1 I would use credibility region instead of credibility interval.

1
Thanks @Procrastinator! I've edited the answer and changed it to "region" while I was at it. I mostly work with HPD regions of unimodal posteriors, so I tend to think of confidence regions as intervals. :)
MånsT

12

Michael and Fraijo suggested that simply checking whether the parameter value of interest is contained in some credible region is the Bayesian equivalent of inverting confidence intervals. I was a bit skeptical about this at first, since it wasn't obvious to me that this procedure really results in a Bayesian test (in the usual sense).

As it turns out, it does - at least if you're willing to accept a certain type of loss function. Many thanks to Zen, who provided references to two papers that establish a connection between HPD regions and hypothesis testing: Pereira & Stern (1999) and Madruga, Esteves & Wechsler (2001).

I'll try to summarize them here, for future reference. In analogy with the example in the original question, I'll treat the special case where the hypotheses are

$$H_0:\theta\in\Theta_0=\{\theta_0\}\quad\text{and}\quad H_1:\theta\in\Theta_1=\Theta\setminus\Theta_0,$$
where $\Theta$ is the parameter space.

Pereira & Stern proposed a method for testing $\Theta_0$ against $\Theta_1$ without having to put prior probabilities on $\Theta_0$ and $\Theta_1$. Letting $\pi(\cdot)$ denote the posterior density of $\theta$, their test is based on the set

$$T(x)=\{\theta:\pi(\theta|x)>\pi(\theta_0|x)\}.$$

This means that $T(x)$ is an HPD region, with credibility $P(\theta\in T(x)|x)$.

The Pereira-Stern test rejects $\Theta_0$ when $P(\theta\notin T(x)|x)$ is "small" ($<0.05$, say). For a unimodal posterior, this means that $\theta_0$ is far out in the tails of the posterior, making this criterion somewhat similar to using $p$-values. In other words, $\Theta_0$ is rejected at the 5 % level if and only if it is not contained in the 95 % HPD region.
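To make this concrete, here is a minimal Monte Carlo sketch (my own illustration, not the authors' code; `evidence_value` is a hypothetical name, and a normal posterior is assumed):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def evidence_value(theta0, post_mean, post_sd, n_draws=200_000):
    """Monte Carlo estimate of P(theta not in T(x) | x), where
    T(x) = {theta : pi(theta | x) > pi(theta0 | x)}, for a normal posterior."""
    draws = rng.normal(post_mean, post_sd, n_draws)
    dens0 = norm.pdf(theta0, post_mean, post_sd)
    return np.mean(norm.pdf(draws, post_mean, post_sd) <= dens0)

# Theta_0 = {theta0} is rejected at the 5 % level when the evidence is small,
# i.e. when theta0 lies outside the 95 % HPD region:
for theta0 in (1.5, -0.5):
    ev = evidence_value(theta0, post_mean=2.0, post_sd=1.0)
    print(f"theta0 = {theta0}: evidence = {ev:.3f}, reject: {ev < 0.05}")
```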

Let the test function $\varphi$ be $1$ if $\Theta_0$ is accepted and $0$ if $\Theta_0$ is rejected. Madruga et al. proposed the loss function

$$L(\theta,\varphi,x)=\begin{cases}a\,(1-\mathbb{I}(\theta\in T(x))), & \text{if }\varphi(x)=0,\\ b+c\,\mathbb{I}(\theta\in T(x)), & \text{if }\varphi(x)=1,\end{cases}$$
with $a,b,c>0$.

Minimization of the expected loss leads to the Pereira-Stern test, in which $\Theta_0$ is rejected if $P(\theta\notin T(x)|x)<(b+c)/(a+c)$.
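This minimization can be checked numerically. A small sketch (my own construction; `bayes_decision` is a hypothetical name, and `ev` stands for $P(\theta\notin T(x)|x)$):

```python
def bayes_decision(ev, a, b, c):
    """Choose the test function phi minimizing the posterior expected loss
    under the Madruga et al. loss, given ev = P(theta not in T(x) | x).
    Returns 1 (accept Theta_0) or 0 (reject Theta_0)."""
    loss_reject = a * ev              # E[L | phi = 0]
    loss_accept = b + c * (1.0 - ev)  # E[L | phi = 1]
    return 0 if loss_reject < loss_accept else 1

# Agrees with the threshold rule: reject iff ev < (b + c) / (a + c).
a, b, c = 20.0, 0.2, 0.8  # threshold = 1.0 / 20.8, approx. 0.048
for ev in (0.01, 0.04, 0.05, 0.20):
    print(ev, bayes_decision(ev, a, b, c), ev < (b + c) / (a + c))
```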

So far, all is well. The Pereira-Stern test is equivalent to checking whether $\theta_0$ is in an HPD region, and there is a loss function that generates this test, meaning that it is founded in decision theory.

The controversial part, though, is that the loss function depends on $x$. While such loss functions have appeared in the literature a few times, they don't seem to be generally accepted as being very reasonable.

For further reading on this topic, see a list of papers that cite the Madruga et al. article.


Update October 2012:

I wasn't completely satisfied with the above loss function, as its dependence on $x$ makes the decision-making more subjective than I would like. I spent some more time thinking about this problem and ended up writing a short note about it, posted on arXiv earlier today.

Let $q_\alpha(\theta|x)$ denote the posterior quantile function of $\theta$, such that $P(\theta\leq q_\alpha(\theta|x)\,|\,x)=\alpha$. Instead of HPD sets, we consider the central (equal-tailed) interval $(q_{\alpha/2}(\theta|x),q_{1-\alpha/2}(\theta|x))$. Testing $\Theta_0$ using this interval can be justified in the decision-theoretic framework without a loss function that depends on $x$.

The trick is to reformulate the problem of testing the point-null hypothesis $\Theta_0=\{\theta_0\}$ as a three-decision problem with directional conclusions. $\Theta_0$ is then tested against both $\Theta_{-1}=\{\theta:\theta<\theta_0\}$ and $\Theta_1=\{\theta:\theta>\theta_0\}$.

Let the test function $\varphi=i$ if we accept $\Theta_i$ (note that this notation is the opposite of that used above!). It turns out that under the weighted $0$-$1$ loss function

$$L_2(\theta,\varphi)=\begin{cases}0, & \text{if }\theta\in\Theta_i\text{ and }\varphi=i,\ i\in\{-1,0,1\},\\ \alpha/2, & \text{if }\theta\notin\Theta_0\text{ and }\varphi=0,\\ 1, & \text{if }\theta\in\Theta_{-i}\cup\Theta_0\text{ and }\varphi=i,\ i\in\{-1,1\},\end{cases}$$
the Bayes test is to reject $\Theta_0$ if $\theta_0$ is not in the central interval.
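A minimal sketch of this three-decision test from posterior draws (my own illustration; the function name is hypothetical):

```python
import numpy as np

def three_decision_test(posterior_draws, theta0, alpha=0.05):
    """Bayes test under the weighted 0-1 loss L2: accept Theta_0 (return 0)
    iff theta0 lies in the central 1 - alpha credible interval; otherwise
    return the directional conclusion i = 1 (theta > theta0) or i = -1."""
    lo, hi = np.quantile(posterior_draws, [alpha / 2, 1 - alpha / 2])
    if lo <= theta0 <= hi:
        return 0
    return 1 if theta0 < lo else -1

rng = np.random.default_rng(1)
draws = rng.normal(1.0, 0.5, 100_000)  # hypothetical posterior sample
print([three_decision_test(draws, t) for t in (0.9, -0.2, 2.5)])  # [0, 1, -1]
```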

This seems like a quite reasonable loss function to me. I discuss this loss, the Madruga-Esteves-Wechsler loss and testing using credible sets further in the manuscript on arXiv.


2
(I'm marking this as a community wiki)
MånsT

When you say "To arrive at the Pereira-Stern test we must minimize the expected posterior loss", well, actually we do that in any Bayesian decision procedure. The difference here is that the loss function depends on the data (as you pointed out), which is not standard. Normally we have $L:\{\text{Parameter Space}\}\times\{\text{Actions}\}\to\mathbb{R}$.
Zen

@Zen: Yes, of course, I phrased that wrongly. Thanks for pointing that out. :)
MånsT

3
@MånsT: (+1) This is an interesting answer. I very much respect the fact you chose to mark this as CW in this instance, but I wish you wouldn't have. :-)
cardinal

8

I coincidentally read your arXiv paper prior to coming to this question and already wrote a blog entry on it (scheduled to appear on October 8). To sum up, I find your construction of theoretical interest, but I also think it is too contrived to be recommended, especially as it does not seem to solve the point-null hypothesis Bayesian testing problem, which traditionally requires putting some prior mass on the point-null parameter value.

To wit, the solution you propose above (in the October update) and as Theorem 2 in your arXiv paper is not a valid test procedure, in that $\varphi$ takes three values rather than the two values that correspond to accept/reject. Similarly, the loss function you use in Theorem 3 (not reproduced here) amounts to testing the one-sided hypothesis $H_0:\theta\leq\theta_0$ rather than the point-null hypothesis $H_0:\theta=\theta_0$.

My major issue, however, is that it seems to me that both Theorem 3 and Theorem 4 in your arXiv paper are not valid when $H_0$ is a point-null hypothesis, i.e. when $\Theta_0=\{\theta_0\}$ with no prior mass.


1
Thanks (+1) for your comments! I very much look forward to reading your blog post. :) As you point out, Theorems 3 and 4 are concerned with composite hypotheses only. The $1-\alpha/2$ in Theorem 2 is a misprint. It should read $\alpha/2$, in which case $\varphi=0$ when $\alpha/2<\min(P(\Theta_{-1}|x),P(\Theta_1|x))$, which happens when $\theta_0$ is in the credible interval. I'll change this in the arXiv manuscript as soon as possible!
MånsT

You are right (+1!), I was thinking of the inequality the other way! In the arXiv document, the central inequality is written the wrong way, i.e. one should accept $H_0$ iff $P(\theta\in\Theta_i|x)>\alpha/2$.
Xi'an

That's good to hear :) The updated manuscript (with Thm 2 corrected) will be on arXiv on Monday. I'll make the assumption that $\Theta_0$ is not point-null in Thm 4 explicit as well.
MånsT

1
Just make sure to clarify the proof of Theorem 2 in the arXiv document: the displayed inequality is written the wrong way, i.e. one should accept $H_0$ iff $P(\theta\in\Theta_i|x)>\alpha/2$, not the opposite!
Xi'an

3

You can use a credible interval (or HPD region) for Bayesian hypothesis testing. I don't think it is common, though; to be fair, I neither see much of nor use formal Bayesian hypothesis testing in practice. Bayes factors are occasionally used (and somewhat lauded in Robert's "Bayesian Core") in hypothesis testing setups.


1
Cheers @Fraijo! Could you perhaps elaborate a bit on how your answer differs from that of Michael Chernick?
MånsT

2
I do not think the use of Bayes factors for testing hypotheses is "occasional"; see for example this reference.

@MånsT In his follow-up, the process Michael describes seems to be a Bayes factor test. Essentially you create two models with different priors based on your hypotheses and then compare the probability of the data set under those priors. The reference Procrastinator posted gives a quick review of this.
Fraijo

1
@Procrastinator I said occasional only because in my industry I see few people using Bayesian methods at all, let alone Bayesian methods for testing hypotheses. Personally, I use Bayes factors to check my models for sensitivity to the prior, which I suppose is a form of hypothesis testing.
Fraijo

1
@MånsT short answer: no. Setting up a credible interval and finding out if it contains the null hypothesis is the only direct test that is comparable to frequentist hypothesis testing. There are two problems with this method: 1) the obvious fact that you can find multiple regions in some cases (e.g. an HPD versus a symmetric region) and 2) testing a point hypothesis (theta = a) conflicts with the Bayesian ideal of parameters taking distributions (theta ~ P(theta)).
Fraijo

1

A credible region is just a region where the integral of the posterior density over the region is a specified probability, e.g. 0.95. One way to form a Bayesian hypothesis test is to see whether or not the null hypothesized value(s) of the parameter(s) fall in the credible region. In this way we can have a 1-1 correspondence between hypothesis tests and credible regions, just like the frequentists have with confidence intervals and hypothesis tests. But this is not the only way to do hypothesis testing.
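A minimal sketch of this check from posterior draws (my own illustration; the HPD computation assumes a unimodal posterior, and the Gamma posterior is just a made-up example):

```python
import numpy as np

def hpd_interval(draws, level=0.95):
    """Shortest interval containing `level` posterior mass, computed from
    samples (valid for a unimodal posterior)."""
    s = np.sort(draws)
    n_in = int(np.ceil(level * len(s)))
    widths = s[n_in - 1:] - s[:len(s) - n_in + 1]
    i = np.argmin(widths)
    return s[i], s[i + n_in - 1]

rng = np.random.default_rng(2)
draws = rng.gamma(3.0, 1.0, 50_000)  # hypothetical skewed posterior
lo, hi = hpd_interval(draws)
theta0 = 0.1
print((lo, hi), lo <= theta0 <= hi)  # reject theta0 if it is outside the region
```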


Are these kinds of ad hoc Bayesian tests often used in practice?
MånsT

1
@MansT I don't think so. I think that usually Bayesians put prior odds on the null hypothesis being true and then, based on the data, construct posterior odds. If the posterior odds are strongly against the null hypothesis, it is rejected. I am not the best person to ask, though, since I do not do Bayesian inference very often.
Michael R. Chernick

2
The test described by Michael is credited to Lindley by Zellner in his book on Bayesian econometrics.
Zen

1
Yes, these kinds of tests certainly spring from Bayesian ideas, but I'm not sure whether they have a solid foundation in Bayesian decision theory. In the latter setting, I would expect tests to be derived from a loss function, typically involving a test function.
MånsT


-1

Let me explain it the way I understood it from reading Tim's answer.

It is based on a table view, with hypotheses (values of the estimated parameter) in the columns and observations in the rows.

[Figure: three tables - column-conditional probabilities with a bottom "prior" row, joint probabilities, and row-conditional (posterior) probabilities.]

In the first table, the column probabilities sum to 1, i.e. they are conditional probabilities; the probability of the condition, the column event, is supplied in the bottom row, called the "prior". In the last table, the rows similarly sum to 1, and in the middle table you have joint probabilities, i.e. the conditional probabilities from the first and last tables times the probability of the corresponding condition, the prior.

The tables basically perform the Bayesian transform: in the first table you give the p.d.f. of the observations (rows) in every column and set the prior for that hypothesis (yes, a hypothesis column is a p.d.f. of the observations under that hypothesis). You do that for every column, and the tables take it first into the joint probability table and then into the probabilities of your hypotheses conditioned on the observations.

As I understood from Tim's answer (correct me if I am wrong), the critical interval approach looks at the first table. That is, once the experiment is complete, we know the row of the table (either heads or tails in my example, though you could run more complex experiments, like 100 coin flips, and get a table with 2^100 rows). The frequentist scans through its columns, each of which, as I have said, is a distribution of possible outcomes under the condition that the hypothesis holds true (e.g. that the coin is fair in my example), and rejects those hypotheses (columns) that give a very low probability to the observed row.

The Bayesian first adjusts the probabilities, converting columns into rows, and looks at table 3, finding the row of the observed outcome. Since that row is also a p.d.f., he goes through the experiment's outcome row and picks the highest-probability hypotheses until his 95 % credibility pocket is full. The remaining hypotheses are rejected.
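A minimal numpy sketch of the three tables for a single coin flip (my own illustration, with made-up hypotheses and prior):

```python
import numpy as np

biases = np.array([0.3, 0.5, 0.7])    # hypotheses (columns): P(heads)
prior = np.array([0.25, 0.50, 0.25])  # bottom "prior" row of table 1

# Table 1: each column is the p.d.f. of the observation under that hypothesis.
likelihood = np.vstack([biases, 1 - biases])  # rows: heads, tails

# Table 2: joint probabilities = column p.d.f. times the prior of the column.
joint = likelihood * prior

# Table 3: rows sum to 1; each row is the posterior over hypotheses given
# the observation in that row.
posterior = joint / joint.sum(axis=1, keepdims=True)

print(posterior[0])  # posterior over the three biases after observing heads
```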

How do you like it? I am still in the process of learning, and the graphic seems helpful to me. I believe that I am on the right track, since a reputable user gives the same picture when analyzing the difference between the two approaches. I have proposed a graphical view of the mechanics of hypothesis selection.

I encourage everybody to read Keith's last answer, but my picture of the test mechanics immediately shows that the frequentist does not look at the other hypotheses when verifying the current one, whereas the presence of a highly credible hypothesis strongly affects the acceptance/rejection of the other hypotheses in the Bayesian analysis: if you have a single hypothesis that occurs 95 % of the time under the observed data, you throw out all the other hypotheses immediately, regardless of how well the data fit them. Let us set aside statistical power analysis, which contrasts two hypotheses based on the overlap of their confidence intervals.

But I seem to have spotted a similarity between the two approaches: they seem to be connected through the property $P(A|B)>P(A)\iff P(B|A)>P(B)$. Basically, if there is a dependence between $A$ and $B$, it will show up as a correlation in both the frequentist and the Bayesian tables. So the two hypothesis tests are correlated, and in that sense they must give similar results. Studying the roots of this correlation will likely give you the connection between the two. In my question there, I actually ask why there is a difference rather than perfect correlation.

Licensed under cc by-sa 3.0 with attribution required.