Has anyone solved PTLOS exercise 4.1?


19

This is an exercise given in Probability Theory: The Logic of Science by Edwin Jaynes, 2003. There is a partial solution here. I have worked out a more general partial solution, and I was wondering if anyone else has solved it. I will wait a bit before posting my answer, to give others a go.

Okay, so suppose we have $n$ mutually exclusive and exhaustive hypotheses, denoted $H_i\;(i=1,\dots,n)$. Further suppose we have $m$ data sets, denoted $D_j\;(j=1,\dots,m)$. The likelihood ratio for the $i$-th hypothesis is given by:

$$LR(H_i)=\frac{P(D_1 D_2\dots D_m\mid H_i)}{P(D_1 D_2\dots D_m\mid \overline{H}_i)}$$

Note that these are conditional probabilities. Now suppose that, given the $i$-th hypothesis $H_i$, the $m$ data sets are independent, so we have:

$$P(D_1 D_2\dots D_m\mid H_i)=\prod_{j=1}^{m}P(D_j\mid H_i)\quad(i=1,\dots,n)\qquad\text{Condition 1}$$

Now it would be very convenient if the denominator also factorised in this situation, so that we had:

$$P(D_1 D_2\dots D_m\mid \overline{H}_i)=\prod_{j=1}^{m}P(D_j\mid \overline{H}_i)\quad(i=1,\dots,n)\qquad\text{Condition 2}$$

For in this case the likelihood ratio splits into a product of smaller factors, one for each data set, so that we have:

$$LR(H_i)=\prod_{j=1}^{m}\frac{P(D_j\mid H_i)}{P(D_j\mid \overline{H}_i)}$$

Thus, in this case, each data set will "vote for $H_i$" or "vote against $H_i$" independently of every other data set.
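To make this concrete, here is a minimal numerical sketch (all probability values are made up for illustration). Assuming Condition 1, the sum rule gives $P(D_1\dots D_m\mid\overline{H}_i)$ as a prior-weighted mixture over the other hypotheses; the check below shows that the product form of Condition 2 then holds automatically for $n=2$, but fails for generic values when $n>2$:

```python
import numpy as np

rng = np.random.default_rng(42)

def condition2_holds(n, m=2, i=0):
    """Compare P(D1..Dm | Hbar_i) with prod_j P(Dj | Hbar_i), assuming
    Condition 1 and made-up priors/likelihoods."""
    h = rng.dirichlet(np.ones(n))            # priors P(H_k | I), made up
    d = rng.uniform(0.1, 0.9, size=(m, n))   # d[j, k] = P(D_j | H_k), made up
    w = np.delete(h, i)                      # weights P(H_k | Hbar_i), k != i
    w /= w.sum()
    dk = np.delete(d, i, axis=1)
    lhs = (w * dk.prod(axis=0)).sum()        # P(D1..Dm | Hbar_i) via sum rule
    rhs = (w * dk).sum(axis=1).prod()        # prod_j P(D_j | Hbar_i)
    return np.isclose(lhs, rhs)

for n in (2, 3, 5):
    print(n, condition2_holds(n))   # True for n=2; generally False for n>2
```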

The exercise is to prove that if $n>2$ (more than two hypotheses), there is no non-trivial way in which this factorisation can occur. That is, if you assume that Condition 1 and Condition 2 hold, then at most one of the factors:

$$\frac{P(D_1\mid H_i)}{P(D_1\mid \overline{H}_i)}\;\frac{P(D_2\mid H_i)}{P(D_2\mid \overline{H}_i)}\;\cdots\;\frac{P(D_m\mid H_i)}{P(D_m\mid \overline{H}_i)}$$

is different from 1, and so only one data set will contribute to the likelihood ratio.

Personally, I found this result quite fascinating, because it essentially shows that multiple hypothesis testing is nothing but a series of binary hypothesis tests.


I'm a bit confused by the index on $\overline{H}_i$; is $\overline{H}_i=\arg\max_{h\neq H_i}P(D_1,\dots,D_m\mid h)$? Or is it $\overline{H}_i=\arg\max_{h\in\{H_1,\dots,H_n\}}P(D_1,\dots,D_m\mid h)$? It seems like it should be the latter, but then I'm not sure why the subscript. Or maybe I'm missing something else entirely :)
JMS

@JMS - $\overline{H}_i$ means the logical statement "$H_i$ is false", i.e. that one of the other hypotheses is true. So in "Boolean algebra" we have $\overline{H}_i\equiv H_1+H_2+\dots+H_{i-1}+H_{i+1}+\dots+H_n$ (because the hypotheses are exclusive and exhaustive).
probabilityislogic

I feel there must be a more intuitive solution than the algebra given in Saunders' partial solution. If the data are independent given each of the hypotheses, this remains true as the priors on the hypotheses vary. And somehow the result is that the same must then hold for the conclusion...
charles.y.zheng

@charles - I know exactly how you feel. I thought I could derive it using some qualitative inconsistency (reductio ad absurdum), but I couldn't do it. I could extend Saunders' calculations though. And it is Condition 2 that is "the dubious one" in terms of what the result means.
probabilityislogic

@probabilityislogic "cela montre essentiellement que les tests d'hypothèses multiples ne sont rien d'autre qu'une série de tests d'hypothèses binaires." Pourriez-vous développer cette phrase? En lisant la page 98 du livre de Jaynes, je comprends que vous pouvez réduire les tests de à tester H 1 les uns contre les autres, puis normaliser d'une manière ou d'une autre pour obtenir le postérieur de H 1 , mais je ne comprends pas pourquoi cela découlerait des résultats de l'exercice 4.1. H1,,HnH1H1
Martin Drozdik

Answers:


7

The reason we accepted eq. 4.28 in the book (your Condition 1) was that we assumed that, given a certain hypothesis $H_a$ and the background information $X$, the data sets are independent; in other words, for any $D_i$ and $D_j$ with $i\neq j$:

$$P(D_i\mid D_jH_aX)=P(D_i\mid H_aX)\tag{1}$$

Extensibility beyond the binary case can then be discussed like this: if we assume eq. 1 to be true, is eq. 2 also true?

$$P(D_i\mid D_j\overline{H}_aX)\overset{?}{=}P(D_i\mid \overline{H}_aX)\tag{2}$$

Let's first look at the left side of eq. 2. Using the multiplication rule:

$$P(D_i\mid D_j\overline{H}_aX)=\frac{P(D_iD_j\overline{H}_a\mid X)}{P(D_j\overline{H}_a\mid X)}\tag{3}$$

Since the $n$ hypotheses $\{H_1,\dots,H_n\}$ are mutually exclusive and exhaustive, we can write $\overline{H}_a=\sum_{b\neq a}H_b$, so:

$$P(D_i\mid D_j\overline{H}_aX)=\frac{\sum_{b\neq a}P(D_i\mid D_jH_bX)\,P(D_jH_b\mid X)}{\sum_{b\neq a}P(D_jH_b\mid X)}=\frac{\sum_{b\neq a}P(D_i\mid H_bX)\,P(D_jH_b\mid X)}{\sum_{b\neq a}P(D_jH_b\mid X)}$$
For the case where we have only two hypotheses, the summations disappear (since there is only one $b\neq a$), the equal terms in the numerator and denominator, $P(D_jH_b\mid X)$, cancel out, and eq. 2 is proved correct, since $H_b=\overline{H}_a$. Therefore equation 4.29 can be derived from equation 4.28 in the book. But when we have more than two hypotheses, this doesn't happen. For example, if we have three hypotheses $\{H_1,H_2,H_3\}$, the equation above becomes:
$$P(D_i\mid D_j\overline{H}_1X)=\frac{P(D_i\mid H_2X)\,P(D_jH_2\mid X)+P(D_i\mid H_3X)\,P(D_jH_3\mid X)}{P(D_jH_2\mid X)+P(D_jH_3\mid X)}$$
In other words:
$$P(D_i\mid D_j\overline{H}_1X)=\frac{P(D_i\mid H_2X)}{1+\dfrac{P(D_jH_3\mid X)}{P(D_jH_2\mid X)}}+\frac{P(D_i\mid H_3X)}{1+\dfrac{P(D_jH_2\mid X)}{P(D_jH_3\mid X)}}$$
The only way this equation can yield eq. 2 is for both denominators to equal 1, i.e. for both fractions in the denominators to equal zero. But that is impossible.
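To see this numerically (made-up values for illustration): with $P(D_jH_2\mid X)=0.2$ and $P(D_jH_3\mid X)=0.1$, the two denominators are $1+\frac{0.1}{0.2}=1.5$ and $1+\frac{0.2}{0.1}=3$, so the left-hand side is the weighted average $\frac{2}{3}P(D_i\mid H_2X)+\frac{1}{3}P(D_i\mid H_3X)$. Since the weights depend on $D_j$, the expression cannot in general collapse to the $D_j$-free right-hand side of eq. 2.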

1
I think the fourth equation is incorrect. We should have $P(D_iD_jH_b\mid X)=P(D_iH_b\mid X)\,P(D_j\mid H_bX)$
probabilityislogic

Thank you very much probabilityislogic, I was able to correct the solution. What do you think now?
astroboy

I just don't understand how Jaynes says: "Those who fail to distinguish between logical independence and causal independence would suppose that (4.29) is always valid".
astroboy

I think I found the answer to my last comment: right after the sentence above, Jaynes says: "provided only that no $D_i$ exerts a physical influence on any other $D_j$". So essentially Jaynes is saying that even if they don't have physical influence, there is a logical limitation that doesn't allow the generalization to more than two hypotheses.
astroboy

After reading the text again, I feel my last comment was not a good answer. As I understand it now, Jaynes wanted to say: "those who fail to distinguish between logical independence and causal independence" would argue that $D_i$ and $D_j$ are assumed to have no physical influence on each other. Thus they have causal independence, which for them implies logical independence over any set of hypotheses. So they find all this discussion meaningless and simply proceed to generalise the binary case.
astroboy

1

Okay, so rather than go and re-derive Saunders' equation (5), I will just state it here. Conditions 1 and 2 imply the following equality:

$$\prod_{j=1}^{m}\left(\sum_{k\neq i}h_k d_{jk}\right)=\left(\sum_{k\neq i}h_k\right)^{m-1}\left(\sum_{k\neq i}h_k\prod_{j=1}^{m}d_{jk}\right)$$
where
$$d_{jk}=P(D_j\mid H_k,I)\qquad h_k=P(H_k\mid I)$$
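For orientation, this is just Condition 2 written out in Saunders' notation (my reading): under Condition 1 the sum rule gives $P(D_1\cdots D_m\mid\overline{H}_i)=\sum_{k\neq i}h_k\prod_{j=1}^{m}d_{jk}\big/\sum_{k\neq i}h_k$ and $P(D_j\mid\overline{H}_i)=\sum_{k\neq i}h_kd_{jk}\big/\sum_{k\neq i}h_k$; equating the former with the product of the latter over $j$ (Condition 2) and clearing denominators yields the display above.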

Now we can specialise to the case $m=2$ (two data sets) by taking $D_1^{(1)}\equiv D_1$ and relabeling $D_2^{(1)}\equiv D_2D_3\dots D_m$. Note that these two data sets still satisfy Conditions 1 and 2, so the result above applies to them as well. Expanding in the case $m=2$ we get:

$$\left(\sum_{k\neq i}h_k d_{1k}\right)\left(\sum_{l\neq i}h_l d_{2l}\right)=\left(\sum_{k\neq i}h_k\right)\left(\sum_{l\neq i}h_l d_{1l}d_{2l}\right)$$

$$\sum_{k\neq i}\sum_{l\neq i}h_k h_l d_{1k}d_{2l}=\sum_{k\neq i}\sum_{l\neq i}h_k h_l d_{1l}d_{2l}$$

$$\sum_{k\neq i}\sum_{l\neq i}h_k h_l d_{2l}\left(d_{1k}-d_{1l}\right)=0\quad(i=1,\dots,n)$$

The term $(d_{1a}-d_{1b})$ occurs twice in the above double summation: once when $k=a$ and $l=b$, and once again when $k=b$ and $l=a$. This occurs as long as $a,b\neq i$, and the coefficients of the two occurrences are $d_{2b}$ and $d_{2a}$ respectively. Now because there are $n$ of these equations, one for each $i$, we can actually remove the restriction $k,l\neq i$ from them. To illustrate, take $i=1$: this gives all conditions except those where $a=1,b=2$ or $b=1,a=2$. Now take $i=3$: this supplies those two missing conditions (note this assumes at least three hypotheses). So the equation can be re-written as:

$$\sum_{l>k}h_k h_l\left(d_{2l}-d_{2k}\right)\left(d_{1k}-d_{1l}\right)=0$$
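Before continuing, here is a short sympy sanity check of the pairing step that produced the display above (a sketch with $n=4$ and $i=0$, using 0-based indices):

```python
import sympy as sp

n, i = 4, 0
idx = [k for k in range(n) if k != i]
h  = sp.symbols(f"h0:{n}", positive=True)   # h_k = P(H_k | I)
d1 = sp.symbols(f"d1_0:{n}")                # d_{1k}
d2 = sp.symbols(f"d2_0:{n}")                # d_{2k}

# double sum over k, l != i  vs.  paired sum over l > k
double_sum = sp.expand(sum(h[k]*h[l]*d2[l]*(d1[k] - d1[l])
                           for k in idx for l in idx))
paired_sum = sp.expand(sum(h[k]*h[l]*(d2[l] - d2[k])*(d1[k] - d1[l])
                           for a, k in enumerate(idx) for l in idx[a+1:]))
print(sp.simplify(double_sum - paired_sum) == 0)   # True
```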

Now each of the $h_i$ terms must be greater than zero, for otherwise we are dealing with $n_1<n$ hypotheses, and the answer can be reformulated in terms of $n_1$. So these can be removed from the above set of conditions:

$$\sum_{l>k}\left(d_{2l}-d_{2k}\right)\left(d_{1k}-d_{1l}\right)=0$$

Thus, there are $\frac{n(n-1)}{2}$ conditions that must be satisfied, and each condition implies one of two "sub-conditions": that $d_{jk}=d_{jl}$ for either $j=1$ or $j=2$ (but not necessarily both). Now consider the set of all the unique pairs $(k,l)$ with $d_{jk}=d_{jl}$. If we were to take $n-1$ of these pairs for one of the $j$, then we would have all the numbers $1,\dots,n$ in the set, and $d_{j1}=d_{j2}=\dots=d_{j,n-1}=d_{j,n}$. This is because the first pair has 2 elements, and each additional pair brings at least one additional element to the set.*

But note that because there are $\frac{n(n-1)}{2}$ conditions, we must choose at least $\left\lceil\frac{1}{2}\times\frac{n(n-1)}{2}\right\rceil=\left\lceil\frac{n(n-1)}{4}\right\rceil$ of them for one of $j=1$ or $j=2$. If $n>4$ then the number of pairs chosen is greater than $n-1$. If $n=4$ or $n=3$ then we must choose exactly $n-1$ pairs. This implies that $d_{j1}=d_{j2}=\dots=d_{j,n-1}=d_{j,n}$ (a brute-force check of this combinatorial claim is sketched after the next display). Only with two hypotheses ($n=2$) does this not occur. But from the last equation in Saunders' article this equality condition implies:

$$P(D_j\mid\overline{H}_i)=\frac{\sum_{k\neq i}d_{jk}h_k}{\sum_{k\neq i}h_k}=\frac{d_{ji}\sum_{k\neq i}h_k}{\sum_{k\neq i}h_k}=d_{ji}=P(D_j\mid H_i)$$
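Here is the promised brute-force check. Each sub-condition $d_{jk}=d_{jl}$ groups the indices into blocks of equal values, so the claim amounts to: two set partitions of $\{1,\dots,n\}$ cannot jointly cover every pair unless one of them is the single-block partition, i.e. $d_{j1}=\dots=d_{jn}$ for that $j$. A small search (a sketch, here for $n=4$) confirms this:

```python
from itertools import combinations

def partitions(ns):
    """All set partitions of the list ns."""
    if not ns:
        yield []
        return
    first, rest = ns[0], ns[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i+1:]
        yield [[first]] + p

def covers_all_pairs(p1, p2, n):
    """Is every pair (a, b) inside some block of p1 or some block of p2?"""
    block = lambda p, x: next(i for i, b in enumerate(p) if x in b)
    return all(block(p1, a) == block(p1, b) or block(p2, a) == block(p2, b)
               for a, b in combinations(range(n), 2))

n = 4
nontrivial = [p for p in partitions(list(range(n))) if len(p) > 1]
print(any(covers_all_pairs(p1, p2, n)
          for p1 in nontrivial for p2 in nontrivial))   # False
```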

Thus, in the likelihood ratio we have:

$$\frac{P(D_1^{(1)}\mid H_i)}{P(D_1^{(1)}\mid\overline{H}_i)}=\frac{P(D_1\mid H_i)}{P(D_1\mid\overline{H}_i)}=1\quad\text{OR}\quad\frac{P(D_2^{(1)}\mid H_i)}{P(D_2^{(1)}\mid\overline{H}_i)}=\frac{P(D_2D_3\dots D_m\mid H_i)}{P(D_2D_3\dots D_m\mid\overline{H}_i)}=1$$

To complete the proof, note that if the second condition holds, the result is already proved: only one ratio can be different from 1. If the first condition holds, then we can repeat the above analysis, relabeling $D_1^{(2)}\equiv D_2$ and $D_2^{(2)}\equiv D_3\dots D_m$. Then either $D_1,D_2$ do not contribute, or $D_2$ is the only contributor. We would then have a third relabeling when $D_1,D_2$ not contributing holds, and so on. Thus, only one data set can contribute to the likelihood ratio when Condition 1 and Condition 2 hold and there are more than two hypotheses.

*NOTE: An additional pair might bring no new elements, but this would be offset by a pair which brought 2 new elements. E.g. take $d_{j1}=d_{j2}$ as first [+2], then $d_{j1}=d_{j3}$ [+1] and $d_{j2}=d_{j3}$ [+0], but the next pair must have $d_{jk}=d_{jl}$ for both $k,l\notin\{1,2,3\}$. This will add two elements [+2]. If $n=4$ then we don't need to choose any more, but for the "other" $j$ we must choose the 3 pairs which are not $(1,2),(2,3),(1,3)$. These are $(1,4),(2,4),(3,4)$, and thus the equality holds, because all numbers $(1,2,3,4)$ are in the set.


I am beginning to doubt the accuracy of this proof. The result in Saunders' maths implies only $n$ non-linear constraints on the $d_{jk}$. This makes the $d_{jk}$ have only $n$ degrees of freedom instead of $2n$. However, to get to the $\frac{n(n-1)}{2}$ conditions a different argument is required.
probabilityislogic

0

For the record, here is a somewhat more extensive proof. It also contains some background information. Maybe this is helpful for others studying the topic.

The main idea of the proof is to show that Jaynes' conditions 1 and 2 imply that

$$P(D_{m_k}\mid H_iX)=P(D_{m_k}\mid X),$$
for all but one data set $m_k\in\{1,\dots,m\}$. It then shows that for all these data sets, we also have
$$P(D_{m_k}\mid \overline{H}_iX)=P(D_{m_k}\mid X).$$
Thus we have for all but one data set,
$$\frac{P(D_{m_k}\mid H_iX)}{P(D_{m_k}\mid \overline{H}_iX)}=\frac{P(D_{m_k}\mid X)}{P(D_{m_k}\mid X)}=1.$$
The reason that I wanted to include the proof here is that some of the steps involved are not at all obvious, and one needs to take care not to use anything other than Conditions 1 and 2 and the product rule (as many of the other proofs implicitly do). The link above includes all these steps in detail. It is on my Google Drive and I will make sure it stays accessible.
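For instance, the step from the first display to the second can be obtained from the sum and product rules, since $\overline{H}_i$ is the prior-weighted mixture of the other hypotheses (a sketch, assuming the first display holds for every $i$):

$$P(D_{m_k}\mid\overline{H}_iX)=\frac{\sum_{b\neq i}P(D_{m_k}\mid H_bX)\,P(H_b\mid X)}{\sum_{b\neq i}P(H_b\mid X)}=\frac{P(D_{m_k}\mid X)\sum_{b\neq i}P(H_b\mid X)}{\sum_{b\neq i}P(H_b\mid X)}=P(D_{m_k}\mid X).$$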


Welcome to Cross Validated. Thank you for your answer. Could you please edit your answer to expand it, in order to include the main points of the link you provide? It will be more helpful both for people searching on this site and in case the link breaks. By the way, take the opportunity to take the Tour, if you haven't done it already. See also some tips on How to Answer, on formatting help and on writing equations using LaTeX / MathJax.
Ertxiem - reinstate Monica

Thanks for your comment. I edited the post and sketched the main steps of the proof.
dennis