Construction de la distribution Dirichlet avec distribution Gamma

Soit $X_1,\dots,X_{k+1}$ des variables aléatoires mutuellement indépendantes, ayant chacune une distribution gamma avec les paramètres $\alpha_i,i=1,2,\dots,k+1$ montrent que $Y_i=\frac{X_i}{X_1+\cdots+X_{k+1}},i=1,\dots,k$ , ont une distribution commune comme $\text{Dirichlet}(\alpha_1,\alpha_2,\dots,\alpha_k;\alpha_{k+1})$

PDF commun de $(X_1,\dots,X_{k+1})=\frac{e^{-\sum_{i=1}^{k+1}x_i}x_1^{\alpha_1-1}\dots x_{k+1}^{\alpha_{k+1}-1}}{\Gamma(\alpha_1)\Gamma(\alpha_2)\dots \Gamma(\alpha_{k+1})}$ .Puis pour trouver le pdf commun de $(Y_1,\dots,Y_{k+1})$ je ne peux pas trouver jacobien ie $J(\frac{x_1,\dots,x_{k+1}}{y_1,\dots,y_{k+1}})$

— Argha
source

Have a look at pages 13-14 of this document.

@Procrastinator Thank you very much your document is best answer for my question.

— Argha

@Procrastinator - perhaps you should put this as an answer, since the OP is happy with it, and add a couple of sentences so you don't trip the "we want more than one-sentence answer" warning?

— jbowman

That document now is a non-answer because it's a 404.

— whuber

Wayback machine to the rescue: pdf

— mobeets

Jacobians--the absolute determinants of the change of variable function--appear formidable and can be complicated. Nevertheless, they are an essential and unavoidable part of the calculation of a multivariate change of variable. It would seem there's nothing for it but to write down a $k+1$ by $k+1$ matrix of derivatives and do the calculation.

There's a better way. It's shown at the end in the "Solution" section. Because the purpose of this post is to introduce statisticians to what may be a new method for many, much of it is devoted to explaining the machinery behind the solution. This is the algebra of differential forms. (Differential forms are the things that one integrates in multiple dimensions.) A detailed, worked example is included to help make this become more familiar.

Background

Il y a plus d'un siècle, les mathématiciens ont développé la théorie de l' algèbre différentielle pour travailler avec les «dérivées d'ordre supérieur» qui se produisent dans la géométrie multidimensionnelle. Le déterminant est un cas particulier des objets de base manipulés par de telles algèbres, qui sont généralement des formes multilinéaires alternées . La beauté de cela réside dans la simplicité des calculs.

Voici tout ce que vous devez savoir.

Un différentiel est une expression de la forme " ". C'est la concaténation de " " avec n'importe quel nom de variable. $dx_i$ $d$
A one-form is a linear combination of differentials, such as $dx_1+dx_2$ or even $x_2 dx_1 - \exp(x_2) dx_2$ . That is, the coefficients are functions of the variables.
Forms can be "multiplied" using a wedge product, written $\wedge$ . This product is anti-commutative (also called alternating): for any two one-forms $\omega$ and $\eta$ ,

$ω \land η = - η \land ω .$ $\omega \wedge \eta = -\eta \wedge \omega.$
This multiplication is linear and associative: in other words, it works in the familiar fashion. An immediate consequence is that $\omega \wedge \omega = -\omega \wedge \omega$ , implying the square of any one-form is always zero. That makes multiplication extremely easy!
For the purposes of manipulating the integrands that appear in probability calculations, an expression like $dx_1 dx_2 \cdots dx_{k+1}$ can be understood as $|dx_1\wedge dx_2 \wedge \cdots \wedge dx_{k+1}|$ .
When $y = g(x_1, \ldots, x_n)$ is a function, then its differential is given by differentiation:

$d y = d g (x_{1}, \dots, x_{n}) = \frac{\partial g}{\partial x_{1}} (x_{1}, \dots, x_{n}) d x_{1} + \dots + \frac{\partial g}{\partial x_{1}} (x_{1}, \dots, x_{n}) d x_{n} .$ $dy = dg(x_1, \ldots, x_n) = \frac{\partial g}{\partial x_1}(x_1, \ldots, x_n) dx_1 + \cdots + \frac{\partial g}{\partial x_1}(x_1, \ldots, x_n) dx_n.$

The connection with Jacobians is this: the Jacobian of a transformation $(y_1, \ldots, y_n) = F(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_n(x_1, \ldots, x_n))$ is, up to sign, simply the coefficient of $dx_1\wedge \dots \wedge dx_n$ that appears in computing

d y_{1} \land \dots \land d y_{n} = d f_{1} (x_{1}, \dots, x_{n}) \land \dots \land d f_{n} (x_{1}, \dots, x_{n})

$dy_1 \wedge \cdots \wedge dy_n = df_1(x_1,\ldots, x_n)\wedge \cdots \wedge df_n(x_1, \ldots, x_n)$

after expanding each of the $df_i$ as a linear combination of the $dx_j$ in rule (5).

Example

$(x, y)$ $(r,\theta)$ , where $(x,y) = (r\cos(\theta), r\sin(\theta))$ . The following is an utterly mechanical application of the preceding rules, where " $(*)$ " is used to abbreviate expressions that will obviously disappear by virtue of rule (3), which implies $dr\wedge dr = d\theta\wedge d\theta = 0$ .

\begin{aligned} d x d y & = | d x \land d y | = | d (r \cos (θ)) \land d (r \sin (θ)) | \\ = | (\cos (θ) d r - r \sin (θ) d θ) \land (\sin (θ) d r + r \cos (θ) d θ | \\ = | (*) d r \land d r + (*) d θ \land d θ - r \sin (θ) d θ \land \sin (θ) d r + \cos (θ) d r \land r \cos (θ) d θ | \\ = | 0 + 0 + r \sin^{2} (θ) d r \land d θ + r \cos^{2} (θ) d r \land d θ | \\ = | r (\sin^{2} (θ) + \cos^{2} (θ)) d r \land d θ) | \\ = r d r d θ \end{aligned} .

$\eqalign{ dx dy &= |dx\wedge dy| = |d(r\cos(\theta)) \wedge d(r\sin(\theta))| \\ &= |(\cos(\theta)dr - r\sin(\theta)d\theta) \wedge (\sin(\theta)dr + r\cos(\theta)d\theta| \\ &= |(*)dr\wedge dr + (*) d\theta\wedge d\theta - r\sin(\theta)d\theta\wedge \sin(\theta)dr + \cos(\theta)dr \wedge r\cos(\theta) d\theta| \\ &= |0 + 0 + r\sin^2(\theta) dr\wedge d\theta + r\cos^2(\theta) dr\wedge d\theta| \\ &= |r(\sin^2(\theta) + \cos^2(\theta)) dr\wedge d\theta)| \\ &= r\ dr d\theta }.$

The point of this is the ease with which such calculations can be performed, without messing about with matrices, determinants, or other such multi-indicial objects. You just multiply things out, remembering that wedges are anti-commutative. It's easier than what is taught in high school algebra.

Preliminaries

Let's see this differential algebra in action. In this problem, the PDF of the joint distribution of $(X_1, X_2, \ldots, X_{k+1})$ is the product of the individual PDFs (because the $X_i$ are assumed to be independent). In order to handle the change to the variables $Y_i$ we must be explicit about the differential elements that will be integrated. These form the term $dx_1 dx_2 \cdots dx_{k+1}$ . Including the PDF gives the probability element

\begin{aligned} f_{X} (x, α) d x_{1} \dots d x_{k + 1} & \propto (x_{1}^{α_{1} - 1} \exp (- x_{1})) \dots (x_{k + 1}^{α_{k + 1} - 1} \exp (- x_{k + 1})) d x_{1} \dots d x_{k + 1} \\ = x_{1}^{α_{1} - 1} \dots x_{k + 1}^{α_{k + 1} - 1} \exp (- (x_{1} + \dots + x_{k + 1})) d x_{1} \dots d x_{k + 1} . \end{aligned}

$\eqalign{ f_\mathbf{X}(\mathbf{x},\mathbf{\alpha})dx_1 \cdots dx_{k+1} &\propto \left(x_1^{\alpha_1-1}\exp\left(-x_1\right)\right)\cdots \left(x_{k+1}^{\alpha_{k+1}-1}\exp\left(-x_{k+1}\right) \right)dx_1 \cdots dx_{k+1} \\ &= x_1^{\alpha_1-1}\cdots x_{k+1}^{\alpha_{k+1}-1}\exp\left(-\left(x_1+\cdots+x_{k+1}\right)\right)dx_1 \cdots dx_{k+1}. }$

(The normalizing constant has been ignored; it will be recovered at the end.)

Staring at the definitions of the $Y_i$ a few seconds ought to reveal the utility of introducing the new variable

Z = X_{1} + X_{2} + \dots + X_{k + 1},

$Z = X_1 + X_2 + \cdots + X_{k+1},$

giving the relationships

X_{i} = Y_{i} Z .

$X_i = Y_i Z.$

This suggests making the change of variables $x_i \to y_i z$ in the probability element. The intention is to retain the first $k$ variables $y_1, \ldots, y_k$ along with $z$ and then integrate out $z$ . To do so, we have to re-express all the $dx_i$ in terms of the new variables. This is the heart of the problem. It's where the differential algebra takes place. To begin with,

d x_{i} = d (y_{i} z) = y_{i} d z + z d y_{i} .

$dx_i = d(y_i z) = y_i dz + z dy_i.$

Note that since $Y_1+Y_2+\cdots+Y_{k+1}=1$ , then

0 = d (1) = d (y_{1} + y_{2} + \dots + y_{k + 1}) = d y_{1} + d y_{2} + \dots + d y_{k + 1} .

$0 = d(1) = d(y_1 + y_2 + \cdots + y_{k+1}) = dy_1 + dy_2 + \cdots + dy_{k+1}.$

Consider the one-form

ω = d x_{1} + \dots + d x_{k} = z (d y_{1} + \dots + d y_{k}) + (y_{1} + \dots + y_{k}) d z .

$\omega = dx_1 + \cdots + dx_k = z(dy_1 + \cdots + dy_k) + (y_1+\cdots + y_k) dz.$

It appears in the differential of the last variable:

\begin{aligned} d x_{k + 1} & = z d y_{k + 1} + y_{k + 1} d z \\ = - z (d y_{1} + \dots + d y_{k}) + (1 - y_{1} - \dots y_{k}) d z \\ = d z - ω . \end{aligned}

$\eqalign{ dx_{k+1} &= z dy_{k+1} + y_{k+1}dz \\ &= -z(dy_1 + \cdots + dy_k) + (1-y_1-\cdots y_k)dz \\ &= dz - \omega. }$

The value of this lies in the observation that

d x_{1} \land \dots \land d x_{k} \land ω = 0

$dx_1 \wedge \cdots \wedge dx_k \wedge \omega = 0$

because, when you expand this product, there is one term containing $dx_1 \wedge dx_1 = 0$ as a factor, another containing $dx_2 \wedge dx_2 = 0$ , and so on: they all disappear. Consequently,

\begin{aligned} d x_{1} \land \dots \land d x_{k} \land d x_{k + 1} & = d x_{1} \land \dots \land d x_{k} \land z - d x_{1} \land \dots \land d x_{k} \land ω \\ = d x_{1} \land \dots \land d x_{k} \land z . \end{aligned}

$\eqalign{ dx_1 \wedge \cdots \wedge dx_k \wedge dx_{k+1} &= dx_1 \wedge \cdots \wedge dx_k \wedge z - dx_1 \wedge \cdots \wedge dx_k \wedge \omega \\ &= dx_1 \wedge \cdots \wedge dx_k \wedge z. }$

Whence (because all products $dz\wedge dz$ disappear),

\begin{aligned} d x_{1} \land \dots \land d x_{k + 1} & = (z d y_{1} + y_{1} d z) \land \dots \land (z d y_{k} + y_{k} d z) \land d z \\ = z^{k} d y_{1} \land \dots \land d y_{k} \land d z . \end{aligned}

$\eqalign{ dx_1 \wedge \cdots \wedge dx_{k+1} &= (z dy_1 + y_1 dz) \wedge \cdots \wedge (z dy_k + y_k dz) \wedge dz \\ &= z^k dy_1 \wedge \cdots \wedge dy_k \wedge dz. }$

The Jacobian is simply $|z^k| = z^k$ , the coefficient of the differential product on the right hand side.

Solution

The transformation $(x_1, \ldots, x_k, x_{k+1})\to (y_1, \ldots, y_k, z)$ is one-to-one: its inverse is given by $x_i = y_i z$ for $1\le i\le k$ and $x_{k+1} = z(1-y_1-\cdots-y_k)$ . Therefore we don't have to fuss any more about the new probability element; it simply is

\begin{aligned} (z y_{1})^{α_{1} - 1} \dots (z y_{k})^{α_{k} - 1} {(z (1 - y_{1} - \dots - y_{k}))}^{α_{k + 1} - 1} \exp (- z) | z^{k} d y_{1} \land \dots \land d y_{k} \land d z | \\ = (z^{α_{1} + \dots + α_{k + 1} - 1} \exp (- z) d z) (y_{1}^{α_{1} - 1} \dots y_{k}^{α_{k} - 1} {(1 - y_{1} - \dots - y_{k})}^{α_{k + 1} - 1} d y_{1} \dots d y_{k}) . \end{aligned}

$\eqalign{ &(z y_1)^{\alpha_1-1}\cdots (z y_k)^{\alpha_k-1}\left(z(1-y_1-\cdots-y_k)\right)^{\alpha_{k+1}-1}\exp\left(-z\right)|z^k dy_1 \wedge \cdots \wedge dy_k \wedge dz| \\ &= \left(z^{\alpha_1+\cdots+\alpha_{k+1}-1}\exp\left(-z\right) dz\right)\left( y_1^{\alpha_1-1} \cdots y_k^{\alpha_k-1}\left(1-y_1-\cdots-y_k\right)^{\alpha_{k+1}-1}dy_1 \cdots dy_k\right). }$

That is manifestly a product of a Gamma $(\alpha_1+\cdots+\alpha_{k+1})$ distribution (for $Z$ ) and a Dirichlet $(\mathbf\alpha)$ distribution (for $(Y_1,\ldots, Y_k)$ ). In fact, since the original normalizing constant must have been a product of $\Gamma(\alpha_i)$ , we deduce immediately that the new normalizing constant must be divided by $\Gamma(\alpha_1+\cdots+\alpha_{k+1})$ , enabling the PDF to be written

f_{Y} (y, α) = \frac{Γ (α_{1} + \dots + α_{k + 1})}{Γ (α_{1}) \dots Γ (α_{k + 1})} (y_{1}^{α_{1} - 1} \dots y_{k}^{α_{k} - 1} {(1 - y_{1} - \dots - y_{k})}^{α_{k + 1} - 1}) .

$f_\mathbf{Y}(\mathbf{y},\mathbf{\alpha}) = \frac{\Gamma(\alpha_1+\cdots+\alpha_{k+1})}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_{k+1})}\left( y_1^{\alpha_1-1} \cdots y_k^{\alpha_k-1}\left(1-y_1-\cdots-y_k\right)^{\alpha_{k+1}-1}\right).$

— whuber
source