How does the saddlepoint approximation work?

How does the saddlepoint approximation work? What sort of problem is it good for?
(Feel free to use a particular example or examples by way of illustration.)

Are there any drawbacks, difficulties, things to watch out for, or traps for the unwary?

— Glen_b -Reinstate Monica

The saddlepoint approximation to a probability density function (it works likewise for mass functions, but I will only talk here in terms of densities) is a surprisingly accurate approximation that can be seen as a refinement of the central limit theorem. So it will only work in settings where there is a central limit theorem, but it needs stronger assumptions.

We start with the assumption that the moment generating function exists and is twice differentiable. This implies in particular that all moments exist. Let $X$ be a random variable with moment generating function (mgf)

$$M(t) = \mathbb{E}\, e^{tX}$$

and cumulant generating function (cgf)

$$K(t) = \log M(t)$$

(where $\log$ denotes the natural logarithm). In the development I will closely follow Ronald W. Butler: "Saddlepoint Approximations with Applications" (CUP). We will develop the saddlepoint approximation using the Laplace approximation to a certain integral. Write

$$e^{K(t)} = \int_{-\infty}^\infty e^{tx} f(x) \; dx = \int_{-\infty}^\infty \exp(tx + \log f(x)) \; dx = \int_{-\infty}^\infty \exp(-h(t,x)) \; dx$$

where $h(t,x) = -tx - \log f(x)$. Now we Taylor expand $h(t,x)$ in $x$ around a point $x_0$, treating $t$ as a constant. This gives

$$h(t,x) = h(t,x_0) + h'(t,x_0)(x-x_0) + \frac12 h''(t,x_0)(x-x_0)^2 + \dotsm$$

where $'$ denotes differentiation with respect to $x$. Note that

$$h'(t,x) = -t - \frac{\partial}{\partial x}\log f(x), \qquad h''(t,x) = -\frac{\partial^2}{\partial x^2}\log f(x) > 0$$

(the last inequality by assumption, as it is needed for the approximation to work). Let

$x_t$ be the solution to $h'(t,x_t)=0$. We will assume that this gives a minimum of $h(t,x)$ as a function of $x$. Using this expansion around $x_t$ in the integral (the linear term vanishes there) and dropping the $\dotsm$ part gives

$$e^{K(t)} \approx \int_{-\infty}^\infty \exp\left(-h(t,x_t) - \frac12 h''(t,x_t)(x-x_t)^2\right) \; dx = e^{-h(t,x_t)} \int_{-\infty}^\infty e^{-\frac12 h''(t,x_t)(x-x_t)^2} \; dx,$$

which is a Gaussian integral, giving

$$e^{K(t)} \approx e^{-h(t,x_t)} \sqrt{\frac{2\pi}{h''(t,x_t)}}.$$

Since $-h(t,x_t) = t x_t + \log f(x_t)$, solving for $f(x_t)$ gives (a first version of) the saddlepoint approximation as

$$f(x_t) \approx \sqrt{\frac{h''(t,x_t)}{2\pi}} \exp(K(t) - t x_t). \tag{*} \label{*}$$

Note that the approximation has the form of an exponential family.
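The Laplace step used above can be checked numerically on an integral whose value is known. The following is a minimal sketch, not part of the derivation itself: it assumes the illustrative choice $h(x) = x - a\log x$ with $a = 5$, for which $\int_0^\infty e^{-h(x)}\,dx = \Gamma(a+1)$ and the minimum of $h$ is at $x_0 = a$.

```r
# Laplace approximation: int exp(-h(x)) dx ~ exp(-h(x0)) * sqrt(2*pi / h''(x0)),
# where x0 minimises h. Illustrative (assumed) choice: h(x) = x - a*log(x).
a  <- 5
h  <- function(x) x - a * log(x)
x0 <- a            # solves h'(x) = 1 - a/x = 0
h2 <- a / x0^2     # h''(x) = a/x^2 evaluated at x0

laplace <- exp(-h(x0)) * sqrt(2 * pi / h2)
exact   <- integrate(function(x) exp(-h(x)), lower = 0, upper = Inf)$value

# The exact value is gamma(a + 1); the Laplace value is Stirling's formula for it.
print(c(laplace = laplace, exact = exact, ratio = laplace / exact))
```

With this choice the ratio is a little below one (about 0.983), which gives a feel for the size of the error that the dropped higher-order terms introduce.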

Now we need to do some work to get this in a more useful form.

From $h'(t,x_t)=0$ we get

$$t = -\frac{\partial}{\partial x_t} \log f(x_t).$$

Differentiating this with respect to $x_t$ gives

$$\frac{\partial t}{\partial x_t} = -\frac{\partial^2}{\partial x_t^2} \log f(x_t) > 0$$

(by our assumptions), so the relationship between $t$ and $x_t$ is monotone, and $x_t$ is well defined. We need an approximation to $\frac{\partial}{\partial x_t} \log f(x_t)$. To that end, taking logarithms in $\eqref{*}$ gives

$$\log f(x_t) = K(t) - t x_t - \frac12 \log \frac{2\pi}{-\frac{\partial^2}{\partial x_t^2} \log f(x_t)}. \tag{**} \label{**}$$

Assuming the last term above depends only weakly on $x_t$, so that its derivative with respect to $x_t$ is approximately zero (we will come back to comment on this), we get

$$\frac{\partial \log f(x_t)}{\partial x_t} \approx (K'(t)-x_t) \frac{\partial t}{\partial x_t} - t.$$

Up to our approximation it then follows that

$$0 \approx t + \frac{\partial \log f(x_t)}{\partial x_t} = (K'(t)-x_t) \frac{\partial t}{\partial x_t},$$

and since $\frac{\partial t}{\partial x_t} > 0$, $t$ and $x_t$ must be related through the equation

$$K'(t) - x_t=0, \tag{§} \label{§}$$

which is called the saddlepoint equation.

What we are still missing in order to evaluate $\eqref{*}$ is

$$h''(t,x_t) = -\frac{\partial^2 \log f(x_t)}{\partial x_t^2} = -\frac{\partial}{\partial x_t} \left(\frac{\partial \log f(x_t)}{\partial x_t} \right) = -\frac{\partial}{\partial x_t}(-t) = \left(\frac{\partial x_t}{\partial t}\right)^{-1},$$

which we can find by implicit differentiation of the saddlepoint equation $K'(t)=x_t$:

$$\frac{\partial x_t}{\partial t} = K''(t).$$

The result is that (up to our approximation)

$$h''(t,x_t) = \frac1{K''(t)}.$$

Putting everything together, we have the final saddlepoint approximation of the density $f(x)$ as

$$f(x_t) \approx e^{K(t)- t x_t} \sqrt{\frac1{2\pi K''(t)}}.$$

Now, to use this in practice to approximate the density at a specific point $x_t$, we solve the saddlepoint equation for that $x_t$ to find $t$.
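The recipe just described (solve $K'(t) = x$ for $t$, then plug into the formula) can be sketched in a few lines of R. This is only an illustration under an assumed example, a Gamma distribution with shape 2 and rate 1, whose cgf is $K(t) = -\alpha\log(1-t)$ for $t<1$; the helper names `solve_speq` and `spa_density` are hypothetical.

```r
# Saddlepoint density for an assumed example: Gamma(shape = alpha, rate = 1).
alpha <- 2
K  <- function(t) -alpha * log(1 - t)   # cgf, defined for t < 1
K1 <- function(t) alpha / (1 - t)       # K'(t)
K2 <- function(t) alpha / (1 - t)^2     # K''(t)

# Solve the saddlepoint equation K'(t) = x numerically for a single x > 0.
solve_speq <- function(x) {
  uniroot(function(t) K1(t) - x, lower = -1000, upper = 1 - 1e-6,
          tol = 1e-12)$root
}

# Plug the saddlepoint into exp(K(t) - t*x) / sqrt(2*pi*K''(t)).
spa_density <- Vectorize(function(x) {
  t <- solve_speq(x)
  exp(K(t) - t * x) / sqrt(2 * pi * K2(t))
})

x <- c(0.5, 1, 2, 4, 8)
ratio <- spa_density(x) / dgamma(x, shape = alpha, rate = 1)
print(round(ratio, 4))
```

For the Gamma family the ratio to the true density is the same at every $x$ (about 1.042 here), so the approximation has the right shape and is off only by a constant factor.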

The saddlepoint approximation is often stated as an approximation to the density of the mean based on $n$ iid observations $X_1, X_2, \dotsc, X_n$. The cumulant generating function of the mean is $n K(t/n)$; after rescaling the saddlepoint variable by $n$, the saddlepoint approximation for the mean becomes

$$f(\bar{x}_t) \approx e^{nK(t) - n t \bar{x}_t} \sqrt{\frac{n}{2\pi K''(t)}},$$

where $t$ solves $K'(t) = \bar{x}_t$.
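As a small check of the formula for the mean, here is a sketch under an assumed example: the mean of $n$ iid unit exponentials, where $K(t) = -\log(1-t)$, the saddlepoint equation has the closed-form solution $t = 1 - 1/\bar{x}$, and the exact density of the mean is Gamma with shape $n$ and rate $n$. The function name `spa_mean_exp` is hypothetical.

```r
# Saddlepoint approximation to the density of the mean of n iid Exp(1) variables.
spa_mean_exp <- function(xbar, n) {
  t  <- 1 - 1 / xbar        # solves K'(t) = 1 / (1 - t) = xbar
  K  <- -log(1 - t)         # cgf of a single observation at the saddlepoint
  K2 <- 1 / (1 - t)^2       # K''(t) at the saddlepoint
  sqrt(n / (2 * pi * K2)) * exp(n * K - n * t * xbar)
}

# Compare with the exact density, Gamma(shape = n, rate = n).
xbar <- seq(0.2, 3, by = 0.2)
for (n in c(1, 5, 20)) {
  ratio <- spa_mean_exp(xbar, n) / dgamma(xbar, shape = n, rate = n)
  cat("n =", n, " ratio range:", range(ratio), "\n")
}
```

The ratio does not depend on $\bar{x}$ in this example, and it moves towards one as $n$ grows (roughly 1.084, 1.017 and 1.004 for $n = 1, 5, 20$).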

Let us look at a first example. What do we get if we try to approximate the standard normal density

$$f(x)=\frac1{\sqrt{2\pi}} e^{-\frac12 x^2}?$$

The mgf is $M(t)=\exp(\frac12 t^2)$, so

$$K(t)=\frac12 t^2, \qquad K'(t)=t, \qquad K''(t)=1,$$

so the saddlepoint equation is $t=x_t$ and the saddlepoint approximation gives

$$f(x_t) \approx e^{\frac12 t^2 -t x_t} \sqrt{\frac1{2\pi \cdot 1}} = \frac1{\sqrt{2\pi}} e^{-\frac12 x_t^2},$$

so in this case the approximation is exact.

Let us look at a very different application: bootstrap in the transform domain. We can do bootstrapping analytically, using the saddlepoint approximation to the bootstrap distribution of the mean!

Assume we have $X_1, X_2, \dotsc, X_n$ iid from some density $f$ (in the simulated example we will use a unit exponential distribution). From the sample we calculate the empirical moment generating function

$$\hat{M}(t)= \frac1{n} \sum_{i=1}^n e^{t x_i}$$

and then the empirical cgf $\hat{K}(t) = \log \hat{M}(t)$. We need the empirical mgf for the mean, which is $\hat{M}(t/n)^n$, and the empirical cgf for the mean

$$\hat{K}_{\bar{X}}(t) = n \log \hat{M}(t/n),$$

which we use to construct a saddlepoint approximation. Some R code follows (R version 3.2.3):

set.seed(1234)
x  <-  rexp(10)

require(Deriv)   ### From CRAN
drule[["sexpmean"]]   <-  alist(t=sexpmean1(t))  # adding diff rules to 
                                                 # Deriv
drule[["sexpmean1"]]  <-  alist(t=sexpmean2(t))

###

make_ecgf_mean  <-   function(x)   {
    n  <-  length(x)
    sexpmean  <-  function(t) mean(exp(t*x))
    sexpmean1 <-  function(t) mean(x*exp(t*x))
    sexpmean2 <-  function(t) mean(x*x*exp(t*x))
    emgf  <-  function(t) sexpmean(t)
    ecgf  <-   function(t)  n * log( emgf(t/n) )
    ecgf1 <-   Deriv(ecgf)
    ecgf2 <-   Deriv(ecgf1)
    return( list(ecgf=Vectorize(ecgf),
                 ecgf1=Vectorize(ecgf1),
                 ecgf2 =Vectorize(ecgf2) )    )
}

### Now we need a function solving the saddlepoint equation and constructing
### the approximation:
###

make_spa <-  function(cumgenfun_list) {
    K  <- cumgenfun_list[[1]]
    K1 <- cumgenfun_list[[2]]
    K2 <- cumgenfun_list[[3]]
    # local function for solving the speq:
    solve_speq  <-  function(x) {
          # Returns saddle point!
          uniroot(function(s) K1(s)-x,lower=-100,
                  upper = 100, 
                  extendInt = "yes")$root
    }
    # Function finding fhat for one specific x:
    fhat0  <- function(x) {
        # Solve saddlepoint equation:
        s  <-  solve_speq(x)
        # Calculating saddlepoint density value:
        (1/sqrt(2*pi*K2(s)))*exp(K(s)-s*x)
    }
    # Returning a vectorized version:
    return(Vectorize(fhat0))
} # end make_spa

(I have tried to write this as general code which can be modified easily for other cgfs, but the code is still not very robust.)

Then we use this for a sample of ten independent observations from a unit exponential distribution. We do the usual nonparametric bootstrapping "by hand", plot the resulting bootstrap histogram for the mean, and overplot the saddlepoint approximation:

> ECGF  <- make_ecgf_mean(x)
> fhat  <-  make_spa(ECGF)
> fhat
function (x) 
{
    args <- lapply(as.list(match.call())[-1L], eval, parent.frame())
    names <- if (is.null(names(args))) 
        character(length(args))
    else names(args)
    dovec <- names %in% vectorize.args
    do.call("mapply", c(FUN = FUN, args[dovec], MoreArgs = list(args[!dovec]), 
        SIMPLIFY = SIMPLIFY, USE.NAMES = USE.NAMES))
}
<environment: 0x4e5a598>
> boots  <-  replicate(10000, mean(sample(x, length(x), replace=TRUE)), simplify=TRUE)
> hist(boots, prob=TRUE)
> plot(fhat, from=0.001, to=2, col="red", add=TRUE)

Giving the resulting plot:

The approximation seems to be rather good!

We could get an even better approximation by integrating the saddlepoint approximation and rescaling:

> integrate(fhat, lower=0.1, upper=2)
1.026476 with absolute error < 9.7e-07

Now the cumulative distribution function based on this approximation could be found by numerical integration, but it is also possible to make a direct saddlepoint approximation for that. But that is for another post, this is long enough.
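The rescaling and the numerically integrated distribution function mentioned above can be sketched as follows. This is an illustration under an assumed example (the mean of $n = 5$ unit exponentials, with the saddlepoint in closed form), not the empirical bootstrap case; the names `spa`, `norm_const` and `spa_cdf` are hypothetical.

```r
# Saddlepoint density of the mean of n iid Exp(1) variables (assumed example).
n <- 5
spa <- function(xbar) {
  t <- 1 - 1 / xbar                       # saddlepoint: K'(t) = 1/(1 - t) = xbar
  # K(t) = log(xbar) and K''(t) = xbar^2 at the saddlepoint
  sqrt(n / (2 * pi * xbar^2)) * exp(n * log(xbar) - n * t * xbar)
}

# Normalising constant by numerical integration over the support.
norm_const <- integrate(spa, lower = 0, upper = Inf)$value

# Distribution function of the renormalised saddlepoint density.
spa_cdf <- function(q) integrate(spa, lower = 0, upper = q)$value / norm_const

print(norm_const)
print(c(saddlepoint = spa_cdf(1), exact = pgamma(1, shape = n, rate = n)))
```

In this particular example the unnormalised density integrates to about 1.017, and after rescaling the distribution function agrees with the exact Gamma one up to integration error. That exact agreement is special to the Gamma family and should not be expected in general.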

Finally, some comments left out of the development above. In $\eqref{**}$ we made an approximation by essentially ignoring the third term. Why can we do that? One observation is that for the normal density function the left-out term contributes nothing, so the approximation is exact. Since the saddlepoint approximation is a refinement of the central limit theorem, we are somewhat close to the normal, so this should work well. One can also look at specific examples. For the saddlepoint approximation to the Poisson distribution, the left-out third term involves a trigamma function, which is indeed rather flat when the argument is not too close to zero.
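The Poisson remark can be made concrete with a short sketch. For a Poisson($\lambda$) variable, $K(t) = \lambda(e^t - 1)$, so the saddlepoint equation $\lambda e^t = x$ gives $t = \log(x/\lambda)$ and $K''(t) = x$. The value $\lambda = 4$ and the name `spa_pois` are assumptions made only for this illustration; the point $x = 0$ is left out because the saddlepoint does not exist there.

```r
# Saddlepoint approximation to the Poisson(lambda) mass function at x = 1, 2, ...
lambda <- 4
spa_pois <- function(x) {
  t  <- log(x / lambda)          # solves K'(t) = lambda * exp(t) = x
  K  <- lambda * (exp(t) - 1)    # cgf at the saddlepoint
  K2 <- lambda * exp(t)          # K''(t), equal to x at the saddlepoint
  exp(K - t * x) / sqrt(2 * pi * K2)
}

x <- 1:12
ratio <- spa_pois(x) / dpois(x, lambda)

# Treating x as continuous, -d^2/dx^2 log f(x) = trigamma(x + 1) for the Poisson,
# which is the quantity inside the left-out third term.
print(round(cbind(x, ratio, trigamma = trigamma(x + 1)), 4))
```

The ratio is largest at $x = 1$ (about 1.084) and decreases towards one as $x$ grows, in line with the trigamma term flattening out away from zero.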

Finally, why the name? The name comes from an alternative derivation using complex-analysis techniques. Later we can look into that, but in another post!

— kjetil b halvorsen

What you have so far is great. The development there is very clear.

— Glen_b -Reinstate Monica

kjetil I attempted to fix four small typos 1. "In the development I wil follow" 2. "needed for the approximatrion to work" 3. "What we misses now" 4. "implicit differentiation of the sadlepoint" but in doing so it looks like I broke one of your equations - I have no idea how, since I changed nothing but those text items (as you can see from the edit history). I'd roll it back but since I can't explain how fixing those errors caused a problem I don't want to cause still further problems. My apologies. (It actually looked like it broke as soon as I opened the edit session)

— Glen_b -Reinstate Monica

It's possible there's a mathJax bug or a bug in the edit code that leads to this issue.

— Glen_b -Reinstate Monica

@Christoph Hanck: To get an approximation at some specific $x_t$, you solve the saddlepoint equation $\eqref{§}$ to find $t$.

— kjetil b halvorsen

Maybe it is worth pointing out that, when the empirical cgf is used, the resulting saddlepoint approximation is undefined outside the convex hull of the data. See Feuerverger (1989) "On the Empirical Saddlepoint Approximation". This should be the case also in the bootstrap example above.

— Matteo Fasiolo

Here I expand upon kjetil's answer, and I focus on those situations where the cumulant generating function (CGF) is unknown but can be estimated from the data $x_1,\dots,x_n$, where $x_i\in \mathbb{R}^d$. The simplest CGF estimator is arguably that of Davison and Hinkley (1988),

$$\hat{K}(\lambda) = \log\left(\frac{1}{n}\sum_{i=1}^{n}e^{\lambda^Tx_i}\right),$$

which is the one used in kjetil's bootstrap example. This estimator has the drawback that the resulting saddlepoint equation

$$\hat{K}'(\lambda) = y$$

can be solved only if $y$, the point at which we want to evaluate the saddlepoint density, falls inside the convex hull of $x_1,\dots,x_n$.
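The convex-hull restriction is easy to see in one dimension, where $\hat{K}'(\lambda)$ is a weighted average of the data and therefore can never leave the interval between the smallest and largest observation. The sketch below uses simulated data and hypothetical helper names (`K1hat`, `solve_speq`); it is a plain illustration of the restriction and does not use the esaddle package.

```r
set.seed(1)
x <- rexp(10)   # simulated data, for illustration only

# Derivative of the empirical cgf: a weighted mean of the data with weights
# proportional to exp(lambda * x). The exponent is shifted for numerical stability.
K1hat <- function(lambda) {
  w <- exp(lambda * x - max(lambda * x))
  sum(w * x) / sum(w)
}

# However extreme lambda is, K1hat stays between min(x) and max(x).
vals <- sapply(c(-200, -5, 0, 5, 200), K1hat)
print(rbind(K1hat_range = range(vals), data_range = range(x)))

# Solving K1hat(lambda) = y works inside the data range and fails outside it.
solve_speq <- function(y) {
  tryCatch(
    uniroot(function(lambda) K1hat(lambda) - y, lower = -200, upper = 200)$root,
    error = function(e) NA_real_
  )
}
print(solve_speq(mean(x)))      # a root close to 0
print(solve_speq(max(x) + 1))   # NA: no solution outside the data range
```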

Wong (1992) and Fasiolo et al. (2016) addressed this problem by proposing two alternative CGF estimators, designed in such a way that the saddlepoint equation can be solved for any $y$. The solution of Fasiolo et al. (2016), called the Extended Empirical Saddlepoint Approximation (ESA), is implemented in the esaddle R package, and here I give a few examples.

As a simple univariate example, consider using ESA to approximate a $\text{Gamma}(2, 1)$ density.

library("devtools")
install_github("mfasiolo/esaddle")
library("esaddle")

########## Simulating data
x <- rgamma(1000, 2, 1)

# Fixing tuning parameter of ESA
decay <-  0.05

# Evaluating ESA at several point
xSeq <- seq(-2, 8, length.out = 200)
tmp <- dsaddle(y = xSeq, X = x, decay = decay, log = TRUE)

# Plotting true density, ESA and normal approximation
plot(xSeq, exp(tmp$llk), type = 'l', ylab = "Density", xlab = "x")
lines(xSeq, dgamma(xSeq, 2, 1), col = 3)
lines(xSeq, dnorm(xSeq, mean(x), sd(x)), col = 2)
suppressWarnings( rug(x) )
legend("topright", c("ESA", "Truth", "Gaussian"), col = c(1, 3, 2), lty = 1)

This is the fit

Looking at the rug, it is clear that we evaluated the ESA density outside the range of the data. A more challenging example is the following warped bivariate Gaussian.

# Function that evaluates the true density
dwarp <- function(x, alpha) {
  d <- length(alpha) + 1
  lik <- dnorm(x[ , 1], log = TRUE)
  tmp <- x[ , 1]^2
  for(ii in 2:d)
    lik <- lik + dnorm(x[ , ii] - alpha[ii-1]*tmp, log = TRUE)
  lik
}

# Function that simulates from true distribution
rwarp <- function(n = 1, alpha) {
  d <- length(alpha) + 1
  z <- matrix(rnorm(n*d), n, d)
  tmp <- z[ , 1]^2
  for(ii in 2:d) z[ , ii] <- z[ , ii] + alpha[ii-1]*tmp
  z
}

set.seed(64141)
# Creating 2d grid
m <- 50
expansion <- 1
x1 <- seq(-2, 3, length=m)* expansion; 
x2 <- seq(-3, 3, length=m) * expansion
x <- expand.grid(x1, x2) 

# Evaluating true density on grid
alpha <- 1
dw <- dwarp(x, alpha = alpha)

# Simulate random variables
X <- rwarp(1000, alpha = alpha)

# Evaluating ESA density
dwa <- dsaddle(as.matrix(x), X, decay = 0.1, log = FALSE)$llk

# Plotting true density
par(mfrow = c(1, 2))
plot(X, pch=".", col=1, ylim = c(min(x2), max(x2)), xlim = c(min(x1), max(x1)),
     main = "True density", xlab = expression(X[1]), ylab = expression(X[2]))
contour(x1, x2, matrix(dw, m, m), levels = quantile(as.vector(dw), seq(0.8, 0.995, length.out = 10)), col=2, add=T)

# Plotting ESA density
plot(X, pch=".",col=2, ylim = c(min(x2), max(x2)), xlim = c(min(x1), max(x1)),
     main = "ESA density", xlab = expression(X[1]), ylab = expression(X[2]))
contour(x1, x2, matrix(dwa, m, m), levels = quantile(as.vector(dwa), seq(0.8, 0.995, length.out = 10)), col=2, add=T)

The fit is pretty good.

— Matteo Fasiolo

Thanks to Kjetil's great answer, I am trying to come up with a little example myself, which I would like to discuss because it seems to raise a relevant point:

Consider the $\chi^2(m)$ distribution. $K(t)$ and its derivatives may be found here and are reproduced in the functions in the code below.

x <- seq(0.01,20,by=.1)
m <- 5

K  <- function(t,m) -1/2*m*log(1-2*t)
K1 <- function(t,m) m/(1-2*t)
K2 <- function(t,m) 2*m/(1-2*t)^2

saddlepointapproximation <- function(x) {
  t <- .5-m/(2*x)
  exp( K(t,m)-t*x )*sqrt( 1/(2*pi*K2(t,m)) )
}
plot( x, saddlepointapproximation(x), type="l", col="salmon", lwd=2)
lines(x, dchisq(x,df=m), col="lightgreen", lwd=2)

This produces

This obviously produces an approximation that gets the qualitative features of the density right but, as confirmed in Kjetil's comment, is not a proper density, as it lies above the exact density everywhere. Rescaling the approximation as follows gives the almost negligible approximation error plotted below.

scalingconstant <- integrate(saddlepointapproximation, x[1], x[length(x)])$value

approximationerror_unscaled <- dchisq(x,df=m) - saddlepointapproximation(x)
approximationerror_scaled   <- dchisq(x,df=m) - saddlepointapproximation(x) /
                                                    scalingconstant

plot( x, approximationerror_unscaled, type="l", col="salmon", lwd=2)
lines(x, approximationerror_scaled,             col="blue",   lwd=2)

— Christoph Hanck

That is a feature: the saddlepoint approximation need not integrate to one, but it is often close. It can be rescaled by numerical integration.

— kjetil b halvorsen

It might be more revealing to plot the relative error!

— kjetil b halvorsen
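A minimal sketch of the relative error suggested in the comment above, for the same $\chi^2(5)$ example. The cgf and saddlepoint are those of the answer; the names `spa_chisq` and `rel_err` are hypothetical.

```r
# Relative error of the unscaled saddlepoint approximation to the chi-square(m) density.
m <- 5
spa_chisq <- function(x) {
  t  <- 0.5 - m / (2 * x)             # solves K'(t) = m / (1 - 2t) = x
  K  <- -0.5 * m * log(1 - 2 * t)     # cgf at the saddlepoint
  K2 <- 2 * m / (1 - 2 * t)^2         # K''(t) at the saddlepoint
  exp(K - t * x) / sqrt(2 * pi * K2)
}

x <- seq(0.01, 20, by = 0.1)
rel_err <- spa_chisq(x) / dchisq(x, df = m) - 1
print(range(rel_err))
```

The relative error is the same at every $x$ (about 3.4% for $m = 5$), which is why rescaling removes essentially all of the error in this example.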

approximationerror_unscaled/approximationerror_scaled turns out to fluctuate around 25.90798

— Christoph Hanck