## approximative Laplace

Posted in Books, R, Statistics on August 18, 2018 by xi'an

I came across this question on X validated that wondered about one of our examples in Monte Carlo Statistical Methods. We included a section on Laplace approximations in the Monte Carlo integration chapter, with some reluctance on my side as this type of integral approximation does not directly connect with Monte Carlo methods. Even less so in the case of this example, as we aimed at replacing a coverage probability for a Gamma distribution with a formal Laplace approximation. Formal because of the lack of asymptotics, apart from the length of the interval (a,b) whose probability is approximated. Hence, on top of the typos, the point of the example is not crystal clear, in that it does not show much more than that the step-function approximation to the function converges as the interval length goes to zero. For instance, using instead a flat approximation produces an almost as good approximation:

> xact(5,2,7,9)
[1] 0.1933414
> laplace(5,2,7,9)
[1] 0.1933507
> flat(5,2,7,9)
[1] 0.1953668


What may be more surprising is the resilience of the approximation as the width of the interval increases:

> xact(5,2,5,11)
[1] 0.53366
> laplace(5,2,5,11)
[1] 0.5354954
> flat(5,2,5,11)
[1] 0.5861004
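The functions themselves do not appear in the post; the outputs above can be reproduced with the following minimal sketch, assuming the signature (shape, scale, a, b) and the Gamma parameterisation [an inference from the numerical values, not the book's actual code]:

```r
# exact Gamma(shape al, scale be) probability of the interval (a,b)
xact <- function(al, be, a, b)
  pgamma(b, al, scale = be) - pgamma(a, al, scale = be)

# Laplace-type approximation: quadratic expansion of the log density
# around the midpoint of (a,b), integrated as a Gaussian over (a,b)
laplace <- function(al, be, a, b) {
  x0 <- (a + b) / 2
  h1 <- (al - 1) / x0 - 1 / be   # first derivative of the log density
  h2 <- -(al - 1) / x0^2         # second derivative (negative)
  s2 <- -1 / h2                  # variance of the matching Gaussian
  mu <- x0 + h1 * s2             # its mean, after completing the square
  dgamma(x0, al, scale = be) * exp(h1^2 * s2 / 2) * sqrt(2 * pi * s2) *
    (pnorm(b, mu, sqrt(s2)) - pnorm(a, mu, sqrt(s2)))
}

# flat approximation: midpoint density times interval length
flat <- function(al, be, a, b)
  dgamma((a + b) / 2, al, scale = be) * (b - a)

xact(5, 2, 7, 9)     # 0.1933414
laplace(5, 2, 7, 9)  # 0.1933507
flat(5, 2, 7, 9)     # 0.1953668
```

On the interval (7,9) the midpoint 8 happens to be close to the mode, so the linear term h1 nearly vanishes and the Laplace version reduces to a Gaussian probability centred at the midpoint.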


## g-and-k [or -h] distributions

Posted in Statistics on July 17, 2017 by xi'an

Dennis Prangle released last week an R package called gk and an associated arXived paper for running inference on the g-and-k and g-and-h quantile distributions. As should be clear from an earlier review of Karian's and Dudewicz's book on quantile distributions, I am not particularly fond of those distributions, whose construction seems very artificial to me, as it is mostly based on producing a closed-form quantile function. But I agree they provide a neat benchmark for ABC methods, if nothing else. However, as recently pointed out in our Wasserstein paper with Espen Bernton, Pierre Jacob and Mathieu Gerber, and explained in a post of Pierre's on Statisfaction, the pdf can easily be constructed by numerical means, hence allows for an MCMC resolution, which is also a point made by Dennis in his paper, using the closed-form derivative of the Normal form of the distribution [i.e., applied to Φ(x)], so that numerical differentiation is not necessary.
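For readers unfamiliar with the construction, the g-and-k distribution is defined through its quantile function, and the pdf can indeed be recovered by numerically inverting that function, as in this sketch [my own illustration with the standard c=0.8 parameterisation, not Dennis' package code, which moreover relies on the exact rather than numerical derivative]:

```r
# g-and-k quantile function, with z = qnorm(u) the Normal quantile
qgk <- function(u, A, B, g, k, c = 0.8) {
  z <- qnorm(u)
  A + B * (1 + c * tanh(g * z / 2)) * z * (1 + z^2)^k
}

# numerical pdf: invert the quantile function, then differentiate it,
# since X = Q(U) with U uniform implies f(x) = 1 / Q'(Q^{-1}(x))
dgk <- function(x, A, B, g, k, eps = 1e-6) {
  u <- uniroot(function(u) qgk(u, A, B, g, k) - x,
               c(1e-9, 1 - 1e-9), tol = 1e-12)$root
  dq <- (qgk(u + eps, A, B, g, k) - qgk(u - eps, A, B, g, k)) / (2 * eps)
  1 / dq
}

# sanity check: g = k = 0 reduces the distribution to N(A, B^2)
dgk(1, 0, 1, 0, 0)  # close to dnorm(1)
```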

## puzzled by harmony [not!]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, Travel on December 13, 2016 by xi'an

In answering yet another question on X validated about the numerical approximation of the marginal likelihood, I suggested using a harmonic mean estimate as a simple but worthless solution based on an MCMC posterior sample. This was on a toy example with a uniform prior on (0,π) and a “likelihood” equal to sin(θ) [really a toy problem!]. Simulating an MCMC chain by a random walk Metropolis-Hastings algorithm is straightforward, as is returning the harmonic mean of the sin(θ)’s.

f <- function(x){
  if ((0 < x) & (x < pi)){
    return(sin(x))
  } else {
    return(0)
  }
}

n <- 2000             # number of iterations
sigma <- 0.5          # random walk scale
x <- runif(1, 0, pi)  # initial value of the chain
chain <- fx <- f(x)
rands <- rnorm(n, 0, sigma)  # random walk increments
# Metropolis-Hastings algorithm
for (i in 2:n){
  can <- x + rands[i]  # candidate for the move
  fcan <- f(can)
  aprob <- fcan / fx   # acceptance probability
  if (runif(1) < aprob){
    x <- can
    fx <- fcan
  }
  chain <- c(chain, fx)
}
I <- pi * length(chain) / sum(1 / chain)  # harmonic mean approximation of the integral

However, the outcome looks remarkably stable and close to the expected value 2, despite 1/sin(θ) having an infinite integral on (0,π), which implies that the average of the 1/sin(θ)’s has an infinite variance. Hence I wonder why this specific example does not lead to an unreliable output… But re-running the chain with a smaller scale σ starts producing values of sin(θ) regularly closer to zero, which leads to an estimate of I both farther away from 2 and much more variable. No miracle, in the end!
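The impact of the scale can be checked by wrapping the above chain in a function of σ [a sketch of mine, not part of the original answer] and comparing a moderate scale with a much smaller one:

```r
# harmonic mean estimate of the integral of sin over (0, pi),
# based on a random walk Metropolis-Hastings chain with scale sigma
harmo <- function(sigma, n = 2000) {
  f <- function(x) if ((0 < x) & (x < pi)) sin(x) else 0
  x <- runif(1, 0, pi)
  chain <- fx <- f(x)
  for (i in 2:n) {
    can <- x + rnorm(1, 0, sigma)  # random walk proposal
    fcan <- f(can)
    if (runif(1) < fcan / fx) {    # Metropolis-Hastings acceptance
      x <- can
      fx <- fcan
    }
    chain <- c(chain, fx)
  }
  pi * n / sum(1 / chain)  # harmonic mean estimate
}

harmo(0.5)   # typically close to 2
harmo(0.05)  # much more variable from one run to the next
```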

## probabilistic numerics

Posted in pictures, Running, Statistics, Travel, University life on April 27, 2015 by xi'an

I attended a highly unusual workshop while in Warwick last week. Unusual for me, obviously. It was about probabilistic numerics, i.e., the use of probabilistic or stochastic arguments in the numerical resolution of (possibly) deterministic problems. The notion in this approach is fairly Bayesian in that it makes use of prior information or belief about the quantity of interest, e.g., a function, to construct a (usually Gaussian) process prior and derive both an estimator that is identical to a numerical method (e.g., Runge-Kutta or trapezoidal integration) and an uncertainty or variability around this estimator. While I did not grasp much more than the classy introductory talk by Philipp Hennig, this concept sounds fairly interesting, if only because of the Bayesian connection, and I wonder if we will soon see a probabilistic numerics section at ISBA! More seriously, placing priors on functions or functionals is a highly formal perspective (as in Bayesian non-parametrics) and it makes me wonder how much of the data (evaluation of a function at a given set of points) and how much of the prior is reflected in the output [variability]. (Obviously, one could also ask a similar question for statistical analyses!) For instance, issues of singularity arise among those stochastic process priors.
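A classical instance of the recovery of a numerical method mentioned above [my illustration, not taken from the talk]: under a Brownian motion prior on f with f(0)=0, the posterior mean of ∫₀¹f(t)dt given evaluations of f is a linear combination of those evaluations with exactly the trapezoidal weights:

```r
# evaluation points on (0,1] and Brownian motion covariance k(s,t) = min(s,t)
tt <- c(1, 2, 3) / 3
K <- outer(tt, tt, pmin)
# covariance between the integral of f and the evaluations:
# z_i = int_0^1 min(s, t_i) ds = t_i - t_i^2 / 2
z <- tt - tt^2 / 2
# posterior mean of the integral is sum(w * f(tt)), with weights
w <- solve(K, z)
w  # 1/3, 1/3, 1/6: the trapezoidal rule weights when f(0) = 0
# the posterior variance quantifies the numerical uncertainty,
# using int int min(s,t) ds dt = 1/3 as the prior variance of the integral
v <- 1/3 - sum(z * w)
```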

Another question that stemmed from this talk is whether or not more efficient numerical methods can be derived that way, in addition to recovering the most classical ones. Somehow, given the idealised nature of the prior, it feels like priors could be more easily compared or ranked than in classical statistical problems, since the aim is to figure out the value of an integral or the solution to an ODE. (Or maybe not, since again almost the same could be said about estimating a normal mean.)

## integral priors for binomial regression

Posted in pictures, R, Statistics, University life on July 2, 2013 by xi'an

Diego Salmerón and Juan Antonio Cano from Murcia, Spain (check the movie linked to the above photograph!), kindly included me in their recent integral prior paper, even though I mainly provided (constructive) criticism. The paper has just been arXived.

A few years ago (2008 to be precise), we wrote together an integral prior paper, published in TEST, where we exploited the implicit equation defining those priors (Pérez and Berger, 2002) to construct a Markov chain providing simulations from both integral priors. This time, we consider the case of a binomial regression model and the problem of variable selection. The integral equations are similarly defined and a Markov chain can again be used to simulate from the integral priors. However, the difficulty therein follows from the regression structure, which makes the selection of training datasets more elaborate and the corresponding posterior non-standard. Most fortunately, because the training dataset is of exactly the right dimension, a re-parameterisation allows for a simulation of Bernoulli probabilities, provided a Jeffreys prior is used on those. (This obviously makes the “prior” dependent on the selected training dataset, but it should not overly impact the resulting inference.)
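For the Bernoulli part, the Jeffreys prior is the Beta(1/2,1/2) distribution, so simulating the probabilities given a training sample is immediate by conjugacy [a sketch under my reading of the construction, not the authors' code]:

```r
# Jeffreys prior for a Bernoulli probability p is Beta(1/2, 1/2),
# hence the posterior after s successes out of m trials is
# Beta(s + 1/2, m - s + 1/2) by conjugacy
rjeffreys <- function(n, s, m) rbeta(n, s + 0.5, m - s + 0.5)

draws <- rjeffreys(1e5, s = 3, m = 10)
mean(draws)  # close to the posterior mean (3 + 0.5) / (10 + 1)
```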

Posted in Books, Statistics, University life on December 14, 2012 by xi'an

This week, my student Dona Skanji gave a presentation of the paper by Hastings, “Monte Carlo sampling methods using Markov chains and their applications”, which set the rules for running MCMC algorithms, much more so than the original paper by Metropolis et al., which presented an optimisation device, even though the latter clearly stated the Markovian principle of those algorithms and their use for integration. (This is definitely a classic, selected in the book Biometrika: One hundred years, by Mike Titterington and David Cox.) Here are her slides (the best Beamer slides so far!):

Given that I had already taught my lectures on Markov chains and on MCMC algorithms, the preliminary part of Dona’s talk was easier to compose, and understanding the principles of the method was certainly more straightforward than for the other papers in the series. I think she nonetheless did a rather good job in summing up the paper, running this extra simulation for the Poisson distribution (with the interesting “mistake” of including the burn-in time in the representation of the output and concluding about a poor convergence) and mentioning the Gibbs extension. I led the discussion of the seminar towards irreducibility conditions and Peskun’s ordering of Markov chains, which maybe could have been mentioned by Dona since she was aware Peskun was Hastings’ student.

## Random construction of interpolating sets

Posted in Kids, Statistics, University life on January 5, 2012 by xi'an

One of the many arXiv papers I could not discuss earlier is Huber and Schott’s “Random construction of interpolating sets for high dimensional integration”, which relates to their earlier TPA paper at the València meeting (a paper we discussed with Nicolas Chopin). TPA stands for tootsie pop algorithm. The paper is very pleasant to read, just like its predecessor. The principle behind TPA is that the number of steps in the algorithm is Poisson, with parameter connected to the unknown measure of the inner set: $N\sim\mathcal{P}(\ln[\mu(B)/\mu(B^\prime)])$

Therefore, the variance of the estimation is known as well. This is a significant property of a mathematically elegant solution. As already argued in our earlier discussion, it however seems the paper is defending an integral approximation that sounds far from realistic, in my opinion. Indeed, the TPA method requires as a fundamental item the ability to simulate from a measure μ restricted to a level set A(β), and exact simulation seems close to impossible in any realistic problem, just as in Skilling (2006)’s nested sampling. Furthermore, the comparison with nested sampling is evacuated rather summarily: that the variance of this alternative cannot be computed “prior to running the algorithm” does not mean it is larger than that of the TPA method. If the proposal is to become a realistic algorithm, some degree of comparison with existing methods should appear in the paper. (A further, if minor, comment about the introduction is that the reason for picking the ideal relative balance α=0.2031 between the embedded sets is not clear. Not that it really matters in the implementation, unless Section 5 on well-balanced sets is connected with this ideal ratio…)
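To make the Poisson property concrete, here is a toy instance of TPA between two centred disks [my toy example, not from the paper], where exact simulation is available: each step draws a uniform point in the current disk and shrinks the disk to the radius of that point, and the number of steps before reaching the inner disk of radius r₀ is Poisson with parameter ln[μ(B)/μ(B′)] = 2 ln(1/r₀):

```r
# one TPA run from the unit disk B down to the inner disk B' of radius r0;
# a uniform point in a disk of radius r has radius r * sqrt(U), U uniform
tpa <- function(r0) {
  k <- 0
  r <- 1
  repeat {
    r <- r * sqrt(runif(1))  # radius of the uniform draw = next level set
    if (r <= r0) break
    k <- k + 1
  }
  k
}

# counts are Poisson with parameter log(area ratio) = 2 * log(1 / r0),
# equal to 2 for r0 = exp(-1)
counts <- replicate(1e4, tpa(exp(-1)))
mean(counts)  # close to 2
var(counts)   # close to 2 as well, as befits a Poisson count
```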