Archive for Monte Carlo approximations

Bernoulli race particle filters

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on March 27, 2019 by xi'an

Sebastian Schmon, Arnaud Doucet and George Deligiannidis have recently arXived an AISTATS paper with the above nice title. The motivation for the extension is facing intractable particle weights for state space models, as for instance in discretised diffusions.  In most cases, actually, the weight associated with the optimal forward proposal involves an intractable integral which is the predictive of the current observed variate given the past hidden states. And in some cases, there exist unbiased and non-negative estimators of the targets,  which can thus be substituted, volens nolens,  to the original filter. As in many pseudo-marginal derivations, this new algorithm can be interpreted as targeting an augmented distribution that involves the auxiliary random variates behind the unbiased estimators of the particle weights. A worthwhile remark since it allows for the preservation of the original target as in (8) provided the auxiliary random variates are simulated from the right conditionals. (At least ideally as I have no clue when this is feasible.)

“if Bernoulli resampling is per-formed, the variance for any Monte Carlo estimate will be the same as if the true weights were known and one applies standard multinomial resampling.”

The Bernoulli race in the title stands for a version of the Bernoulli factory problem, where an intractable and bounded component of the weight can be turned into a probability, for which a Bernoulli draw is available, hence providing a Multinomial sampling with the intractable weights since replacing the exact probability with an estimate does not modify the Bernoulli distribution, amazingly so! Even with intractable normalising constants in particle filters. The practicality of the approach may however be restricted by the possibility of some intractable terms being very small and requiring many rejections for one acceptance, as the number of attempts is a compound geometric. The intractability may add to the time request the drawback of keeping this feature hidden as well. Or force some premature interruption in the settings of a parallel implementation.

an interesting identity

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on March 1, 2018 by xi'an

Another interesting X validated question, another remembrance of past discussions on that issue. Discussions that took place in the Institut d’Astrophysique de Paris, nearby this painting of Laplace, when working on our cosmostats project. Namely the potential appeal of recycling multidimensional simulations by permuting the individual components in nearly independent settings. As shown by the variance decomposition in my answer, when opposing N iid pairs (X,Y) to the N combinations of √N simulations of X and √N simulations of Y, the comparison

\text{var} \hat{\mathfrak{h}}^2_N=\text{var} (\hat{\mathfrak{h}}^1_N)+\frac{mn(n-1)}{N^2}\,\text{var}^Y\left\{ \mathbb{E}^{X}\left\{\mathfrak{h}(X,Y)\right\}\right\}

+\frac{m(m-1)n}{N^2}\,\text{var}^X\left[\mathbb{E}^Y\left\{\mathfrak{h}(X,Y)\right\}\right]

unsurprisingly gives the upper hand to the iid sequence. A sort of converse to Rao-Blackwellisation…. Unless the production of N simulations gets much more costly when compared with the N function evaluations. No wonder we never see this proposal in Monte Carlo textbooks!

a well-hidden E step

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , on February 3, 2017 by xi'an

Grand Palais from Esplanade des Invalides, Paris, Dec. 07, 2012A recent question on X validated ended up being quite interesting! The model under consideration is made of parallel Markov chains on a finite state space, all with the same Markov transition matrix, M, which turns into a hidden Markov model when the only summary available is the number of chains in a given state at a given time. When writing down the EM algorithm, the E step involves the expected number of moves from a given state to a given state at a given time. The conditional distribution of those numbers of chains is a product of multinomials across times and starting states, with no Markov structure since the number of chains starting from a given state is known at each instant. Except that those multinomials are constrained by the number of “arrivals” in each state at the next instant and that this makes the computation of the expectation intractable, as far as I can see.

A solution by Monte Carlo EM means running the moves for each instant under the above constraints, which is thus a sort of multinomial distribution with fixed margins, enjoying a closed-form expression but for the normalising constant. The direct simulation soon gets too costly as the number of states increases and I thus considered a basic Metropolis move, using one margin (row or column) or the other as proposal, with the correction taken on another margin. This is very basic but apparently enough for the purpose of the exercise. If I find time in the coming days, I will try to look at the ABC resolution of this problem, a logical move when starting from non-sufficient statistics!

importance sampling and necessary sample size

Posted in Books, Statistics with tags , , , , , on September 7, 2016 by xi'an

Daniel Sanz-Alonso arXived a note yesterday where he analyses importance sampling from the point of view of empirical distributions. With the difficulty that unnormalised importance sampling estimators are not associated with an empirical distribution since the sum of the weights is not one. For several f-divergences, he obtains upper bounds on those divergences between the empirical cdf and a uniform version, D(w,u), which translate into lower bounds on the importance sample size. I however do not see why this divergence between a weighted sampled and the uniformly weighted version is relevant for the divergence between the target and the proposal, nor how the resulting Monte Carlo estimator is impacted by this bound. A side remark [in the paper] is that those results apply to infinite variance Monte Carlo estimators, as in the recent paper of Chatterjee and Diaconis I discussed earlier, which also discussed the necessary sample size.

merging MCMC subposteriors

Posted in Books, Statistics, University life with tags , , , , , , , on June 8, 2016 by xi'an

Christopher Nemeth and Chris Sherlock arXived a paper yesterday about an approach to distributed MCMC sampling via Gaussian processes. As in several other papers commented on the ‘Og, the issue is to merge MCMC samples from sub-posteriors into a sample or any sort of approximation of the complete (product) posterior. I am quite sympathetic to the approach adopted in this paper, namely to use a log-Gaussian process representation of each sub-posterior and then to replace each sub-posterior with its log-Gaussian process posterior expectation in an MCMC or importance scheme. And to assess its variability through the posterior variance of the sum of log-Gaussian processes. As pointed out by the authors the closed form representation of the posterior mean of the log-posterior is invaluable as it allows for an HMC implementation. And importance solutions as well. The probabilistic numerics behind this perspective are also highly relevant.

A few arguable (?) points:

  1. The method often relies on importance sampling and hence on the choice of an importance function that is most likely influential but delicate to calibrate in complex settings as I presume the Gaussian estimates are not useful in this regard;
  2. Using Monte Carlo to approximate the value of the approximate density at a given parameter value (by simulating from the posterior distribution) is natural but is it that efficient?
  3. It could be that, by treating all sub-posterior samples as noisy versions of the same (true) posterior, a more accurate approximation of this posterior could be constructed;
  4. The method relies on the exponentiation of a posterior expectation or simulation. As of yesterday, I am somehow wary of log-normal expectations!
  5. If the purpose of the exercise is to approximate univariate integrals, it would seem more profitable to use the Gaussian processes at the univariate level;
  6. The way the normalising missing constants and the duplicate simulations are processed (or not) could deserve further exploration;
  7. Computing costs are in fine unclear when compared with the other methods in the toolbox.
%d bloggers like this: