Archive for unbiasedness

MCMskv #3 [town with a view]

Posted in Statistics on January 8, 2016 by xi'an

Third day at MCMskv, where I took advantage of the gap left by the elimination of the Tweedie Race [second time in a row!] to complete and submit our mixture paper. Despite the nice weather. The rest of the day was quite busy, with David Dunson giving a plenary talk on various approaches to approximate MCMC solutions, with a broad overview of the potential methods and of the need for better solutions. (On a personal basis, great line from David: “five minutes or four minutes?”. It almost beat David’s question on the previous day, about the weight of a finch, which sounded suspiciously close to the question about the air-speed velocity of an unladen swallow. I was quite surprised the speaker did not reply with the Arthurian “An African or a European finch?”) In particular, I appreciated the notion that some problems call for a reduction in the number of parameters, rather than in the number of observations. At which point I wrote down “multiscale approximations required” in my black pad, a requirement David made explicit a few minutes later. (The talk conditions were also much better than during Michael’s talk, in that the man standing between the screen and myself was David rather than the cameraman! Joke aside, it did not really prevent me from reading the slides, except for most of the jokes in small print!)

The first session of the morning involved a talk by Marc Suchard, who used continued fractions to find a closed form likelihood for the SIR epidemiology model (I love continued fractions!), and a talk by Donatello Telesca who studied non-local priors to build a regression tree. While I am somewhat skeptical about non-local testing priors, I found this approach to the construction of a tree quite interesting! In the afternoon, I obviously went to the intractable likelihood session, with talks by Chris Oates on a control variate method for doubly intractable models, Brenda Vo on mixing sequential ABC with Bayesian bootstrap, and Gael Martin on our consistency paper. I was not aware of the Bayesian bootstrap proposal and need to read through the paper, as I fail to see the appeal of the bootstrap part! I later attended a session on exact Monte Carlo methods that was pleasantly homogeneous. With talks by Paul Jenkins (Warwick) on the exact simulation of the Wright-Fisher diffusion, Anthony Lee (Warwick) on designing perfect samplers for chains with atoms, Chang-han Rhee and Sebastian Vollmer on extensions of the Glynn-Rhee debiasing technique I previously discussed on the blog. (Once again, I regretted having to make a choice between the parallel sessions!)
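As an aside for readers unfamiliar with the Glynn-Rhee debiasing trick mentioned above: it turns a sequence of converging (possibly biased) approximations into a single unbiased estimator of the limit, by randomised truncation with reweighted increments. A minimal illustration (my own toy example, not taken from any of the talks), debiasing the partial sums of ∑ 1/k! to estimate e:

```python
import math
import random

random.seed(42)

def rhee_glynn(delta, tail_prob, p=0.3):
    """One unbiased draw: randomised truncation of the series sum_k delta(k)."""
    # truncation level N ~ Geometric(p) on {0, 1, 2, ...}
    N = 0
    while random.random() > p:
        N += 1
    # each increment is reweighted by 1/P(N >= k), which restores unbiasedness:
    # E[sum_{k<=N} delta(k)/P(N>=k)] = sum_k delta(k)
    return sum(delta(k) / tail_prob(k) for k in range(N + 1))

# toy limit: e = sum_{k>=0} 1/k!, so the k-th increment is 1/k!
delta = lambda k: 1.0 / math.factorial(k)
tail = lambda k: 0.7 ** k            # P(N >= k) for Geometric(0.3)

est = sum(rhee_glynn(delta, tail) for _ in range(20000)) / 20000
```

Each draw only touches finitely many terms of the series, yet the average of the draws is an unbiased estimate of the limit e.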

The poster session (after a quick home-made pasta dish with an exceptional Valpolicella!) was almost universally great and with just the right number of posters to go around all of them in the allotted time. With in particular the Breaking News! posters of Giacomo Zanella (Warwick), Beka Steorts and Alexander Terenin. A high quality session that made me regret not touring the previous one due to my own poster presentation.

Bayes and unbiased

Posted in Books, Statistics on December 21, 2015 by xi'an

Siamak Noorbaloochi and Glen Meeden have arXived a note on some mathematical connections between Bayes estimators and unbiased estimators. To start with, I never completely understood the reasons for [desperately] seeking unbiasedness in estimators, given that most transforms of a parameter θ do not allow for unbiased estimators when based on a finite sample from a parametric family with such parameter θ (Lehmann, 1983). (And this is not even a Bayesian objection!) The current paper first seems to use unbiasedness in a generalised sense the authors introduced earlier, in 1983, a less intuitive notion since it depends on both loss and prior, and one that is not even guaranteed to exist since it involves an infimum over a non-compact space. However, for the squared error loss adopted in this paper, it seems to reduce to the standard notion.

A first mathematical result therein is that the Bayes [i.e., posterior mean] and unbiasedness [i.e., sample mean] operators are adjoint in a Hilbert sense. But this does not seem much more than a consequence of Fubini’s theorem. The authors then proceed to expose the central (decomposition) result of the paper, namely that every estimable function γ(θ) can be orthogonally decomposed into a function with an unbiased estimator plus a function whose Bayes estimator is zero, and that, conversely, every square-integrable estimator can be orthogonally decomposed into a Bayes estimator (of something) plus an unbiased estimator of zero. This is a neat duality result, whose consequences I however fail to see because the Bayes estimator is estimating something else. And, somewhere, somehow, I have some trouble with the notion of a function α whose Bayes estimator [i.e., posterior mean] is zero for all values of the sample, especially outside problems with finite observation and finite parameter spaces. For instance, if the sampling distribution belongs to an exponential family, this property means that the Laplace transform of the function α is uniformly zero, hence that the function itself is uniformly zero.
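For intuition, here is a sketch of why Fubini gives the adjunction, in my own notation (not the authors’): with prior π, sampling density f(x|θ) and marginal m(x), for square-integrable δ(x) and γ(θ),

\int \mathbb{E}_\theta[\delta(X)]\,\gamma(\theta)\,\pi(\theta)\,\text{d}\theta = \iint \delta(x)\,\gamma(\theta)\,f(x|\theta)\,\pi(\theta)\,\text{d}x\,\text{d}\theta = \int \delta(x)\,\mathbb{E}[\gamma(\theta)|x]\,m(x)\,\text{d}x

that is, ⟨Uδ, γ⟩π = ⟨δ, Bγ⟩m for the unbiasedness operator U: δ ↦ E_θ[δ(X)] from L²(m) to L²(π) and the Bayes operator B: γ ↦ E[γ(θ)|x] going the other way.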

SMC 2015

Posted in Statistics, Travel, University life on September 7, 2015 by xi'an

Nicolas Chopin ran a workshop at ENSAE on sequential Monte Carlo the past three days and it was a good opportunity to get a much needed update on the current trends in the field. Especially given that the meeting was literally downstairs from my office at CREST. And given the top range of researchers presenting their current or past work (in the very amphitheatre where I attended my first statistics lectures, a few dozen years ago!). Since unforeseen events made me miss most of the central day, I will not comment on individual talks, some of which I had already heard in the recent past, but this was a high quality workshop, topped by a superb organisation. (I started wondering why there was not a single female speaker in the program and so few female participants in the audience, then realised this is a field with a massive gender imbalance, which is difficult to explain given the different situation in Bayesian statistics and even in Bayesian computation…)  Some key topics I gathered during the talks I could attend–apologies to the other speakers for missing their talk due to those unforeseen events–are unbiasedness, which sounds central to the SMC methods [at least those presented there] as opposed to MCMC algorithms, and local features, used in different ways like hierarchical decomposition, multiscale, parallelisation, local coupling, &tc., to improve convergence and efficiency…

an extension of nested sampling

Posted in Books, Statistics, University life on December 16, 2014 by xi'an

I was reading [in the Paris métro] Hastings-Metropolis algorithm on Markov chains for small-probability estimation, arXived a few weeks ago by François Bachoc, Lionel Lenôtre, and Achref Bachouch, when I came upon their first algorithm that reminded me much of nested sampling: the following was proposed by Guyader et al. in 2011,

To approximate a tail probability P(H(X)>h),

  • start from an iid sample of size N from the reference distribution;
  • at each iteration m, select the point x with the smallest H(x)=ξ and replace it with a new point y simulated under the constraint H(y)≥ξ;
  • stop when all points in the sample are such that H(X)>h;
  • take

\left(1-\dfrac{1}{N}\right)^{m-1}

as the unbiased estimator of P(H(X)>h).

Hence, except for the stopping rule, this is the same implementation as nested sampling. Furthermore, Guyader et al. (2011) also take advantage of the nested sampling fact that, if direct simulation under the constraint H(y)≥ξ is infeasible, simulating via one single step of a Metropolis-Hastings algorithm is as valid as direct simulation. (I could not access the paper, but the reference list of Guyader et al. (2011) includes both original papers by John Skilling, so the connection must be made in the paper.) What I find most interesting in this algorithm is that it achieves unbiasedness (even in the MCMC case!).
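A quick sketch of the above scheme in Python, for the toy case H(x)=x with a standard normal reference, so that the target is the tail probability P(X>h). The constrained simulation is handled, as suggested above, by cloning a surviving particle and applying a few Metropolis-Hastings steps; the parameter values are mine, not Guyader et al.’s:

```python
import numpy as np

rng = np.random.default_rng(0)

def tail_prob(N=100, h=1.5, n_mh=20, step=1.0):
    """Estimate P(X > h), X ~ N(0,1), via the Guyader et al. (2011) scheme."""
    x = rng.standard_normal(N)          # iid sample from the reference
    m = 0
    while not np.all(x > h):            # stop when every point exceeds h
        m += 1
        i = np.argmin(x)                # point with the smallest H(x) = x
        xi = x[i]
        # simulate y under the constraint H(y) >= xi: clone a surviving
        # particle (which already satisfies the constraint) and move it by
        # Metropolis-Hastings steps targeting N(0,1) restricted to [xi, inf)
        j = rng.integers(N)
        while j == i:
            j = rng.integers(N)
        y = x[j]
        for _ in range(n_mh):
            z = y + step * rng.standard_normal()
            if z >= xi and np.log(rng.random()) < 0.5 * (y**2 - z**2):
                y = z
        x[i] = y
    return (1 - 1 / N) ** (m - 1)
```

Averaging `tail_prob()` over replications recovers 1−Φ(1.5) ≈ 0.0668, illustrating the unbiasedness claim even with the MCMC move.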

PMC for combinatoric spaces

Posted in Statistics, University life on July 28, 2014 by xi'an

I received this interesting [edited] email from Xiannian Fan at CUNY:

I am trying to use PMC to solve a Bayesian network structure learning problem (which lives in a combinatorial space, not a continuous one).

In PMC, the proposal distributions qi,t can be very flexible, even specific to each iteration and each instance. My problem occurs due to the combinatorial space.

For importance sampling, the requirement for proposal distribution, q, is:

support(p) ⊂ support(q)    (*)

For PMC, what is the support of the proposal distribution at iteration t? Is it

support(p) ⊂ ∪i support(qi,t)    (**)

or does (*) apply to every qi,t?

For continuous problems, this is not a big issue: we can use a Normal random walk to make local moves satisfying (*). But for combinatorial search, a local move only reaches a finite set of neighbouring states, which fails to satisfy (*). For example, for a permutation (1,3,2,4), a random swap only has choose(4,2)=6 neighbouring states.

Fairly interesting question about population Monte Carlo (PMC), a sequential version of importance sampling we worked on with French colleagues in the early 2000s. (The name population Monte Carlo comes from Iba, 2000.) While MCMC samplers do not have to cover the whole support of p at each iteration, it is much harder for importance samplers, as their core justification is to provide an unbiased estimator for all integrals of interest. Thus, when using the PMC estimate,

1/n ∑i,t {p(xi,t)/qi,t(xi,t)} h(xi,t),   xi,t ~ qi,t(x)

this estimator is only unbiased when the supports of the qi,t's all contain the support of p. The only other cases I can think of are

  1. associating the qi,t's with a partition Si,t of the support of p and using instead

    ∑i,t {p(xi,t)/qi,t(xi,t)} h(xi,t), xi,t ~ qi,t(x)

  2. resorting to AMIS under the assumption (**) and using instead

    ∑i,t {p(xi,t)/∑j,t qj,t(xi,t)} h(xi,t), xi,t ~ qi,t(x)

but I am open to further suggestions!
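To make option 1 concrete, here is a toy numerical check (my own construction, not part of the original exchange): two proposals whose supports partition a six-point target, so that neither satisfies (*) on its own, yet summing the per-piece importance sampling estimates remains unbiased:

```python
import numpy as np

rng = np.random.default_rng(1)

# discrete target p on {0,...,5} and test function h(x) = x, so E_p[h] = 2.6
p = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1])
h = np.arange(6.0)

# two proposals whose supports S_1 = {0,1,2} and S_2 = {3,4,5} partition
# the support of p; neither covers support(p) on its own
S = [np.array([0, 1, 2]), np.array([3, 4, 5])]
q = [np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])]

def partition_estimate(n):
    # each piece estimates the partial sum over S_i by plain importance
    # sampling; the pieces are then summed (not averaged) over the partition
    total = 0.0
    for Si, qi in zip(S, q):
        x = rng.choice(Si, size=n, p=qi)
        w = p[x] / qi[x - Si[0]]        # contiguous support: shift to index qi
        total += np.mean(w * h[x])
    return total
```

Averaging `partition_estimate(500)` over replications recovers E_p[h]=2.6, even though each proposal misses half the support of p.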

computational methods for statistical mechanics [day #2]

Posted in Mountains, pictures, Running, Statistics, Travel, University life on June 5, 2014 by xi'an

Arthur's Seat, Edinburgh, Sep. 7, 2011

The last “tutorial” talk at this ICMS workshop [“at the interface between mathematical statistics and molecular simulation”] was given by Tony Lelièvre, on adaptive bias schemes in Langevin algorithms and on the parallel replica algorithm. This was both very interesting, because of the potential for connections with my “brand” of MCMC techniques, and rather frustrating, as I felt the intuition behind the physical concepts like free energy and metastability was almost within my reach! The most accessible part of Tony’s talk was the illustration of the concepts through a mixture posterior example. An example that I need to (re)read further to grasp the general idea. (And maybe the book on Free Energy Computations Tony wrote with Mathias Rousset and Gabriel Stoltz.) A definitely worthwhile talk that I hope will get posted online by ICMS. The other talks of the day were mostly of a free energy nature, some using optimised bias in the Langevin diffusion (except for Pierre Jacob, who presented his non-negative unbiased estimation impossibility result).

Luke and Pierre at big’MC

Posted in Linux, pictures, Statistics, Travel, University life on May 19, 2014 by xi'an

crossing Rue Soufflot on my way to IHP from Vieux Campeur, March 28, 2013

Yesterday, Luke Bornn and Pierre Jacob gave a talk at our big’MC ‘minar. While I had seen most of the slides earlier, either at MCMski IV, Banff, Leuven or yet again in Oxford, I really enjoyed those talks as they provided further intuition about the techniques of Wang-Landau and non-negative unbiased estimators, leading to a few seeds of potential ideas for even more potential research. For instance, I understood much better the option to calibrate the Wang-Landau algorithm on levels of the target density rather than in the original space. Which means (a) a one-dimensional partition target (just as in nested sampling); (b) taking advantage of the existing computations of the likelihood function; and (c) a somewhat automatic implementation of the Wang-Landau algorithm. I do wonder why this technique is not more popular as a default option. (Like, would it be compatible with Stan?) The impossibility theorem of Pierre about the existence of non-negative unbiased estimators never ceases to amaze me. I started wondering during the seminar whether a positive (!) version of the result could be found. Namely, whether perturbations of the exact (unbiased) Metropolis-Hastings acceptance ratio could be substituted in order to guarantee positivity. Possibly creating drifted versions of the target…
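For what it is worth, here is a minimal sketch (mine, not Luke’s implementation) of a Wang-Landau run calibrated on level sets of the target density, on a twenty-state toy problem; the adaptive bias weights θ end up estimating the target mass of each level set:

```python
import numpy as np

rng = np.random.default_rng(7)

K = 20
logp = -np.abs(np.arange(K) - 10.0) / 2.0        # unnormalised log target
# partition the 20 states into 4 level sets of the target density
bins = np.minimum(np.abs(np.arange(K) - 10) // 3, 3)
log_theta = np.zeros(4)                          # log bias, one weight per level
x, log_f = 10, 1.0

for sweep in range(60):
    for _ in range(4000):
        y = int((x + rng.integers(-1, 2)) % K)   # symmetric walk on a ring
        # Metropolis step for the biased target p(x) / theta(bin(x))
        if np.log(rng.random()) < (logp[y] - log_theta[bins[y]]) \
                                - (logp[x] - log_theta[bins[x]]):
            x = y
        log_theta[bins[x]] += log_f              # penalise the current level
    log_f /= 2.0                                 # anneal the adaptation rate

est = np.exp(log_theta - log_theta.max())
est /= est.sum()                                 # estimated mass of each level set
```

At stationarity the biased chain visits the level sets roughly uniformly, which forces θ(b) to match the target probability of level set b, and this is the one-dimensional partition mentioned in (a).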

One request in connection with this post: please connect the Institut Henri Poincaré to the eduroam wireless network! The place is dedicated to visiting mathematicians and theoretical physicists, it should have been the first one [in Paris] to get connected to eduroam. The cost cannot be that horrendous so I wonder what the reason is. Preventing guests from connecting to the Internet towards better concentration? avoiding “parasites” taking advantage of the network? ensuring seminar attendees are following the talks? (The irony is that Institut Henri Poincaré has a local wireless available for free, except that it most often does not work with my current machine. And hence wastes much more of my time as I attempt to connect over and over again while there.) Just in connection with IHP, a video of Persi giving a talk there about Poincaré, two years ago:
