## postdoc position still open

Posted in pictures, Statistics, University life with tags , , , , , , , , , , , , , , on May 30, 2019 by xi'an

The post-doctoral position supported by the ANR funding of our Paris-Saclay-Montpellier research conglomerate on approximate Bayesian inference and computation remains open for the time being. We are more particularly looking for candidates with a strong background in mathematical statistics, esp. Bayesian non-parametrics, towards the analysis of the limiting behaviour of approximate Bayesian inference. Candidates should email me (gmail address: bayesianstatistics) with a detailed vita (CV) and a motivation letter including a research plan. Letters of recommendation may also be emailed to the same address.

## selecting summary statistics [a tale of two distances]

Posted in Books, Statistics with tags , , , , , , , , , , , , , , on May 23, 2019 by xi'an

As Jonathan Harrison came to give a seminar in Warwick [which I could not attend], it made me aware of his paper with Ruth Baker on the selection of summaries in ABC. The setting is an ABC-SMC algorithm and it relates with Fearnhead and Prangle (2012), Barnes et al. (2012), our own random forest approach, the neural network version of Papamakarios and Murray (2016), and others. The notion here is to seek the optimal weights of different summary statistics in the tolerance distance, towards a maximization of a distance (Hellinger) between prior and ABC posterior (Wasserstein also comes to mind!). A sort of dual of the least informative prior. Estimated by a k-nearest neighbour version [based on samples from the prior and from the ABC posterior] I had never seen before. I first did not get how this k-nearest neighbour distance could be optimised in the weights since the posterior sample was already generated and (SMC) weighted, but the ABC sample can be modified by changing the [tolerance] distance weights and the resulting Hellinger distance optimised this way. (There are two distances involved, in case the above description is too murky!)

“We successfully obtain an informative unbiased posterior.”

The paper spends a significant while in demonstrating that the k-nearest neighbour estimator converges and much less on the optimisation procedure itself, which seems like a real challenge to me when facing a large number of particles and a high enough dimension (in the number of statistics). (In the examples, the size of the summary is 1 (where does the weight matter?), 32, 96, 64, with 5 10⁴, 5 10⁴, 5 10³ and…10 particles, respectively.) The authors address the issue, though, albeit briefly, by mentioning that, for the same overall computation time, the adaptive weight ABC is indeed further from the prior than a regular ABC with uniform weights [rather than weighted by the precisions]. They also argue that down-weighting some components is akin to selecting a subset of summaries, but I beg to disagree with this statement as the weights are never exactly zero, as far as I can see, hence failing to fight the curse of dimensionality. Some LASSO version could implement this feature.

## congrats, Prof Rousseau!

Posted in Statistics with tags , , , , , , , , on April 4, 2019 by xi'an

## Bernoulli race particle filters

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on March 27, 2019 by xi'an

Sebastian Schmon, Arnaud Doucet and George Deligiannidis have recently arXived an AISTATS paper with the above nice title. The motivation for the extension is facing intractable particle weights for state space models, as for instance in discretised diffusions.  In most cases, actually, the weight associated with the optimal forward proposal involves an intractable integral which is the predictive of the current observed variate given the past hidden states. And in some cases, there exist unbiased and non-negative estimators of the targets,  which can thus be substituted, volens nolens,  to the original filter. As in many pseudo-marginal derivations, this new algorithm can be interpreted as targeting an augmented distribution that involves the auxiliary random variates behind the unbiased estimators of the particle weights. A worthwhile remark since it allows for the preservation of the original target as in (8) provided the auxiliary random variates are simulated from the right conditionals. (At least ideally as I have no clue when this is feasible.)

“if Bernoulli resampling is per-formed, the variance for any Monte Carlo estimate will be the same as if the true weights were known and one applies standard multinomial resampling.”

The Bernoulli race in the title stands for a version of the Bernoulli factory problem, where an intractable and bounded component of the weight can be turned into a probability, for which a Bernoulli draw is available, hence providing a Multinomial sampling with the intractable weights since replacing the exact probability with an estimate does not modify the Bernoulli distribution, amazingly so! Even with intractable normalising constants in particle filters. The practicality of the approach may however be restricted by the possibility of some intractable terms being very small and requiring many rejections for one acceptance, as the number of attempts is a compound geometric. The intractability may add to the time request the drawback of keeping this feature hidden as well. Or force some premature interruption in the settings of a parallel implementation.

## absint[he] post-doc on approximate Bayesian inference in Paris, Montpellier and Oxford

Posted in Statistics with tags , , , , , , , , , , , , , on March 18, 2019 by xi'an

As a consequence of its funding by the Agence Nationale de la Recherche (ANR) in 2018, the ABSint research conglomerate is now actively recruiting a post-doctoral collaborator for up to 24 months. The accronym ABSint stands for Approximate Bayesian solutions for inference on large datasets and complex models. The ABSint conglomerate involves researchers located in Paris, Saclay, Montpelliers, as well as Lyon, Marseille, Nice. This call seeks candidates with an excellent research record and who are interested to collaborate with local researchers on approximate Bayesian techniques like ABC, variational Bayes, PAC-Bayes, Bayesian non-parametrics, scalable MCMC, and related topics. A potential direction of research would be the derivation of new Bayesian tools for model checking in such complex environments. The post-doctoral collaborator will be primarily located in Université Paris-Dauphine, with supported periods in Oxford and visits to Montpellier. No teaching duty is attached to this research position.

Applications can be submitted in either English or French. Sufficient working fluency in English is required. While mastering some French does help with daily life in France (!), it is not a prerequisite. The candidate must hold a PhD degree by the date of application (not the date of employment). Position opens on July 01, with possible accommodation for a later start in September or October.

Deadline for application is April 30 or until position filled. Estimated gross salary is around 2500 EUR, depending on experience (years) since PhD. Candidates should contact Christian Robert (gmail address: bayesianstatistics) with a detailed vita (CV) and a motivation letter including a research plan. Letters of recommendation may also be emailed to the same address.

## scalable Metropolis-Hastings

Posted in Books, Statistics, Travel with tags , , , , , , , , , on February 12, 2019 by xi'an

Among the flury of arXived papers of last week (414!), including a fair chunk of papers submitted to ICML 2019, I spotted one entry by Cornish et al. on scalable Metropolis-Hastings, which Arnaud Doucet had mentioned to me yesterday when in Oxford. The paper builds on the delayed acceptance paper we wrote with Marco Banterlé, Clara Grazian and Anthony Lee, itself relying on a factorisation decomposition of the likelihood, combined with control variate accelerating techniques. The factorisation of both the target and the proposal allows for a (less efficient) Metropolis-Hastings acceptance ratio that is the product

$\prod_{i=1}^m \alpha_i(\theta,\theta')$

of individual Metropolis-Hastings acceptance ratios, but which allows for quicker rejection if one of the probabilities in the product is small, because the corresponding Bernoulli draw is zero with high probability. One advance made in Michel et al. (2017) [which I doubly missed] is that subsampling is achievable by thinning (as in PDMPs, where these authors have been quite active) through an algorithm of Shantikumar (1985) [described in Devroye’s bible]. Provided each Metropolis-Hastings probability can be lower bounded:

$\alpha_i(\theta,\theta') \ge \exp\{-\psi_i \phi(\theta,\theta')\}$

by a term where the transition φ does not depend on the index i in the product. The computing cost of the thinning process thus depends on the efficiency of the subsampling, namely whether or not the (Poisson) number of terms is much smaller than m, number of terms in the product. A neat trick in the current paper that extends the the Fukui-Todo procedure is to switch to the original Metropolis-Hastings when the overall lower bound is too small, recovering the geometric ergodicity of this original if it holds (Theorem 2.1). Another neat remark is that when using the naïve factorisation as the product of the n individual likelihoods, the resulting algorithm is sort of doomed as n grows, even with an optimal scaling of the proposals. To achieve scalability, the authors introduce a Taylor (i.e., Gaussian) approximation to each local target in the product and start the acceptance decomposition by using the resulting overall Gaussian approximation. Meaning that the remaining product is now made of ratios of targets over their local Taylor approximations, hence most likely close to one. And potentially lower-bounded by the remainder term in the Taylor expansion. Leading to the conclusion that, when everything goes well, meaning that the Taylor expansions can be conducted and the bounds derived for the appropriate expansion, the order of the Poisson scale is O(1/√n)..! The proposal for the Metropolis-Hastings move is actually tuned to the Gaussian approximation, appearing as a variant of the Langevin move or more exactly a discretization of an Hamiltonian move. Obviously, I cannot judge of the complexity in implementing this new scheme from just reading the paper, but this development on the split target is definitely an exciting prospect for handling huge datasets and their friends!

## Jeffreys priors for hypothesis testing [Bayesian reads #2]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , on February 9, 2019 by xi'an

A second (re)visit to a reference paper I gave to my OxWaSP students for the last round of this CDT joint program. Indeed, this may be my first complete read of Susie Bayarri and Gonzalo Garcia-Donato 2008 Series B paper, inspired by Jeffreys’, Zellner’s and Siow’s proposals in the Normal case. (Disclaimer: I was not the JRSS B editor for this paper.) Which I saw as a talk at the O’Bayes 2009 meeting in Phillie.

The paper aims at constructing formal rules for objective proper priors in testing embedded hypotheses, in the spirit of Jeffreys’ Theory of Probability “hidden gem” (Chapter 3). The proposal is based on symmetrised versions of the Kullback-Leibler divergence κ between null and alternative used in a transform like an inverse power of 1+κ. With a power large enough to make the prior proper. Eventually multiplied by a reference measure (i.e., the arbitrary choice of a dominating measure.) Can be generalised to any intrinsic loss (not to be confused with an intrinsic prior à la Berger and Pericchi!). Approximately Cauchy or Student’s t by a Taylor expansion. To be compared with Jeffreys’ original prior equal to the derivative of the atan transform of the root divergence (!). A delicate calibration by an effective sample size, lacking a general definition.

At the start the authors rightly insist on having the nuisance parameter v to differ for each model but… as we all often do they relapse back to having the “same ν” in both models for integrability reasons. Nuisance parameters make the definition of the divergence prior somewhat harder. Or somewhat arbitrary. Indeed, as in reference prior settings, the authors work first conditional on the nuisance then use a prior on ν that may be improper by the “same” argument. (Although conditioning is not the proper term if the marginal prior on ν is improper.)

The paper also contains an interesting case of the translated Exponential, where the prior is L¹ Student’s t with 2 degrees of freedom. And another one of mixture models albeit in the simple case of a location parameter on one component only.