## ABC@NIPS: call for papers

Posted in Statistics, Travel, University life on September 9, 2014 by xi'an

In connection with the previous announcement of ABC in Montréal, a call for papers that came out today:

NIPS 2014 Workshop: ABC in Montreal

December 12, 2014

Approximate Bayesian computation (ABC) or likelihood-free (LF) methods have developed mostly beyond the radar of the machine learning community, but are important tools for a large segment of the scientific community. This is particularly true for systems and population biology, computational psychology, computational chemistry, etc. Recent work has both applied machine learning models and algorithms to general ABC inference (NN, forests, GPs) and ABC inference to machine learning (e.g. using computer graphics to solve computer vision using ABC). In general, however, there is significant room for collaboration between the two communities.

The workshop will consist of invited and contributed talks, poster spotlights, and a poster session. Rather than a panel discussion we will encourage open discussion between the speakers and the audience!

Examples of topics of interest in the workshop include (but are not limited to):

* Applications of ABC to machine learning, e.g., computer vision, inverse problems
* ABC in Systems Biology, Computational Science, etc.
* ABC Reinforcement Learning
* Machine learning simulator models, e.g., NN models of simulation responses, GPs etc.
* Selection of sufficient statistics
* Online and post-hoc error
* ABC with very expensive simulations and acceleration methods (surrogate modeling, choice of design/simulation points)
* ABC with probabilistic programming
* Posterior evaluation of scientific problems/interaction with scientists
* Post-computational error assessment
* Impact on resulting ABC inference
* ABC for model selection

Submission:

## this issue of Series B

Posted in Books, Statistics, Travel, University life on September 5, 2014 by xi'an

The September issue of [JRSS] Series B I received a few days ago is of particular interest to me. (And not as an ex-co-editor, since I was never involved in any of those papers!) To wit: a paper by Hani Doss and Aixin Tan on evaluating normalising constants based on MCMC output, a preliminary version of which I had seen at a previous JSM meeting; a paper by Nick Polson, James Scott and Jesse Windle on the Bayesian bridge, connected with Nick’s talk in Boston earlier this month; and yet another paper, by Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar and Michael Jordan, on the bag of little bootstraps, a presentation of which I heard Michael deliver a few times when he was in Paris. (Obviously, this does not imply any negative judgement on the other papers of this issue!)

For instance, Doss and Tan consider the multiple mixture estimator [my wording, the authors do not give the method a name, referring to Vardi (1985) but missing the connection with Owen and Zhou (2000)] of k ratios of normalising constants, namely

$\sum_{l=1}^k \frac{1}{n_l} \sum_{t=1}^{n_l} \dfrac{n_l g_j(x_t^l)}{\sum_{s=1}^k n_s g_s(x_t^l) z_1/z_s } \longrightarrow \dfrac{z_j}{z_1}$

where the $g_j$'s are the unnormalised densities, the z’s their normalising constants, and $x_t^l$ the t-th draw of the l-th Markov chain, with possibly different numbers of iterations $n_l$ for each chain. An interesting starting point (that Hans Künsch had mentioned to me a while ago but that I had since forgotten) is that the problem was reformulated by Charlie Geyer (1994) as a quasi-likelihood estimation problem where the ratios of all z’s relative to one reference density are the unknowns. This is doubly interesting, actually, because it casts the constant estimation problem in a statistical light and thus somewhat relates to the infamous “paradox” raised by Larry Wasserman a while ago. The novelty in the paper is (a) to derive an optimal estimator of the ratios of normalising constants in the Markov case, essentially accounting for possibly different lengths of the Markov chains, and (b) to estimate the variance matrix of the ratio estimate by regeneration arguments. Regeneration is a favourite tool of mine, at least theoretically, as practically useful minorising conditions are hard to come by, if available at all.
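Since the unknown ratios $z_s/z_1$ appear on both sides of the displayed estimator, one simple way to compute it is to iterate the equation until it reaches a fixed point. The sketch below does this under assumptions of my own: independent draws instead of Markov chains, three centred Gaussian targets with known scales (so that the true ratios $\sigma_j/\sigma_1$ are available for checking), and the weights $n_l$ of the displayed formula. It reproduces neither Doss and Tan's optimal Markov-chain weights nor their regeneration-based variance estimate, and the function name `mixture_ratio_estimates` is hypothetical.

```python
import math
import random


def mixture_ratio_estimates(samples, g, n_iter=200, tol=1e-8):
    """Fixed-point sketch of the pooled (multiple mixture) estimator.

    samples[l] holds draws from the density g[l](x) / z_l.
    Returns estimates of z_j / z_1 (index 0 plays the role of the reference 1).
    """
    k = len(samples)
    n = [len(chain) for chain in samples]
    # Evaluate every unnormalised density once at every pooled draw.
    G = [[g_s(x) for g_s in g] for chain in samples for x in chain]
    r = [1.0] * k  # current guesses of z_s / z_1
    for _ in range(n_iter):
        new = [0.0] * k
        for gx in G:
            # sum_s n_s g_s(x) z_1 / z_s, using z_1 / z_s = 1 / r[s]
            denom = sum(n[s] * gx[s] / r[s] for s in range(k))
            for j in range(k):
                new[j] += gx[j] / denom
        new = [v / new[0] for v in new]  # pin the reference ratio to exactly 1
        converged = max(abs(a - b) for a, b in zip(new, r)) < tol
        r = new
        if converged:
            break
    return r


rng = random.Random(1)
sigmas = [1.0, 2.0, 0.5]
samples = [[rng.gauss(0.0, s) for _ in range(1500)] for s in sigmas]
# Unnormalised N(0, s^2) densities: z_s = sqrt(2 pi) s, hence z_j / z_1 = sigma_j / sigma_1.
g = [lambda x, s=s: math.exp(-x * x / (2.0 * s * s)) for s in sigmas]
ratios = mixture_ratio_estimates(samples, g)
```

With these settings `ratios` should come out close to `[1, 2, 0.5]`, up to Monte Carlo error.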

## PMC for combinatoric spaces

Posted in Statistics, University life on July 28, 2014 by xi'an

I received this interesting [edited] email from Xiannian Fan at CUNY:

I am trying to use PMC to solve the Bayesian network structure learning problem (which lives in a combinatorial space, not a continuous space).

In PMC, the proposal distributions $q_{i,t}$ can be very flexible, even specific to each iteration and each instance. My problem arises from the combinatorial space.

For importance sampling, the requirement on the proposal distribution q is:

$\text{supp}(p) \subset \text{supp}(q) \qquad (*)$

For PMC, what is the support requirement on the proposal distributions at iteration t? Is it

$\text{supp}(p) \subset \bigcup_i \text{supp}(q_{i,t}) \qquad (**)$

or does (*) apply to every $q_{i,t}$?

For continuous problems, this is not a big issue: we can use a Normal random walk to make local moves satisfying (*). But for combinatorial search, local moves only reach a finite set of states, which does not satisfy (*). For example, for the permutation (1,3,2,4), a random swap only reaches choose(4,2)=6 neighbouring states.

Fairly interesting question about population Monte Carlo (PMC), a sequential version of importance sampling we worked on with French colleagues in the early 2000s. (The name population Monte Carlo comes from Iba, 2000.) While MCMC samplers do not have to cover the whole support of p at each iteration, it is much harder for importance samplers, as their core justification is to provide an unbiased estimator for all integrals of interest. Thus, when using the PMC estimate,

$\frac{1}{n}\sum_{i,t} \frac{p(x_{i,t})}{q_{i,t}(x_{i,t})}\,h(x_{i,t}),\quad x_{i,t}\sim q_{i,t}(x)$

this estimator is only unbiased when the supports of the $q_{i,t}$'s all contain the support of p. The only other cases I can think of are

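1. associating the $q_{i,t}$'s with a partition $(S_{i,t})$ of the support of p, each $q_{i,t}$ being supported on $S_{i,t}$, and using instead

$\sum_{i,t} \frac{p(x_{i,t})}{q_{i,t}(x_{i,t})}\,h(x_{i,t}),\quad x_{i,t}\sim q_{i,t}(x)$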

2. resorting to AMIS under the assumption (**) and using instead

$\frac{1}{n}\sum_{i,t} \frac{p(x_{i,t})}{\frac{1}{n}\sum_{j,t'} q_{j,t'}(x_{i,t})}\,h(x_{i,t}),\quad x_{i,t}\sim q_{i,t}(x)$

but I am open to further suggestions!
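To illustrate the support issue on the permutation example from the email, here is a minimal sketch under assumptions that are mine, not the correspondent's: the target p on the 24 permutations of (1,2,3,4) is taken proportional to exp(−number of inversions), the integrand h returns the first entry, each proposal is uniform over the six swap neighbours of a hypothetical "centre" permutation, and one draw is made per proposal. A single swap proposal misses most of the support, while the deterministic-mixture weights $p/\sum_j q_j$ of the second option stay unbiased once the union of supports covers all 24 states. The adaptive part of AMIS (proposals built from past draws) is not modelled, and all function names are illustrative.

```python
import itertools
import math
import random

# Illustrative assumptions: target p on the 24 permutations of (1, 2, 3, 4),
# proportional to exp(-#inversions); integrand h(x) = first entry of x.
STATES = list(itertools.permutations(range(1, 5)))


def inversions(x):
    return sum(1 for a, b in itertools.combinations(range(len(x)), 2) if x[a] > x[b])


_WEIGHTS = {x: math.exp(-inversions(x)) for x in STATES}
_Z = sum(_WEIGHTS.values())


def p(x):
    return _WEIGHTS[x] / _Z


def h(x):
    return x[0]


def swap_neighbours(x):
    """The choose(len(x), 2) permutations reachable from x by one swap."""
    out = []
    for a, b in itertools.combinations(range(len(x)), 2):
        y = list(x)
        y[a], y[b] = y[b], y[a]
        out.append(tuple(y))
    return out


def make_proposal(centre):
    """Hypothetical local proposal: uniform over the swap neighbours of centre."""
    support = swap_neighbours(centre)
    return (lambda x: 1.0 / len(support) if x in support else 0.0), support


def exact():
    return sum(p(x) * h(x) for x in STATES)


def single_is_expectation(centre):
    """Exact mean of p(X)h(X)/q(X), X ~ q: only sums over supp(q), hence biased."""
    q, support = make_proposal(centre)
    return sum(q(x) * p(x) * h(x) / q(x) for x in support)


def dm_expectation(centres):
    """Exact mean of sum_i p(x_i)h(x_i) / sum_j q_j(x_i), with x_i ~ q_i."""
    qs = [make_proposal(c)[0] for c in centres]
    total = 0.0
    for x in STATES:
        mix = sum(q(x) for q in qs)
        for q in qs:
            if q(x) > 0:
                total += q(x) * p(x) * h(x) / mix
    return total


def dm_estimate(centres, rng):
    """One realisation of the deterministic-mixture estimate (one draw per q_i)."""
    proposals = [make_proposal(c) for c in centres]
    qs = [q for q, _ in proposals]
    total = 0.0
    for _, support in proposals:
        x = rng.choice(support)
        total += p(x) * h(x) / sum(q(x) for q in qs)
    return total
```

Taking every permutation as a centre makes the union of supports cover all 24 states, so `dm_expectation(STATES)` matches `exact()`, whereas `single_is_expectation((1, 2, 3, 4))` only sums p·h over six states. With one draw per proposal, the plain sum in `dm_estimate` equals the $\frac{1}{n}\sum \cdots / \frac{1}{n}\sum q$ form above.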

## vector quantile regression

Posted in pictures, Statistics, University life on July 4, 2014 by xi'an

My Paris-Dauphine colleague Guillaume Carlier recently arXived a statistics paper entitled Vector quantile regression, co-written with Chernozhukov and Galichon. I was most curious to read the paper, as Guillaume is primarily a mathematical analyst working on optimisation problems like optimal transport, and also because I find quantile regression difficult to fathom as a statistical problem. (As it happens, both his co-authors are from econometrics.) The results in the paper are (i) to show that a d-dimensional (Lebesgue) absolutely continuous random variable Y can always be represented as the deterministic transform Y=Q(U), where U is uniform on $[0,1]^d$ (the paper expresses this transform as conditional on a set of regressors Z, but those essentially play no role) and Q is monotone in the sense of being the gradient of a convex function,

$Q(u) = \nabla q(u)$ and $\{Q(u)-Q(v)\}^\text{T}(u-v)\ge 0;$

(ii) to deduce from this representation a unique notion of multivariate quantile function; and (iii) to consider the special case when the quantile function Q can be written as the linear form

$\beta(U)^\text{T}Z$

where β(U) is a matrix, hence leading to an estimation problem.

While unsurprising from a measure-theoretic viewpoint, the representation theorem (i) is most interesting for both statistical and simulation reasons, provided the function Q can be easily estimated and derived, respectively. The paper however does not provide a constructive tool for this derivation, besides indicating several characterisations as solutions of optimisation problems. From a statistical perspective, a non-parametric estimation of β(·) would have useful implications in multivariate regression, although the paper only considers the specific linear case above, whose solution is obtained by discretising all variables and resorting to linear programming.
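As a toy illustration of a discretised version of Y=Q(U), here is a sketch in d=2 without regressors Z, built on assumptions of my own. Take n uniform points $u_i$ and n observations $y_i$ with equal weights. The discrete problem then reduces to the assignment that maximises $\sum_i u_i^\text{T} y_{\sigma(i)}$, a special linear programme. For a handful of points it can be solved by brute-force enumeration; an LP or assignment solver would be used in practice, and this substitution is mine, not the paper's procedure. By optimality, no swap of two matches can increase the objective, which is exactly the pairwise monotonicity condition $\{Q(u)-Q(v)\}^\text{T}(u-v)\ge 0$. In d=1 the procedure reduces to matching sorted u's with sorted y's. Names and data are hypothetical.

```python
import itertools
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def discrete_vector_quantile(us, ys):
    """Brute-force assignment maximising sum_i <u_i, y_sigma(i)>; returns dict u -> Q(u)."""
    best_perm, best_val = None, float("-inf")
    for perm in itertools.permutations(range(len(ys))):
        val = sum(dot(u, ys[j]) for u, j in zip(us, perm))
        if val > best_val:
            best_perm, best_val = perm, val
    return {u: ys[j] for u, j in zip(us, best_perm)}


def is_monotone(Q):
    """Check {Q(u) - Q(v)}^T (u - v) >= 0 for every pair of matched points."""
    for u, v in itertools.combinations(list(Q), 2):
        diff_q = [a - b for a, b in zip(Q[u], Q[v])]
        diff_u = [a - b for a, b in zip(u, v)]
        if dot(diff_q, diff_u) < -1e-12:
            return False
    return True


rng = random.Random(3)
us = [(rng.random(), rng.random()) for _ in range(6)]                # points in [0,1]^2
ys = [(rng.gauss(0, 1), rng.gauss(0, 1) + 0.5) for _ in range(6)]    # hypothetical data
Q = discrete_vector_quantile(us, ys)
```

Enumeration costs n! evaluations, which is why only six points are used here.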

## Pre-processing for approximate Bayesian computation in image analysis

Posted in R, Statistics, University life on March 21, 2014 by xi'an

With Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of Statistics & Computing, and arXived it as well. The global idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts models, and replacing it with an approximate simulation of the 1-d sufficient (summary) statistic. To this end, we divided the 1-d parameter interval into a grid, simulated the distribution of the sufficient statistic at each of those parameter values, and computed the expectation and variance of the sufficient statistic. The conditional distribution of the sufficient statistic is then approximated by a Gaussian with these two parameters, and those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).
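A minimal sketch of the pre-computation idea, with several assumptions of my own. The simulator is a cheap hypothetical stand-in: it counts equal neighbouring pairs in a one-dimensional two-label Potts chain with free boundary, where each bond is "equal" independently with probability $1/(1+e^{-\beta})$, rather than working on a hidden Potts image. The parameter interval [0, 2] is covered by a grid, and means and variances are linearly interpolated between grid points. Plain rejection ABC replaces the ABC-SMC sampler used in the paper. Because this toy simulator is itself cheap, the sketch shows the mechanics, not the speed-up.

```python
import bisect
import math
import random

N_SITES = 400  # hypothetical size of the 1-d "image"


def simulate_stat(beta, rng):
    """Stand-in simulator: number of equal neighbouring pairs in a two-label chain."""
    p_same = 1.0 / (1.0 + math.exp(-beta))
    return sum(1 for _ in range(N_SITES - 1) if rng.random() < p_same)


def precompute(grid, n_rep, rng):
    """Simulated mean and variance of the sufficient statistic at each grid value."""
    means, variances = [], []
    for beta in grid:
        s = [simulate_stat(beta, rng) for _ in range(n_rep)]
        m = sum(s) / n_rep
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in s) / (n_rep - 1))
    return means, variances


def interp(grid, values, beta):
    """Linear interpolation between the two grid points bracketing beta."""
    i = min(max(bisect.bisect_left(grid, beta), 1), len(grid) - 1)
    w = (beta - grid[i - 1]) / (grid[i] - grid[i - 1])
    return (1 - w) * values[i - 1] + w * values[i]


def abc_rejection(s_obs, grid, means, variances, n_prop, eps, rng):
    """Rejection ABC where the statistic is drawn from the Gaussian surrogate."""
    kept = []
    for _ in range(n_prop):
        beta = rng.uniform(grid[0], grid[-1])
        s = rng.gauss(interp(grid, means, beta), math.sqrt(interp(grid, variances, beta)))
        if abs(s - s_obs) < eps:
            kept.append(beta)
    return kept
```

The expensive simulation happens once, inside `precompute`. After that, each ABC proposal costs only one Gaussian draw.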

Across 20 simulated images of 125 × 125 pixels, Matt’s algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image, with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours for 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms cannot handle this scale: it takes 89 hours to perform a single SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)

Since my student’s paper on Seaman et al (2012) got promptly rejected by TAS for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.

## Approximate Integrated Likelihood via ABC methods

Posted in Books, Statistics, University life on March 13, 2014 by xi'an

My PhD student Clara Grazian just arXived this joint work with Brunero Liseo on using ABC for marginal density estimation. The idea in this paper is to produce an integrated likelihood approximation in intractable problems via the ratio

$L(\psi|x)\propto \dfrac{\pi(\psi|x)}{\pi(\psi)}$

both terms in the ratio being estimated from simulations,

$\hat L(\psi|x) \propto \dfrac{\hat\pi^\text{ABC}(\psi|x)}{\hat\pi(\psi)}$

(with possible closed form for the denominator). Although most of the examples processed in the paper (Poisson means ratio, Neyman-Scott’s problem, g-&-k quantile distribution, semi-parametric regression) rely on summary statistics, hence de facto replacing the numerator above with a pseudo-posterior conditional on those summaries, the approximation remains accurate (for those examples). In the g-&-k quantile example, Clara and Brunero compare our ABC-MCMC algorithm with the one of Allingham et al. (2009, Statistics & Computing): the latter does better by not replicating values in the Markov chain but instead proposing new values until one is accepted by the usual Metropolis step. (Although I did not spend much time on this issue, I cannot see how both approaches could be simultaneously correct, even though the outcomes do not look very different.) As noted by the authors, “the main drawback of the present approach is that it requires the use of proper priors”, unless the marginalisation of the prior can be done analytically. (This is an interesting computational problem: how to provide an efficient approximation to a marginal density of a σ-finite measure, assuming this density exists.)
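To make the ratio $\hat\pi^\text{ABC}(\psi|x)/\hat\pi(\psi)$ concrete, here is a minimal sketch on a toy problem whose answer is known. The setting is an assumption of my own: a normal mean ψ with n=20 unit-variance observations summarised by the sample mean (sufficient here), a proper N(0, 3²) prior whose density is available in closed form, rejection ABC, and a Gaussian kernel density estimate of the ABC posterior. The observed summary 1.0 and all tuning constants are hypothetical, and none of the paper's examples is reproduced. The estimated integrated likelihood should peak near the observed mean.

```python
import math
import random

N_OBS, PRIOR_SD = 20, 3.0
XBAR_OBS = 1.0  # hypothetical observed summary


def prior_pdf(psi):
    return math.exp(-0.5 * (psi / PRIOR_SD) ** 2) / (PRIOR_SD * math.sqrt(2 * math.pi))


def abc_sample(n_prop, eps, rng):
    """Rejection ABC on the sample mean, which is sufficient in this toy model."""
    kept = []
    for _ in range(n_prop):
        psi = rng.gauss(0.0, PRIOR_SD)
        xbar = rng.gauss(psi, 1.0 / math.sqrt(N_OBS))  # simulated summary
        if abs(xbar - XBAR_OBS) < eps:
            kept.append(psi)
    return kept


def kde(points, bw):
    """Gaussian kernel density estimate with bandwidth bw."""
    c = 1.0 / (len(points) * bw * math.sqrt(2 * math.pi))
    return lambda t: c * sum(math.exp(-0.5 * ((t - x) / bw) ** 2) for x in points)


def integrated_likelihood(points, bw):
    """Unnormalised estimate of L(psi|x), i.e. ABC posterior density / prior density."""
    post = kde(points, bw)
    return lambda psi: post(psi) / prior_pdf(psi)


rng = random.Random(11)
kept = abc_sample(100_000, 0.05, rng)
L_hat = integrated_likelihood(kept, 0.1)
grid = [XBAR_OBS - 0.5 + 0.01 * i for i in range(101)]
psi_max = max(grid, key=L_hat)
```

Dividing by the prior density is the step that requires a proper prior with a computable density, which matches the drawback the authors point out.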

Clara will give a talk at CREST-ENSAE today about this work, in the Bayes in Paris seminar: 2pm in room 18.

## Advances in Scalable Bayesian Computation [group photo]

Posted in Kids, Mountains, pictures, Statistics, Travel, University life on March 8, 2014 by xi'an