## Archive for University of Edinburgh

Posted in Statistics with tags , , , , , , , , on March 20, 2019 by xi'an

A paper on ABC I read on my way back from Cambodia:  Yanzhi Chen and Michael Gutmann arXived an ABC [in Edinburgh] paper on learning the target via Gaussian copulas, to be presented at AISTATS this year (in Okinawa!). Linking post-processing (regression) ABC and sequential ABC. The drawback in the regression approach is that the correction often relies on an homogeneity assumption on the distribution of the noise or residual since this approach only applies a drift to the original simulated sample. Their method is based on two stages, a coarse-grained one where the posterior is approximated by ordinary linear regression ABC. And a fine-grained one, which uses the above coarse Gaussian version as a proposal and returns a Gaussian copula estimate of the posterior. This proposal is somewhat similar to the neural network approach of Papamakarios and Murray (2016). And to the Gaussian copula version of Li et al. (2017). The major difference being the presence of two stages. The new method is compared with other ABC proposals at a fixed simulation cost, which does not account for the construction costs, although they should be relatively negligible. To compare these ABC avatars, the authors use a symmetrised Kullback-Leibler divergence I had not met previously, requiring a massive numerical integration (although this is not an issue for the practical implementation of the method, which only calls for the construction of the neural network(s)). Note also that sequential ABC is only run for two iterations, and also that none of the importance sampling ABC versions of Fearnhead and Prangle (2012) and of Li and Fearnhead (2018) are considered, all versions relying on the same vector of summary statistics with a dimension much larger than the dimension of the parameter. Except in our MA(2) example, where regression does as well. I wonder at the impact of the dimension of the summary statistic on the performances of the neural network, i.e., whether or not it is able to manage the curse of dimensionality by ignoring all but essentially the data  statistics in the optimisation.

## from Arthur’s Seat [spot ISBA participants]

Posted in Mountains, pictures, Running, Travel with tags , , , , , , , , , , , , on June 27, 2018 by xi'an

## Bayesian GANs [#2]

Posted in Books, pictures, R, Statistics with tags , , , , , , , , , , , , on June 27, 2018 by xi'an

As an illustration of the lack of convergence of the Gibbs sampler applied to the two “conditionals” defined in the Bayesian GANs paper discussed yesterday, I took the simplest possible example of a Normal mean generative model (one parameter) with a logistic discriminator (one parameter) and implemented the scheme (during an ISBA 2018 session). With flat priors on both parameters. And a Normal random walk as Metropolis-Hastings proposal. As expected, since there is no stationary distribution associated with the Markov chain, simulated chains do not exhibit a stationary pattern,

And they eventually reach an overflow error or a trapping state as the log-likelihood gets approximately to zero (red curve).

Too bad I missed the talk by Shakir Mohammed yesterday, being stuck on the Edinburgh by-pass at rush hour!, as I would have loved to hear his views about this rather essential issue…

## ABCDE for approximate Bayesian conditional density estimation

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on February 26, 2018 by xi'an

Another arXived paper I surprisingly (?) missed, by George Papamakarios and Iain Murray, on an ABCDE (my acronym!) substitute to ABC for generative models. The paper was reviewed [with reviews made available!] and accepted by NIPS 2016. (Most obviously, I was not one of the reviewers!)

“Conventional ABC algorithms such as the above suffer from three drawbacks. First, they only represent the parameter posterior as a set of (possibly weighted or correlated) samples [for which] it is not obvious how to perform some other computations using samples, such as combining posteriors from two separate analyses. Second, the parameter samples do not come from the correct Bayesian posterior (…) Third, as the ε-tolerance is reduced, it can become impractical to simulate the model enough times to match the observed data even once [when] simulations are expensive to perform”

The above criticisms are a wee bit overly harsh as, well…, Monte Carlo approximations remain a solution worth considering for all Bayesian purposes!, while the approximation [replacing the data with a ball] in ABC is replaced with an approximation of the true posterior as a mixture. Both requiring repeated [and likely expensive] simulations. The alternative is in iteratively simulating from pseudo-predictives towards learning better pseudo-posteriors, then used as new proposals at the next iteration modulo an importance sampling correction.  The approximation to the posterior chosen therein is a mixture density network, namely a mixture distribution with parameters obtained as neural networks based on the simulated pseudo-observations. Which the authors claim [p.4] requires no tuning. (Still, there are several aspects to tune, from the number of components to the hyper-parameter λ [p.11, eqn (35)], to the structure of the neural network [20 tanh? 50 tanh?], to the number of iterations, to the amount of X checking. As usual in NIPS papers, it is difficult to assess how arbitrary the choices made in the experiments are. Unless one starts experimenting with the codes provided.) All in all, I find the paper nonetheless exciting enough (!) to now start a summer student project on it in Dauphine and hope to check the performances of ABCDE on different models, as well as comparing this ABC implementation with a synthetic likelihood version.

As an addendum, let me point out the very pertinent analysis of this paper by Dennis Prangle, 18 months ago!

## Alan Turing Institute

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , on February 10, 2015 by xi'an

The University of Warwick is one of the five UK Universities (Cambridge, Edinburgh, Oxford, Warwick and UCL) to be part of the new Alan Turing Institute.To quote from the University press release,  “The Institute will build on the UK’s existing academic strengths and help position the country as a world leader in the analysis and application of big data and algorithm research. Its headquarters will be based at the British Library at the centre of London’s Knowledge Quarter.” The Institute will gather researchers from mathematics, statistics, computer sciences, and connected fields towards collegial and focussed research , which means in particular that it will hire a fairly large number of researchers in stats and machine-learning in the coming months. The Department of Statistics at Warwick was strongly involved in answering the call for the Institute and my friend and colleague Mark Girolami will the University leading figure at the Institute, alas meaning that we will meet even less frequently! Note that the call for the Chair of the Alan Turing Institute is now open, with deadline on March 15. [As a personal aside, I find the recognition that Alan Turing’s genius played a pivotal role in cracking the codes that helped us win the Second World War. It is therefore only right that our country’s top universities are chosen to lead this new institute named in his honour. by the Business Secretary does not absolve the legal system that drove Turing to suicide….]

## brief stop in Edinburgh

Posted in Mountains, pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , on January 24, 2015 by xi'an

Yesterday, I was all too briefly in Edinburgh for a few hours, to give a seminar in the School of Mathematics, on the random forests approach to ABC model choice (that was earlier rejected). (The slides are almost surely identical to those used at the NIPS workshop.) One interesting question at the end of the talk was on the potential bias in the posterior predictive expected loss, bias against some model from the collection of models being evaluated for selection. In the sense that the array of summaries used by the random forest could fail to capture features of a particular model and hence discriminate against it. While this is correct, there is no fundamental difference with implementing a posterior probability based on the same summaries. And the posterior predictive expected loss offers the advantage of testing, that is, for representative simulations from each model, of returning the corresponding model prediction error to highlight poor performances on some models. A further discussion over tea led me to ponder whether or not we could expand the use of random forests to Bayesian quantile regression. However, this would imply a monotonicity structure on a collection of random forests, which sounds daunting…

My stay in Edinburgh was quite brief as I drove to the Highlands after the seminar, heading to Fort William, Although the weather was rather ghastly, the traffic was fairly light and I managed to get there unscathed, without hitting any of the deer of Rannoch Mor (saw one dead by the side of the road though…) or the snow banks of the narrow roads along Loch Lubnaig. And, as usual, it still was a pleasant feeling to drive through those places associated with climbs and hikes, Crianlarich, Tyndrum, Bridge of Orchy, and Glencoe. And to get in town early enough to enjoy a quick dinner at The Grog & Gruel, reflecting I must have had half a dozen dinners there with friends (or not) over the years. And drinking a great heather ale to them!

## how many modes in a normal mixture?

Posted in Books, Kids, Statistics, University life with tags , , , , , , on January 7, 2015 by xi'an

An interesting question I spotted on Cross Validated today: How to tell if a mixture of Gaussians will be multimodal? Indeed, there is no known analytical condition on the parameters of a fully specified k-component mixture for the modes to number k or less than k… Googling around, I immediately came upon this webpage by Miguel Carrera-Perpinan, who studied the issue with Chris Williams when writing his PhD in Edinburgh. And upon this paper, which not only shows that

1. unidimensional Gaussian mixtures with k components have at most k modes;
2. unidimensional non-Gaussian mixtures with k components may have more than k modes;
3. multidimensional mixtures with k components may have more than k modes.

but also provides ways of finding all the modes. Ways which seem to reduce to using EM from a wide variety of starting points (an EM algorithm set in the sampling rather than in the parameter space since all parameters are set!). Maybe starting EM from each mean would be sufficient.  I still wonder if there are better ways, from letting the variances decrease down to zero until a local mode appear, to using some sort of simulated annealing…

Edit: Following comments, let me stress this is not a statistical issue in that the parameters of the mixture are set and known and there is no observation(s) from this mixture from which to estimate the number of modes. The mathematical problem is to determine how many local maxima there are for the function

$f(x)\,:\,x \longrightarrow \sum_{i=1}^k p_i \varphi(x;\mu_i,\sigma_i)$