
JSM 2018 [#4½]

Posted in Statistics, University life on August 10, 2018 by xi'an

As I wrote my previous blog entry on JSM2018 before the sessions, I did not have the chance to comment on our mixture session, which I found most interesting, with new entries on the topic and a great discussion by Bettina Grün. Including the important call for linking weights with the other parameters, as keeping both groups independent does not make sense when the number of components is uncertain. (Incidentally, our paper with Kaniav Kamary and Kate Lee does create a dependence.) The talk by Deborah Kunkel was about anchored mixture estimation, a joint work with Mario Peruggia, another arXival that I had missed.

The notion of anchoring found in this paper is to allocate specific observations to specific components. These observations are thus anchored to these components. Among other things, this modification of the sampling model implies a removal of the unidentifiability problem. Hence, formally, of the label-switching (or lack thereof) issue. (Although, as Peter Green repeatedly mentioned, visualising the parameter space as a point process eliminates the issue.) This idea is somewhat connected with the constraint Jean Diebolt and I imposed in our 1990 mixture paper, namely that no component would have fewer than two observations allocated to it, but imposing which ones are which of course drastically reduces the complexity of the model. Another (related) aspect of anchoring is that the observations that are anchored to the components act as parts of the prior model, modifying the initial priors (which can then become improper as in our 1990 paper). The difficulty of the anchoring approach is to find observations to anchor in an unsupervised setting. The paper proceeds by optimising the allocations, which somewhat turns the prior into a data-dependent prior since all observations are used to set the anchors and then used again for the standard Bayesian processing. In that respect, I would rather follow the sequential procedure developed by Nicolas Chopin and Florian Pelgrin, where the number of components grows by steps with the number of observations.
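To make the anchoring idea concrete, here is a minimal sketch of a single allocation sweep in a Gibbs sampler for a Gaussian mixture with unit variances, where some observations have their component fixed in advance. This is only an illustration and not the algorithm of the paper: the function names, the `anchors` dictionary and the toy data are all assumptions of mine.

```python
import math
import random

def normal_pdf(x, mean, sd=1.0):
    """Density of a N(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sample_allocations(x, means, weights, anchors, rng):
    """One allocation sweep for a unit-variance Gaussian mixture.

    anchors (hypothetical name) maps an observation index to the component
    it is anchored to; anchored observations are never reallocated.
    """
    z = []
    for i, xi in enumerate(x):
        if i in anchors:
            # anchored observation: its label is part of the model, not sampled
            z.append(anchors[i])
            continue
        # free observation: usual conditional allocation probabilities
        probs = [w * normal_pdf(xi, m) for w, m in zip(weights, means)]
        z.append(rng.choices(range(len(means)), weights=probs)[0])
    return z

# toy usage: first and last observations anchored to components 0 and 1
rng = random.Random(1)
x = [-2.1, -1.9, 0.1, 2.0, 2.2]
z = sample_allocations(x, [-2.0, 2.0], [0.5, 0.5], {0: 0, 4: 1}, rng)
```

Since the anchored labels never move, the two components are no longer exchangeable, which is how the unidentifiability disappears in this toy version.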


for a coincidence

Posted in Mountains, pictures, Travel, University life on August 5, 2018 by xi'an

Last night in Vancouver, we were walking back to Chinatown under an expressway, in a rather uninspiring section of town. Waiting at a cross-light with another couple on the other side. As we crossed the street I glanced at the man and noticed his Chamonix North Face tee-shirt. He happened to do the same and… noticed my identical Chamonix North Face tee-shirt! We shared a laugh at this (huge?) coincidence and continued on our respective ways. (He was not taking part in MCMskiii in case this seems a likely explanation!)

ICM 2018

Posted in pictures, Statistics, Travel, University life on August 4, 2018 by xi'an

While I am not following the International Congress of Mathematicians, which just started in Rio, and even less attending, I noticed an entry on their webpage on my friend and colleague Maria Esteban, which I would have liked to repost verbatim but cannot figure out how. (ICM 2018 also features a plenary lecture by Michael Jordan on gradient-based optimisation [which was also Michael’s topic at ISBA 2018] and another one by Sanjeev Arora on the maths of deep learning, two talks broadly related to statistics, which is presumably a première at this highly selective maths conference!)

JSM 2018 [#4]

Posted in Mountains, Statistics, Travel, University life on August 3, 2018 by xi'an

A last ½ day of sessions at JSM2018 in an almost deserted conference centre, with a first session put together by Mario Peruggia and a second on Advances in Bayesian Nonparametric Modeling and Computation for Complex Data. Here are the slides of my talk this morning in the Bayesian mixture estimation session, which I updated last night (Slideshare most absurdly does not let you update versions!).

Since I missed the COPSS Award ceremony for a barbecue with friends on Locarno Beach, I only discovered this morning that the winner this year is Richard Samworth, from Cambridge University, who eminently deserves this recognition, if only because of his contributions to journal editing, as I can attest from my years with JRSS B. Congrats to him as well as to Bin Yu and Susan Murphy for their E.L. Scott and R.A. Fisher Awards! I also found out from an email to JSM participants that the next edition is in Denver, Colorado, which I visited only once in 1993 on a trip to Fort Collins visiting Kerrie Mengersen and Richard Tweedie. Given the proximity to the Rockies, I am thinking of submitting an invited session on ABC issues, which were not particularly well covered by this edition of JSM. (Feel free to contact me if you are interested in joining the session.)

JSM 2018 [#3]

Posted in Mountains, pictures, Statistics, Travel, University life on August 2, 2018 by xi'an

Third day at JSM2018 and the audience is already much smaller than the previous days! Although it is hard to tell with a humongous conference centre spread between two buildings. And not getting hooked by the tantalising view of the bay, with waterplanes taking off every few minutes…

Still, there were (too) few participants in the two computational statistics (MCMC) sessions I attended in the morning, the first one being organised by James Flegal on different assessments of MCMC convergence. (Although this small audience made the session quite homely!) In his own talk, James developed an interesting version of multivariate ESS that he related to a stopping rule for minimal precision. Vivek Roy also spoke about a multiple importance sampling construction I missed when it came up on arXiv last May.
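As a reminder, and assuming the talk relied on the multivariate ESS of Vats, Flegal and Jones (the version presented may well have differed), the definition and the associated stopping rule can be written as below. Here n is the number of MCMC draws, p the dimension of the estimated expectation, Λ the covariance matrix under the target, Σ the asymptotic covariance matrix in the Markov chain CLT (both replaced by estimates in practice), ε the requested relative precision and 1−α the confidence level.

```latex
\mathrm{mESS} = n \left( \frac{|\Lambda|}{|\Sigma|} \right)^{1/p},
\qquad
\text{stop as soon as}\quad
\widehat{\mathrm{mESS}} \;\ge\;
\frac{2^{2/p}\,\pi}{\bigl(p\,\Gamma(p/2)\bigr)^{2/p}}\;
\frac{\chi^2_{1-\alpha,\,p}}{\epsilon^2}
```

When p=1 the first expression reduces to the usual univariate ESS, namely n times the ratio of the target variance to the asymptotic variance.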

In the second session, Mylène Bédard exposed the construction of and improvement brought by local scaling in MALA, with 20% gain from using non-local tuning. Making me idly muse over whether block sizes in block-Gibbs sampling could also be locally optimised… Then Aaron Smith discussed how HMC should be scaled for optimal performance, under rather idealised conditions and very high dimensions. Mentioning a running time of d, the dimension, to the power ¼. But not addressing the practical question of calibrating scale versus number of steps in the discretised version. (At which time my hands were [sort of] frozen solid thanks to the absurd air conditioning in the conference centre and I had to get out!)
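For readers less familiar with MALA, here is a minimal sketch of the basic algorithm with a single global step size h, on a standard Normal target. It does not implement the local scaling of the talk (where h would depend on the current position), and the target, the function names and the value of h are assumptions chosen for illustration only.

```python
import math
import random

def log_target(x):
    """Log-density (up to a constant) of a standard Normal target."""
    return -0.5 * sum(v * v for v in x)

def grad_log_target(x):
    """Gradient of the log-density above."""
    return [-v for v in x]

def log_proposal(y, x, h):
    """Log-density (up to a constant) of the Langevin proposal y given x."""
    g = grad_log_target(x)
    sq = sum((yi - xi - 0.5 * h * h * gi) ** 2 for yi, xi, gi in zip(y, x, g))
    return -sq / (2.0 * h * h)

def mala(x0, h, n_iter, rng):
    """Metropolis-adjusted Langevin algorithm with a global step size h."""
    x = list(x0)
    chain, accepted = [], 0
    for _ in range(n_iter):
        g = grad_log_target(x)
        # discretised Langevin move: gradient drift plus Gaussian noise
        y = [xi + 0.5 * h * h * gi + h * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        # Metropolis-Hastings correction for the discretisation error
        log_alpha = (log_target(y) + log_proposal(x, y, h)
                     - log_target(x) - log_proposal(y, x, h))
        if math.log(1.0 - rng.random()) < log_alpha:
            x, accepted = y, accepted + 1
        chain.append(list(x))
    return chain, accepted / n_iter

chain, acc_rate = mala([0.0, 0.0], 1.0, 2000, random.Random(0))
```

Tuning then amounts to choosing h (globally, or locally in the version discussed in the talk) by monitoring quantities like `acc_rate`.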

JSM 2018 [#3]

Posted in Mountains, Statistics, Travel, University life on August 1, 2018 by xi'an

As I skipped day #2 for climbing, here I am on day #3, attending JSM 2018, with a [fully Canadian!] session on (conditional) copulas (where Bruno Rémillard talked of copulas for mixed data, with unknown atoms, which sounded like an impossible target!), and another on four highlights from Bayesian Analysis (the journal), with Maria Terres defending the (often ill-considered!) spectral approach within Bayesian analysis, modelling spectral densities (Fourier transforms of correlation functions, not probability densities), an advantage compared with MCAR modelling being the automated derivation of dependence graphs. While the spectral ghost did not completely dissipate for me, the use of DIC that she mentioned at the very end seems to call for investigation, as I do not know of well-studied cases of complex dependent data with clearly specified DICs. Then Chris Drovandi spoke of ABC being used for prior choice, an idea I vaguely remember seeing quite a while ago as a referee (or was it another paper?!), in a BA paper that I missed (and obviously did not referee). Using the same reference table works (for simple ABC) with different datasets but also with different priors. I did not get at first the notion that the reference table also produces an evaluation of the marginal distribution, but indeed the entire simulation from prior × generative model gives a Monte Carlo representation of the marginal, hence of the evidence at the observed data. Borrowing from Evans’ fringe Bayesian approach to model choice by prior predictive check for prior-model conflict. I remain sceptical or at least agnostic on the notion of using data to compare priors. And here on using ABC in tractable settings.
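The point about the reference table can be illustrated on a toy example, which is a sketch under my own assumptions and not the procedure of the paper: a Normal mean model with one observation, a uniform ABC kernel of half-width `eps`, and a second prior handled by importance reweighting of the same table (the reweighting being my choice of how to reuse the table). All names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a N(mean, var) distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def reference_table(n_sims, prior_var, rng):
    """Simulate (theta, x) pairs: theta ~ N(0, prior_var), x | theta ~ N(theta, 1)."""
    table = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, math.sqrt(prior_var))
        table.append((theta, rng.gauss(theta, 1.0)))
    return table

def abc_evidence(table, x_obs, eps, weight=lambda theta: 1.0):
    """Uniform-kernel estimate of the marginal density (evidence) at x_obs.

    weight(theta) is the ratio of a new prior to the simulation prior,
    equal to 1 when the evidence is wanted under the simulation prior.
    """
    total = sum(weight(theta) for theta, x in table if abs(x - x_obs) < eps)
    return total / (len(table) * 2.0 * eps)

rng = random.Random(2018)
table = reference_table(20000, 4.0, rng)           # simulated once
ev_wide = abc_evidence(table, 1.0, 0.2)            # evidence under the N(0,4) prior
ev_narrow = abc_evidence(                          # same table, N(0,1) prior
    table, 1.0, 0.2,
    weight=lambda t: normal_pdf(t, 0.0, 1.0) / normal_pdf(t, 0.0, 4.0))
```

In this tractable setting the exact evidences are the N(0,5) and N(0,2) densities at the observation, which provides a check on the approximation (and echoes the above reservation about using ABC when the exact answer is available).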

The afternoon session was [a mostly Australian] Advanced Bayesian computational methods, with Robert Kohn on variational Bayes, with an interesting comparison of (exact) MCMC and (approximate) variational Bayes results for some species intensity and the remark that forecasting may be much more tolerant of the approximation than estimation. Making me wonder at a possibility of assessing VB on the marginals manageable by MCMC. Unless I miss a complexity such that the decomposition is impossible. And Antonietta Mira on time-evolving networks estimated by ABC (which Anto first showed me in Orly airport, waiting for her plane!). With a possibility of a zero distance. Next talk by Nadja Klein on implicit copulas, linked with shrinkage properties I was unaware of, including the case of spike & slab copulas. Michael Smith also spoke of copulas with discrete margins, mentioning a version with continuous latent variables (as I thought could be done during the first session of the day), then moving to variational Bayes, which sounds quite popular at JSM 2018. And David Gunawan made a presentation of a paper mixing pseudo-marginal Metropolis with particle Gibbs sampling, written with Chris Carter and Robert Kohn, making me wonder at their feature of using the white noise as an auxiliary variable in the estimation of the likelihood, which is quite clever but seems to go against the validation of the pseudo-marginal principle. (Warning: I have been known to be wrong!)
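As a point of comparison for this last remark, here is a minimal sketch of the plain pseudo-marginal Metropolis algorithm, in which the white noise behind the likelihood estimate is redrawn afresh at each proposal and the estimate attached to the current value is recycled. It is not the sampler of the paper (no particle Gibbs step, no conditioning on the noise), and the latent Normal model, the N(0,100) prior and all names are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a N(mean, var) distribution at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def draw_noise(n_obs, n_draws, rng):
    """White noise driving the latent variables z_j = theta + u."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_draws)] for _ in range(n_obs)]

def likelihood_estimate(theta, y, noise):
    """Unbiased estimate of prod_j p(y_j | theta) when y_j | z_j ~ N(z_j, 1)
    and z_j | theta ~ N(theta, 1), averaging over the latent draws."""
    est = 1.0
    for yj, uj in zip(y, noise):
        est *= sum(normal_pdf(yj, theta + u, 1.0) for u in uj) / len(uj)
    return est

def pseudo_marginal_mh(y, n_iter, n_draws, step, rng):
    """Random-walk Metropolis using the estimated likelihood in the ratio."""
    theta = 0.0
    lik = likelihood_estimate(theta, y, draw_noise(len(y), n_draws, rng))
    chain = []
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)
        # fresh noise for the proposal; the current estimate is NOT refreshed
        lik_prop = likelihood_estimate(prop, y, draw_noise(len(y), n_draws, rng))
        ratio = (lik_prop * normal_pdf(prop, 0.0, 100.0)) / (
            lik * normal_pdf(theta, 0.0, 100.0))
        if rng.random() < ratio:
            theta, lik = prop, lik_prop
        chain.append(theta)
    return chain

y = [0.8, 1.5, 1.1, 0.3, 1.9, 1.2]
chain = pseudo_marginal_mh(y, 3000, 20, 1.0, random.Random(3))
```

The validity of this version rests on the pair (parameter, noise) being the state of the chain, with the noise distributed from its own density at the proposal stage, which is where treating the noise differently calls for a separate justification.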

JSM 2018 [#1]

Posted in Mountains, Statistics, Travel, University life on July 30, 2018 by xi'an

As our direct flight from Paris landed in the morning in Vancouver, we found ourselves in the unusual situation of having a few hours to kill before accessing our rental, and where else better than a general introduction to deep learning in the first round of sessions at JSM2018?! In my humble opinion, or maybe just because it was past midnight in Paris time, the talk was pretty uninspiring in missing the natural question of the possible connections between the construction of a prediction function and statistics. Watching improving performances at classifying human faces does not tell much more than creating a massively non-linear function in high dimensions with nicely designed error penalties. Most of the talk droned on about neural networks and their fitting by back-propagation and the variations on stochastic gradient descent. Not addressing much the rather natural (?) questions about the choice of functions at each level, of the number of levels, of the penalty term, or regulariser, and even less the reason why no sparsity is imposed on the structure, despite the humongous number of parameters involved. What came close [but not that close] to sparsity is the notion of dropout, which is a sort of purely automated culling of the nodes, and which was new to me. More like a sort of randomisation that turns the optimisation criterion into an average. Only at the end of the presentation did more relevant questions emerge, presenting unsupervised learning as density estimation, the pivot being the generative features of (most) statistical models. And GANs of course. But nonetheless missing an explanation as to why models with massive numbers of parameters can be considered in this setting and not in standard statistics. (One slide about deterministic auto-encoders was somewhat puzzling in that it seemed to repeat the “fiducial mistake”.)
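The averaging interpretation of dropout can be checked on a toy case. The sketch below assumes the common "inverted" form of dropout, where kept inputs are rescaled by the inverse of the keep probability, and a single linear unit. The inputs, weights and function names are made up for illustration and are not taken from the talk.

```python
import random

def dropout(values, drop_prob, rng):
    """Randomly zero each value with probability drop_prob and rescale the
    survivors so that the expectation of each entry is unchanged."""
    keep = 1.0 - drop_prob
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def linear_unit(inputs, weights):
    """Output of a single linear node."""
    return sum(w * x for w, x in zip(weights, inputs))

rng = random.Random(42)
inputs = [1.0, -2.0, 0.5, 3.0]
weights = [0.2, 0.4, -1.0, 0.3]

full_output = linear_unit(inputs, weights)
# Monte Carlo average of the unit's output over random dropout masks
n_masks = 20000
avg_output = sum(linear_unit(dropout(inputs, 0.5, rng), weights)
                 for _ in range(n_masks)) / n_masks
```

For a linear unit the average over masks recovers the full output, while for a non-linear network and a loss function the criterion being minimised is an average of losses over randomly thinned networks, which is not the same thing as the loss of the full network.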