This morning, Clara Grazian and I arXived a paper about Jeffreys priors for mixtures. This is part of Clara’s PhD dissertation, conducted between Roma and Paris, on which she has worked for the past year. Jeffreys priors cannot be computed analytically for mixtures, which is such a drag that it led us to devise the delayed acceptance algorithm. However, the main message from this detailed study of Jeffreys priors is that they mostly do not work for Gaussian mixture models, in that the posterior is almost invariably improper! This is a definite death knell for Jeffreys priors in this setting, meaning that alternative reference priors, like the one we advocated with Kerrie Mengersen and Mike Titterington, or the similar solution in Roeder and Wasserman, have to be used. [Disclaimer: the title has little to do with the paper, except that posterior means are off for mixtures…]
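Since the Fisher information of a mixture has no closed form, the Jeffreys prior can only be evaluated numerically, for instance by a Monte Carlo approximation of the information matrix. Here is a minimal sketch (my own toy illustration, not the paper’s implementation) for a two-component Gaussian mixture with unknown means and known weight:

```python
import numpy as np

def score(x, mu1, mu2, w=0.5):
    """Gradient of the log mixture density in (mu1, mu2), evaluated at x."""
    p1 = w * np.exp(-0.5 * (x - mu1) ** 2) / np.sqrt(2 * np.pi)
    p2 = (1 - w) * np.exp(-0.5 * (x - mu2) ** 2) / np.sqrt(2 * np.pi)
    f = p1 + p2
    return np.vstack([p1 * (x - mu1) / f, p2 * (x - mu2) / f])

def jeffreys(mu1, mu2, w=0.5, n_mc=50_000, seed=0):
    """Unnormalised Jeffreys prior sqrt(det I(mu1, mu2)), with the Fisher
    information I approximated by Monte Carlo draws from the mixture."""
    rng = np.random.default_rng(seed)
    comp = rng.random(n_mc) < w          # component indicators
    x = np.where(comp, rng.normal(mu1, 1.0, n_mc), rng.normal(mu2, 1.0, n_mc))
    g = score(x, mu1, mu2, w)
    fisher = g @ g.T / n_mc              # 2x2 empirical information matrix
    return np.sqrt(max(np.linalg.det(fisher), 0.0))
```

For well-separated means the information matrix is close to diag(w, 1−w), so the prior is essentially flat there; the trouble documented in the paper arises from the tails, where this prior fails to decrease fast enough for posterior propriety.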
Archive for Roma
Clara Grazian and Brunero Liseo (di Roma) have just arXived a note on a method merging copulas, ABC, and empirical likelihood. The approach is rather hybrid and thus not completely Bayesian, but this must be seen as a consequence of an ill-posed problem. Indeed, as in many econometric models, the model there is not fully defined: the marginals of iid observations are represented as coming from well-known parametric families (and are thus well-estimated by Bayesian tools), while the joint distribution remains uncertain, and hence so does the associated copula. The approach in the paper is to proceed stepwise, i.e., to estimate each marginal correctly, or at least well enough to transform the data by an estimated cdf, and only then to estimate the copula, or some aspect of it, based on the transformed data. Like Spearman’s ρ, for which an empirical likelihood is computed and combined with a prior to produce a BCel weight. (If this sounds unclear, each BCel evaluation is based on a random draw from the posterior samples, which transfers some uncertainty in the parameter evaluation into the copula domain. Thanks to Brunero and Clara for clarifying this point for me!)
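The two-step scheme — fit each marginal, push the data through the estimated cdfs, then compute a dependence functional such as Spearman’s ρ on the transformed sample — can be sketched as follows (a toy illustration with Gaussian marginals of my own making, not the authors’ code):

```python
import numpy as np
from scipy.stats import norm, spearmanr

rng = np.random.default_rng(1)

# toy data: a dependent pair with Gaussian marginals (hypothetical example)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=1000)
x, y = 2.0 + z[:, 0], 5.0 * z[:, 1]

# step 1: estimate each marginal separately (parametric Gaussian fit here)
u = norm.cdf(x, loc=x.mean(), scale=x.std())
v = norm.cdf(y, loc=y.mean(), scale=y.std())

# step 2: estimate the copula functional on the transformed (pseudo-)sample
rho, _ = spearmanr(u, v)
```

Since Spearman’s ρ is rank-based, it is invariant to the monotone cdf transforms, which is partly why a moderate estimation error in the marginals need not wreck this particular functional; the worry voiced below about the correlation structure of the estimated transforms applies to finer aspects of the copula.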
At this stage of the note, there are two illustrations revolving around Spearman’s ρ: one on simulated data, with better performance than a nonparametric frequentist solution, and another on a GARCH(1,1) model for two financial time series.
I am quite glad to see an application of our BCel approach in another domain, although I feel a tiny bit uncertain about the degree of arbitrariness in the approach, from the estimated cdf transforms of the marginals to the choice of the moment equations identifying the parameter of interest, like Spearman’s ρ. Especially if one uses a parametric copula whose moments are equally well-known. While I see the practical gain in analysing each component separately, the object created by the estimated cdf transforms may have a very different correlation structure from the true cdf transforms. Maybe there exist consistency conditions on the estimated cdfs… Maybe other notions of orthogonality or independence could be brought into the picture to further validate the two-step solution…
As if a thumb was not enough, I lost the “new” Canon Ixus 115 HS I bought in replacement of the (mediocre) Nikon Coolpix I lost on Ben Nevis (the title refers to the miracle mentioned in a post in February 2013, when I almost lost my Nikon Coolpix L26 camera to the cloaca maxima, in Roma). This happened in the park on Sunday morning when I took it in my raincoat pocket to capture the serene heron standing guard at the end of the grand canal… The camera somehow fell from my pocket without me realising it (of course), presumably landing on soft ground, and I only discovered it had happened five or six minutes later, when I stood next to the heron. I retraced my steps but, even at 7:30 on a Sunday morning, there was enough traffic for a runner to find it before me. (Maybe he had no gift ready for Mother’s Day!) It was not such a great camera and, on its trip to Chamonix last X’mas with my daughter, it had decided to host a small fungus that lived right on the lens, making zooming close to impossible. (The same thing had happened with the Nikon Coolpix the year before, after falling in the snow during my X’mas ski trip.) Just a wee (bit?) annoying… (Latest picture from the Canon Ixus to come on Sunday!)
Just a few more words written on my return home from Roma (on an uneventful and sunny trip). On a personal side (that readers can skip!), it was a pleasure as always to be two days in Roma, from running in the early morning, beating the rain and the traffic (and with no map nor camera!) and finding new routes, one on the banks of Tevere and another one along the city walls, to meeting old friends over a plate of pasta, to buying fresh bread and market fruits in the early morning, to enjoying the beauty of La Città Eterna… Too short a trip obviously, but this was/is a busy week!
On the academic side, as mentioned yesterday, the program was quite in tune with my lines of research on ABC and I thought a (wee) bit harder about the solutions proposed by the various speakers. One clear tendency is the idea of borrowing from pseudo- or simplified models, either to build estimates (as in indirect inference) or to run the simulation and the calibration (as in, e.g., Olli’s work). As remarked by Judith Rousseau after my talk, we may even have to move further when facing complex models, namely when the simulation of pseudo-samples gets too overwhelming. The issue is then to keep a connection with the “true” model.
Another theme that crossed several talks is the tension between particle methods (incl. pMCMC) and ABC methods in dynamical models, since both usually apply in such cases. Darren Wilkinson’s talk (which I alas missed in order to catch my plane, but recovered by email at the airport, soon to be on-line) did address this opposition, concluding in favour of ABC for the Lotka-Volterra system… My vague feeling is that ABC solutions could indeed come out on top when they do not rely on the hidden (Markov) structure, in the sense that they do not aim at simulating a joint distribution involving this latent structure…
Here are comments by Olli following my post:
I think we found a general means to obtain accurate ABC in the sense of matching the posterior mean or MAP exactly, and then minimising the KL distance between the true posterior and its ABC approximation subject to this condition. The construction works on an auxiliary probability space, much like indirect inference. Now, we construct this probability space empirically, which is where our approach first differs from indirect inference and where we need the “summary values” (>1 data points on a summary level; see Figure 1 for clarification). Without replication, we cannot model the distribution of summary values, but doing so is essential to construct this space.

Now, let’s focus on the auxiliary space. We can fiddle with the tolerances (on a population level) and m so that, on this space, the ABC approximation has the aforesaid properties. All the heavy technical work is in this part. Intuitively, as m increases, the power increases for sufficiently regular tests (see Figure 2) and consequently, for calibrated tolerances, the ABC approximation on the auxiliary space gets tighter. This offsets the broadening effect of the tolerances, so having non-identical lower and upper tolerances is fine and does not hurt the approximation.

Now, we need to transport the close-to-exact ABC approximation on the auxiliary space back to the original space. We need some assumptions here, and given our time series example, it seems these are not unreasonable. We can reconstruct the link between the auxiliary space and the original parameter space as we accept/reject. This helps us understand (with the videos!) the behaviour of the transformation and to judge whether its properties satisfy the assumptions of Theorems 2–4. While we offer some tools to understand the behaviour of the link function, yes, we think more work could be done here to improve on our first attempt at accurate ABC.
Now some more specific comments:
“The paper also insists over and over on sufficiency, which I fear is a lost cause.” To clarify, all we say is that on the simple auxiliary space, sufficient summaries are easily found. For example, if the summary values are normally distributed, the sample mean and the sample variance are sufficient statistics. Of course, this is not the original parameter space and we only transform the sufficiency problem into a change of variable problem. This is why we think that inspecting and understanding the link function is important.
“Another worry is that the … test(s) rel(y) on an elaborate calibration”. We provide some code here for everyone to try out. In our examples, this did not slow down ABC considerably. We generally suppose that the distribution of the summary values is simple, like Gaussian, Exponential, Gamma, ChiSquare, Lognormal. In these cases, the ABC approximation takes on an easy-enough-to-calibrate-fast functional form on the auxiliary space.
“This Theorem 3 sounds fantastic but makes me uneasy: unbiasedness is a sparse property that is rarely found in statistical problems. … Witness the use of “essentially unbiased” in Fig. 4.” What Theorem 3 says is that if unbiasedness can be achieved on the simple auxiliary space, then there are regularity conditions under which these properties can be transported back to the original parameter space. We hope to illustrate these conditions with our examples, and to show that they hold in quite general cases such as the time series application. The thing in Figure 4 is that the sample autocorrelation is not an unbiased estimator of the population autocorrelation. So unbiasedness does not quite hold on the auxiliary space and the conditions of Theorem 3 are not satisfied. Nevertheless, we found this bias to be rather negligible in our example and the bigger concern was the effect of the link function.
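As a rough illustration of the accept/reject step with replicated summary values and asymmetric tolerances (a bare-bones sketch of my own, not the calibrated procedure of the paper): for each proposed parameter, simulate m summary values, compare their average to the observed summary, and accept when the discrepancy falls between the (possibly unequal) lower and upper tolerances.

```python
import numpy as np

def abc_replicated_summaries(s_obs, m=50, n_prop=50_000,
                             eps_lo=-0.05, eps_hi=0.10, seed=0):
    """Toy ABC for a normal mean: each proposal theta is judged through the
    average of m replicated summary values, with non-identical lower/upper
    tolerances (hypothetical setup, for illustration only)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-5.0, 5.0, n_prop)      # draws from a flat prior
    # m summary values per proposal: sample means of unit-variance data
    s_sim = rng.normal(theta[:, None], 1.0, (n_prop, m)).mean(axis=1)
    diff = s_sim - s_obs
    keep = (eps_lo <= diff) & (diff <= eps_hi)  # asymmetric tolerance band
    return theta[keep]

post = abc_replicated_summaries(s_obs=1.0)
```

As m grows, the average of the summary values concentrates, so the accepted sample tightens around the observed summary even for fixed tolerances, which is a (crude) rendering of the offsetting effect described above.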
And here are Olli’s slides:
Back in Roma after my ABC week last year, for the ABC in Rome workshop! The attendance is quite on par with the sizes of the previous audiences and the program is close to my own interests—unsurprisingly, since I took part in the scientific committee! Hence talks mostly on papers that have already been discussed on the ‘Og:
- Dennis Prangle on semi-automatic ABC model choice
- Oliver Ratman on accurate ABC
- Judith Rousseau on model choice consistency
- Richard Everitt on latent MRFs
- myself (in replacement of Kerrie Mengersen) on (A)BC empirical likelihood
- Gael Martin on unscented Kalman filters for noisy diffusions (on which we worked last summer at Monash)
- Gérard Biau on ABC as knn
- Sarah Filippi on sequential ABC
- Nicolas Chopin on EP-ABC
- Daniel Wegmann on speeding up ABC
- Anthony Lee on geometrically ergodic ABC
- Darren Wilkinson on intractable Markov processes
(It almost sounds as if I had written the program by myself, but this is not the case, promised!) So from my own personal and egoistic perspective, the poster session was more surprising, with 18 posters ranging from theoretical extensions to applications. I actually wished it had lasted a wee bit longer as I did not have time to listen to all presenters before they vanished to the dining room upstairs, but I appreciated very much the few exchanges I had. A fully enjoyable meeting then!!! I am definitely looking forward to the next edition of ABC in [pick your capital], with ABC in Sydney (2014) and ABC in Helsinki (2015) already in the planning…
Here are my slides, just slightly updated from the previous version: