Archive for Faith

grey sister [book review]

Posted in Statistics on October 21, 2018 by xi'an

Unsurprisingly, as soon as I got my hands on Grey Sister, the second [hardcover] volume after Red Sister, I could not resist reading it. Nursing a serious cold, caught while visiting Warwick wearing only summer gear (!), helped, and I thus spent my Sunday reading feverishly through Mark Lawrence’s latest book. As I enjoyed the first volume very much, immersing myself in the same “boarding school” atmosphere was easy, reuniting with most characters, including some I thought had been dead, and missing others I had not realised had been killed (no spoiler, just my imperfect memory!).

“The greatest threat to any faith is not other faiths or beliefs but the corruption and division of its own message”
With this bias inherited from the earlier volume, read four weeks ago, I cannot say I did not enjoy the book. Actually, the first half of Grey Sister is more enjoyable than the first volume because the training of the young novices in the Sweet Mercy monastery gets more focused, with more complex challenges and less boarding-school bickering nonsense. Except for one main thread that weighs too much on the plot in my opinion (no spoiler, again, as it is almost obvious from the start that the rivalry between Nona, the main character, and a high-born novice is there for a purpose). There is an Ender’s Game moment that I particularly enjoyed, with an Alexander’s resolution of a Gordian knot, which comes to signal the end of the almost peaceful part. I liked the second half, taking place on the run away from the Sweet Mercy monastery, very much less: there are so many coincidences and so many intersections of paths that one wishes the author had gone for this Alexander’s resolution of a Gordian knot himself! I think the plot almost peters out at this stage and only survives by sheer inertia, with too many boulders set loose at once to all stop at the same time!
“The sky above was a deep maroon, shading towards black, strewn with dark ribbons of cloud that looked like lacerations where jagged peaks tore the heavens.”
The style is sometimes repetitive and sometimes on the heavy side, as in the quote above, which I wish someone had re-read. Despite the grand (and somewhat nefarious) schemes of Abbess Glass, the story is too homely, which may be why the part “at home” feels more convincing than the part outside. The main villain’s plans for taking power over the whole country and the artificial moon are incredible, unconvincing and definitely sketchy, even when explained in the middle of a royal brawl. However, the continued description of the ice-encased universe, saved from a complete freeze by an artificial moon and four nuclear reactors, plus the increasing role of magic, make the background compelling and leave me eager for the final (?) volume in the series.

non-local priors for mixtures

Posted in Statistics, University life on September 15, 2016 by xi'an

[For some unknown reason, this commentary on the paper by Jairo Fúquene, Mark Steel, David Rossell —all colleagues at Warwick— on choosing mixture components by non-local priors remained untouched in my draft box…]

[Image: Faith, Barossa Valley wine: strange name for a Shiraz (as it cannot be a mass wine!), but nice flavours]

Choosing the number of components in a mixture of (e.g., Gaussian) distributions is a hard problem. It may actually be an altogether impossible problem, even when abstaining from moral judgements on mixtures. I do realise that the components can eventually be identified as the number of observations grows to infinity, as demonstrated for instance by Judith Rousseau and Kerrie Mengersen (2011). But for a finite and given number of observations, how much can we trust any conclusion about the number of components?! It seems to me that the criticism about the vacuity of point null hypotheses, namely the logical absurdity of trying to differentiate θ=0 from any other value of θ, applies to the estimation of or testing on the number of components of a mixture. Doubly so, one might argue, since a component with a very small weight, or one very close to another, is indistinguishable from a non-existing one. For instance, Definition 2 is correct from a mathematical viewpoint, but it does not spell out the multiple contiguities between k and k' component mixtures.
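To make the indistinguishability point concrete, here is a small numerical sketch of my own (not taken from the paper): the log-likelihood of a sample under a single Gaussian is essentially identical to that under a two-component mixture whose second component carries a vanishing weight.

```python
import numpy as np
from scipy.stats import norm

# My own illustration (nothing from the paper): a 2-component mixture with a
# vanishing weight is practically indistinguishable from a single Gaussian.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=500)  # data actually drawn from N(0,1)

# log-likelihood under the single-component model
ll_one = norm.logpdf(y, 0.0, 1.0).sum()

# log-likelihood under a 2-component mixture with weight 1e-6 on a far component
eps = 1e-6
ll_two = np.log((1 - eps) * norm.pdf(y, 0.0, 1.0)
                + eps * norm.pdf(y, 5.0, 1.0)).sum()

print(ll_one, ll_two, ll_two - ll_one)  # the difference is essentially zero
```

With 500 observations the two log-likelihoods differ by far less than any sensible model-choice threshold, which is the whole difficulty.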

The paper starts with a comprehensive coverage of the state of the art… When using a Bayes factor to compare a k-component and an h-component mixture, the behaviour of the factor is quite different depending on which model is correct. Essentially, overfitted mixtures take much longer to detect than underfitted ones, which makes intuitive sense. And BIC should be corrected for overfitted mixtures by a canonical dimension λ, between the true and the (larger) assumed number of parameters, turning into

2 log m(y) = 2 log p(y|θ̂) – λ log n + O(log log n)

I would argue that this simply invalidates BIC in mixture settings, since the canonical dimension λ is unavailable (and DIC does not provide a useful substitute, as we illustrated a decade ago…). The criticism of the Rousseau and Mengersen (2011) over-fitted mixtures, namely that their approach shrinks less than a model averaging over several numbers of components, relates to minimaxity and hence sounds both overly technical and reverting to some frequentist approach to testing. Replacing testing with estimating sounds like the right idea. And I am also unconvinced that a faster rate of convergence of the posterior probability or of the Bayes factor is a relevant factor when conducting inference on a single dataset of finite size.
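As a rough illustration of what is at stake, here is my own toy experiment (using the usual BIC with the standard parameter count, not the corrected version with the canonical λ): fitting 1-D Gaussian mixtures of increasing size by plain EM and comparing BIC values.

```python
import numpy as np
from scipy.stats import norm

# A rough sketch of mine (standard BIC, not the paper's corrected criterion):
# compare mixture sizes on data truly drawn from 2 well-separated components.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-3, 1, 250), rng.normal(3, 1, 250)])
n = len(y)

def em_gmm(y, k, iters=200):
    """Plain EM for a 1-D Gaussian mixture; returns the final log-likelihood."""
    w = np.full(k, 1.0 / k)
    mu = np.quantile(y, (np.arange(k) + 0.5) / k)  # spread-out starting means
    sd = np.full(k, y.std())
    for _ in range(iters):
        dens = w * norm.pdf(y[:, None], mu, sd)        # n x k joint densities
        resp = dens / dens.sum(axis=1, keepdims=True)  # E-step responsibilities
        nk = resp.sum(axis=0)                          # M-step updates below
        w, mu = nk / len(y), (resp * y[:, None]).sum(0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(0) / nk) + 1e-9
    return np.log((w * norm.pdf(y[:, None], mu, sd)).sum(axis=1)).sum()

# standard BIC with the regular dimension 3k - 1 (weights, means, sds)
bics = {k: -2 * em_gmm(y, k) + (3 * k - 1) * np.log(n) for k in range(1, 5)}
print(min(bics, key=bics.get))  # BIC should favour k = 2 here
```

On well-separated components the standard BIC behaves; the trouble discussed above arises precisely when components overlap or carry small weights, and the effective λ is then smaller than the regular dimension.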

As for non-local priors, the notion seems to rely on a specific topology for the parameter space, since a k-component mixture can approach a k'-component mixture (when k'<k) in a continuum of ways (even for a given parameterisation). This topology seems to be summarised by the penalty (distance?) d(θ) in the paper. Is there an intrinsic version of d(θ), given the weird parameter space? Like one derived from the Kullback-Leibler divergence between the models? The choice of how zero is approached clearly has an impact on how easily the “null” is detected, all the more because of the somewhat discontinuous nature of the parameter space. Incidentally, I find it curious that only the distance between means is penalised… The prior also assumes independence between component parameters and component weights, which I think is suboptimal in dealing with mixtures, maybe suboptimal in a poetic sense!, as we discussed in our reparameterisation paper. I am not sure either that the speed at which the distance converges to zero (in Theorem 1) helps me to understand whether the mixture has too many components for the data’s own good, when I can run a calibration experiment under both assumptions.
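For what it is worth, here is my own reading of a MOM-type penalty on the means, sketched in code; the exact form of d(θ) in the paper may well differ, and the product form, the scale g, and the function name are all my assumptions.

```python
import numpy as np
from itertools import combinations

# My own sketch of a MOM-type penalty on component means: the product of
# squared pairwise differences, scaled by g. The exact d(theta) in the paper
# may differ; the point is only that the penalty vanishes when two means merge.
def mom_penalty(means, g=1.0):
    return np.prod([(mi - mj) ** 2 / g for mi, mj in combinations(means, 2)])

print(mom_penalty([0.0, 3.0]))   # well-separated means: penalty 9
print(mom_penalty([0.0, 1e-4]))  # nearly merged means: penalty ~ 0
```

The point is the topology comment above: the penalty vanishes continuously as two means merge, so how zero is approached governs how easily the “null” is detected.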

While I appreciate the derivation of a closed-form non-local prior, I wonder at the importance of the result. Is it because this leads to an easier derivation of the posterior probability? I do not see the connection in Section 3, except maybe that the importance weight indeed involves this normalising constant when considering several k’s in parallel. Is there any convergence issue in the importance sampling solutions of (3.1) and (3.3), since the simulations are run under the local posterior? While I appreciate the availability of an EM version for deriving the MAP, a fact I became aware of only recently, does it truly bring an improvement when compared with picking the MCMC simulation with the highest completed posterior?
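If I understand the importance sampling solution correctly, the trick amounts to reweighting local-posterior draws by the penalty, as in this toy sketch of mine (a Gaussian “posterior” and a θ² penalty standing in for the actual local posterior and d(θ)):

```python
import numpy as np

# A toy simplification of mine: draws from the "local" posterior are
# reweighted by the penalty d(theta) to target the non-local posterior,
# and the mean weight estimates the extra normalising constant.
rng = np.random.default_rng(2)

# pretend these are MCMC draws of a component mean under the local posterior
theta = rng.normal(1.0, 0.5, size=10_000)

d = theta ** 2                     # toy penalty vanishing at theta = 0
const_hat = d.mean()               # estimate of E_local[d(theta)]
w = d / d.sum()                    # self-normalised importance weights
nonlocal_mean = (w * theta).sum()  # posterior mean under the non-local prior

print(const_hat, nonlocal_mean)
```

Convergence then hinges on the local posterior giving enough mass to the regions where d(θ) is large, which is the worry raised above.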

The section on prior elicitation is obviously of central interest to me! It however seems to be restricted to the derivation of the scale factor g in the distance and of the parameter q in the Dirichlet prior on the weights, while the other parameters are simply allocated conjugate-like priors. I would obviously enjoy seeing how this approach proceeds with our non-informative prior(s). In this regard, the illustration section is nice, but one always wonders about the representative nature of the examples and the possible interpretations of real datasets, for instance when considering that the Old Faithful data is more of an HMM than a mixture.