## the Hyvärinen score is back

Posted in pictures, Statistics, Travel on November 21, 2017 by xi'an

Stéphane Shao, Pierre Jacob and co-authors from Harvard have just posted on arXiv a new paper on Bayesian model comparison using the Hyvärinen score

$\mathcal{H}(y, p) = 2\Delta_y \log p(y) + ||\nabla_y \log p(y)||^2$

which thus uses the Laplacian as a natural and normalisation-free penalisation for the score test. (Score that I first met in Padova, a few weeks before moving from X to IX.) Which brings a decision-theoretic alternative to the Bayes factor and which delivers a coherent answer when using improper priors. Thus a very appealing proposal in my (biased) opinion! The paper is mostly computational in that it proposes SMC and SMC² solutions to handle the estimation of the Hyvärinen score for models with tractable likelihoods and tractable completed likelihoods, respectively. (Reminding me that Pierre worked on SMC² algorithms quite early during his Ph.D. thesis.)
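To see the normalisation-free property at work, here is a minimal sketch for a scalar observation. The function names and the Gaussian example are my own illustration, not taken from the paper, and the finite differences are only there as numerical derivatives (they are not the discrete score the authors propose). Adding a constant to log p, that is, rescaling the density, leaves the score unchanged, and for a N(μ,σ²) density the score is −2/σ² + (y−μ)²/σ⁴.

```python
import math

def hyvarinen_score(log_p, y, h=1e-3):
    """Hyvarinen score 2*(log p)''(y) + ((log p)'(y))**2 for a scalar y.

    Derivatives are approximated by central finite differences
    (an illustrative numerical choice, not the paper's method).
    """
    grad = (log_p(y + h) - log_p(y - h)) / (2 * h)
    lap = (log_p(y + h) - 2 * log_p(y) + log_p(y - h)) / h**2
    return 2 * lap + grad**2

def gaussian_log_density(mu, sigma, log_const=0.0):
    """Gaussian log-density up to an additive constant log_const."""
    def log_p(y):
        return log_const - 0.5 * ((y - mu) / sigma) ** 2
    return log_p

mu, sigma, y = 1.0, 2.0, 3.5

# Closed form for the Gaussian case
closed_form = -2 / sigma**2 + (y - mu) ** 2 / sigma**4

# Same score whether or not the density is rescaled
unnormalised = hyvarinen_score(gaussian_log_density(mu, sigma), y)
rescaled = hyvarinen_score(gaussian_log_density(mu, sigma, log_const=50.0), y)

print(closed_form, unnormalised, rescaled)
```

The three printed values agree up to rounding, which is the whole point: the unknown normalising constant never enters.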

A most interesting remark in the paper is to recall that the Hyvärinen score associated with a generic model on a series must be the prequential (predictive) version

$\mathcal{H}_T (M) = \sum_{t=1}^T \mathcal{H}(y_t; p_M(dy_t|y_{1:(t-1)}))$

rather than the version on the joint marginal density of the whole series. (Followed by a remark within the remark that the logarithmic scoring rule does not make this distinction. And I had to write down the cascading representation

$\log p(y_{1:T})=\sum_{t=1}^T \log p(y_t|y_{1:t-1})$

to convince myself that this unnatural decomposition, where the posterior on θ varies with each term, is true!) The prequential version is needed for consistency reasons.
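For the equally sceptical, the cascading identity can be checked numerically. The sketch below assumes a toy conjugate model of my own choosing (not one from the paper): y_t|θ ~ N(θ,1) with θ ~ N(0,τ²), for which both the joint marginal and the one-step predictives are available in closed form.

```python
import math
import random

def log_normal_pdf(y, mean, var):
    """Log-density of N(mean, var) at y."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def log_joint_marginal(ys, tau2):
    """log p(y_{1:T}) when y_{1:T} ~ N(0, I + tau2 * 11')."""
    T = len(ys)
    s1 = sum(ys)
    s2 = sum(y * y for y in ys)
    # Quadratic form via the Sherman-Morrison inverse of I + tau2 * 11'
    quad = s2 - tau2 / (1 + T * tau2) * s1**2
    return -0.5 * (T * math.log(2 * math.pi) + math.log(1 + T * tau2) + quad)

def log_prequential(ys, tau2):
    """Sum over t of log p(y_t | y_{1:t-1}), updating the posterior on theta."""
    total = 0.0
    prec = 1.0 / tau2      # posterior precision of theta (prior to start with)
    weighted_sum = 0.0     # posterior precision times posterior mean
    for y in ys:
        post_mean = weighted_sum / prec
        post_var = 1.0 / prec
        # One-step predictive: N(post_mean, 1 + post_var)
        total += log_normal_pdf(y, post_mean, 1.0 + post_var)
        prec += 1.0
        weighted_sum += y
    return total

random.seed(0)
ys = [random.gauss(0.5, 1.0) for _ in range(20)]
joint = log_joint_marginal(ys, tau2=4.0)
cascade = log_prequential(ys, tau2=4.0)
print(joint, cascade)
```

Both numbers coincide up to floating-point error, even though the posterior on θ changes at each term of the sum.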

This prequential decomposition is however a plus in terms of computation when resorting to sequential Monte Carlo. Since each time step produces an evaluation of the associated marginal. In the case of state-space models, another decomposition by the authors, based on measurement densities and partial conditional expectations of the latent states, allows for another (SMC²) approximation. The paper also establishes that, for non-nested models, the Hyvärinen score as a model selection tool asymptotically selects the model closest to the data generating process. For the divergence induced by the score. Even for state-space models, under some technical assumptions. From this asymptotic perspective, the paper exhibits an example where the Bayes factor and the Hyvärinen factor disagree, even asymptotically in the number of observations, about which mis-specified model to select. And last but not least the authors propose and assess a discrete alternative relying on finite differences instead of derivatives. Which remains a proper scoring rule.

I am quite excited by this work (call me biased!) and I hope it prompts follow-up work as a viable alternative to Bayes factors, if only for being more robust to the [unspecified] impact of the prior tails. As in the above picture where some realisations of the SMC² output and of the sequential decision process see the wrong model being almost acceptable for quite a long while…

Posted in Books, pictures, Travel on November 19, 2017 by xi'an

“How does it do this? Pears, not traditionally a science fiction writer, employs some commonly used devices of the genre to create a mind-bending but wholly satisfying tale…” Robin’s Books

“Indeed, Arcadia seems to be aimed at the lucrative crossover point between the grownup and YA markets, even if it lacks the antic density of the Harry Potter series or the focused peril of The Hunger Games.” Steven Poole, The Guardian

The picture above is completely unrelated to the book, except for the title. (And be at rest: I am not going to start an otter theme in the spirit of Andrew’s cats… Actually a cat plays a significant role in this book.) But Pears’ Arcadia is a fairly boring tale and an attempt at a rather dry play on the over-exploited theme of time-travel. Yaaawny, indeed!

I am fairly disappointed by this book, all the more because Pears’ An Instance of the Fingerpost is a superb book, one of my favourites!, with a complexity of threads and levels, while maintaining a coherence of the plot that makes the final revelation a masterpiece. The Dream of Scipio also covers several historical periods of French Provence with a satisfactory plot and deep enough background (fed by a deep knowledge of the area and the eras…). The background, the broader perspective, the deep humanity of the characters, all these qualities of Pears’ books are lost in Arcadia, which sums up as an accumulation of clichés on dystopias, time-travel, and late 1950s Oxford academics. [Warning, spoilers ahoy!] The parallel (and broadly medieval) universe to which the 20th century characters time-travel has some justifications for being a new type of Flatland: it is the creation of a single Oxonian academic, a mix of J.R.R. Tolkien and Eric Ambler. But these 20th century characters are equally caricaturesque. And so are the oppressors and the rebels in the distant future. (Set on the Isle of Mull, of all places!) And the mathematics of the time-travel apparatus are carefully kept hidden (with the vague psychomathematics there reminding me of Asimov’s carefully constructed psychohistory).

There is a point after which pastiches get stale and unattractive. And boring, so Yawn again. (That the book came to be shortlisted for the Arthur C. Clarke award this year is a mystery.)

## art brut

Posted in pictures, Running, Travel on November 11, 2017 by xi'an

## a new paradigm for improper priors

Posted in Books, pictures, Statistics, Travel on November 6, 2017 by xi'an

Gunnar Taraldsen and co-authors have arXived a short note on using improper priors from a new perspective. Generalising an earlier 2016 paper in JSPI on the same topic. Both relate to a concept introduced by Rényi (who himself attributes the idea to Kolmogorov). Namely that random variables are to be associated with arbitrary measures [not necessarily σ-finite measures, the latter defining σ-finite random variables], rather than those with total mass one. Which allows for an alternate notion of conditional probability in the case of σ-finite random variables, with the perk that this conditional probability distribution is itself of mass 1 (a.e.). Which we know happens when moving from prior to proper posterior.
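A textbook instance of this move from infinite mass to mass one, in notation of my own choosing rather than the note's: take $x_1,\ldots,x_n$ i.i.d. $\mathcal{N}(\theta,1)$ with the flat prior $\pi(\theta)\propto 1$, which has infinite total mass. Conditioning on the sample nonetheless returns a genuine probability distribution,

$\pi(\theta|x_{1:n}) \propto \exp\left\{-\frac{1}{2}\sum_{i=1}^n (x_i-\theta)^2\right\} \propto \exp\left\{-\frac{n}{2}(\theta-\bar{x}_n)^2\right\}$

that is, $\theta|x_{1:n}\sim\mathcal{N}(\bar{x}_n,1/n)$, with $\bar{x}_n$ the sample mean.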

I remain puzzled by the 2016 paper though as I do not follow the meaning of a random variable associated with an infinite mass probability measure. If the point is limited to constructing posterior probability distributions associated with improper priors, there is little value in doing so. The argument in the 2016 paper is however that one can then define a conditional distribution in marginalisation paradoxes à la Stone, Dawid and Zidek (1973) where the marginal does not exist. Solving with this formalism the said marginalisation paradoxes as conditional distributions are only defined for σ-finite random variables. Which gives a fairly different conclusion than either Stone, Dawid and Zidek (1973) [with whom I agree, namely that there is no paradox because there is no “joint” distribution] or Jaynes (1973) [with whom I agree less!, in that the use of an invariant measure to make the discrepancy go away is not a particularly strong argument in favour of this measure]. The 2016 paper also draws an interesting connection with the study by Jim Hobert and George Casella (in Jim’s thesis) of [null recurrent or transient] Gibbs samplers with no joint [proper] distribution. Which in some situations can produce proper subchains, a phenomenon later exhibited by Alan Gelfand and Sujit Sahu (and Xiao-Li Meng as well if I correctly remember!). But I see no advantage in following this formalism, as it does not impact whether the chain is transient or null recurrent, or anything connected with its implementation. Plus a link to the approximation of improper priors by sequences of proper ones by Bioche and Druilhet I discussed a while ago.

## over the Bernese Alps [jatp]

Posted in Mountains, pictures, Travel on November 4, 2017 by xi'an

Great views of the Bernese and Grisonese Alps on both legs of my trip to and from Venezia. Flying over Les Diablerets, Bormio and many other places I visited over the years.

## fiducial inference

Posted in Books, Mountains, pictures, Running, Statistics, Travel on October 30, 2017 by xi'an

In connection with my recent tale of the many ε’s, I received from Gunnar Taraldsen [from Trondheim, Norge] a paper [jointly written with Bo Lindqvist and just appeared on-line in JSPI] on conditional fiducial models.

“The role of the prior and the statistical model in Bayesian analysis is replaced by the use of the fiducial model x=R(θ,ε) in fiducial inference. The fiducial is obtained in this case without a prior distribution for the parameter.”

Reading this paper after addressing the X validated question made me better understand the fundamental wrongness of fiducial analysis! If I may herein object to Fisher himself… Indeed, when writing x=R(θ,ε), as the representation of the [observed] random variable x as a deterministic transform of a parameter θ and of an [unobserved] random factor ε, the two random variables x and ε are based on the same random preimage ω, i.e., x=x(ω) and ε=ε(ω). Observing x hence sets a massive constraint on the preimage ω and on the conditional distribution of ε=ε(ω). When the fiducial inference incorporates another level of randomness via an independent random variable ε’ and inverts x=R(θ,ε’) into θ=θ(x,ε’), assuming there is only one solution to the inversion, it modifies the nature of the underlying σ-algebra into something that is incompatible with the original model. Because of this sudden duplication of the random variates. While the inversion of this equation x=R(θ,ε’) gives an idea of the possible values of θ when ε varies according to its [prior] distribution, it does not account for the connection between x and ε. And does not turn the original parameter into a random variable with an implicit prior distribution.
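To make the mechanics of the inversion concrete, here is a toy sketch for the location model x=θ+ε with ε standard Normal, so that θ(x,ε’)=x−ε’. The model, the function name and the numbers are my own illustration and are not taken from the paper.

```python
import random
import statistics

def fiducial_draws(x_obs, n_draws, rng):
    """Fiducial sample for the assumed model x = theta + eps, eps ~ N(0, 1).

    Each draw inverts x = R(theta, eps') into theta = x - eps', with eps'
    a fresh N(0, 1) variate generated independently of the observed x.
    """
    return [x_obs - rng.gauss(0.0, 1.0) for _ in range(n_draws)]

rng = random.Random(42)
x_obs = 1.3
draws = fiducial_draws(x_obs, 20000, rng)

# The draws spread around the observation with the scale of eps'
print(statistics.mean(draws), statistics.stdev(draws))
```

The ε’ simulated here never sees the ε that actually produced x_obs, which is exactly the duplication of random variates I object to above.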

As to conditional fiducial distributions, they are defined by inversion of x=R(θ,ε), under a certain constraint on θ, like C(θ)=0, which immediately raises a Pavlovian reaction in me, namely that since the curve C(θ)=0 has measure zero under the original fiducial distribution, how can this conditional solution be uniquely defined, or defined at all? Or avoid the Borel paradox mentioned in the paper? If I get the meaning of the authors in this section, the resulting fiducial distribution will actually depend on the choice of σ-algebra governing the projection.

“A further advantage of the fiducial approach in the case of a simple fiducial model is that independent samples are produced directly from independent sampling from [the fiducial distribution]. Bayesian simulations most often come as dependent samples from a Markov chain.”

This side argument in “favour” of the fiducial approach is most curious as it brings into the picture computational aspects that do not have any reason to be there. (The core of the paper is concerned with the uniqueness of the fiducial distribution in some univariate settings. Not with computational issues.)

## art brut [jatp]

Posted in Mountains, pictures, Running, Travel on October 28, 2017 by xi'an