## a simulated annealing approach to Bayesian inference

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on October 1, 2015 by xi'an

A misleading title if any! Carlos Albert arXived a paper with this title this morning and I rushed to read it. Because it sounded like Bayesian analysis could be expressed as a special form of simulated annealing. But it happens to be a rather technical sequel [“that complies with physics standards”] to another paper I had missed, A simulated annealing approach to ABC, by Carlos Albert, Hans Künsch, and Andreas Scheidegger. Paper that appeared in Statistics and Computing last year, and which is most interesting!

“These update steps are associated with a flow of entropy from the system (the ensemble of particles in the product space of parameters and outputs) to the environment. Part of this flow is due to the decrease of entropy in the system when it transforms from the prior to the posterior state and constitutes the well-invested part of computation. Since the process happens in finite time, inevitably, additional entropy is produced. This entropy production is used as a measure of the wasted computation and minimized, as previously suggested for adaptive simulated annealing” (p.3)

The notion behind this simulated annealing intrusion into the ABC world is that the choice of the tolerance can be adapted along iterations according to a simulated annealing schedule. Both papers make use of thermodynamics notions that are completely foreign to me, like endoreversibility, but aim at minimising the “entropy production of the system, which is a measure for the waste of computation”. The central innovation is to introduce an augmented target on (θ,x) that is

f(x|θ)π(θ)exp{-ρ(x,y)/ε},

where ε is the tolerance, while ρ(x,y) is a measure of distance to the actual observations, and to treat ε as an annealing temperature. In an ABC-MCMC implementation, the acceptance probability of a random walk proposal (θ’,x’) is then

exp{ρ(x,y)/ε-ρ(x’,y)/ε}∧1.

Under some regularity constraints, the sequence of targets converges to

π(θ|y)exp{-ρ(x,y)},

if ε decreases slowly enough to zero. While the representation of ABC-MCMC through kernels other than the Heaviside function can be found in the earlier ABC literature, the embedding of tolerance updating within the modern theory of simulated annealing is rather exciting.

Furthermore, we will present an adaptive schedule that attempts convergence to the correct posterior while minimizing the required simulations from the likelihood. Both the jump distribution in parameter space and the tolerance are adapted using mean fields of the ensemble.” (p.2)

What I cannot infer from a rather quick perusal of the papers is whether or not the implementation gets into the way of the all-inclusive theory. For instance, how can the Markov chain keep moving as the tolerance gets to zero? Even with a particle population and a sequential Monte Carlo implementation, it is unclear why the proposal scale factor [as in equation (34)] does not collapse to zero in order to ensure a non-zero acceptance rate. In the published paper, the authors used the same toy mixture example as ours [from Sisson et al., 2007], where we earned the award of the “incredibly ugly squalid picture”, with improvements in the effective sample size, but this remains a toy example. (Hopefully a post to be continued in more depth…)

## Le Monde puzzle [#929]

Posted in Books, Kids, R with tags , on September 29, 2015 by xi'an

A combinatorics Le Monde mathematical puzzle:

In the set {1,…,12}, numbers adjacent to i are called friends of i. How many distinct subsets of size 5 can be chosen under the constraint that each number in the subset has at least a friend with him?

In a brute force approach, I tried a quintuple loop to check all possible cases:

case=0
for (a in 1:(12-4))
for (b in (a+1):(12-3))
for (c in (b+1):(12-2))
for (d in (c+1):(12-1))
for (e in (d+1):12)
case=case+((b-a<2)&(min(c-b,d-c)<2)
&(min(d-c,e-d)<2)&(e-d<2))


which returns 64 possible cases. Note that the second and last loop are useless since b=a+1 and e=d+1, necessarily. And c is either (b+1) or (d-1), which means 2 choices for c, except when e=a+4. This all adds up to

$8 + 2\sum_{a=1}^7\sum_{e=a+5}^{12} = 8+2.7.8-2.7.8/2=8.8=64$

A related R question: is there a generic way of programming a sequence of embedded loops like the one above without listing all of the loops one by one?

## Salman Rushdie at the Banff Centre

Posted in Books, Kids, Mountains, pictures, Travel with tags , , , , , , , , , on September 28, 2015 by xi'an

By a great coincidence, I happened to be in Banff the same weekend as Salman Rushdie was giving a talk at the Banff Centre on his latest book! And got the news early enough to book a seat. The amphitheatre was unsurprisingly full and Salman Rushdie was interviewed by Eleanor Wachtel, first about the book and what led to its creation, especially the influence of his parents, and then second about his life and career, with an obvious focus on Khomeini’s fatwa. (The whole interview is podcasted on CBC.) Rushdie was witty and funny, even about the darkest moments, and discussed how in his youth no one would have imagined that religion would become such a central issue, defining and reducing people rather than being a part of them that would need no discussion. And how his family was de facto atheist, if not in words. The interview spent too little time on Rusdhie’s stand on freedom of expression, although he briefly spoke about the growing threats to this freedom, including those made in the name of religious freedom. (As we were reminded yesterday by the Warwick student union decision to bar Maryam Namazie from speaking on campus.) The experience was quite a treat, adding to the many bonuses of spending this weekend in Banff. Although I must admit I was fighting jetlag that late at night and hence must have dozed at points… (As an aside, I was rather surprised to see no security or police around the Banff Centre theatre, but of course this does not mean there was none. And this is Banff, not New York City or London.)

## where on [Middle] Earth can a book be banned for moral arguments?

Posted in Books, Kids, Mountains, Travel with tags , , , , on September 27, 2015 by xi'an

Well, the clue in the title should be obvious enough: A censorship board in New Zealand, the Film and Literature Board of Review, has just banned Ted Dawes’ “Into the River” from being sold or distributed or even exhibited. Following complaints orchestrated by Family First, a conservative organisation… The ban is actually temporary, until a committee reaches a decision about the classification of the book. The most surprising aspect of this story—besides the existence of a censoring institution in a democratic country in 2015!— is that the ban applies to everyone, including adult readers, and that all libraries had to take the book out of their shelves. Which is also fairly ridiculous in the era of e-books…

## Mathematical underpinnings of Analytics (theory and applications)

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , on September 25, 2015 by xi'an

“Today, a week or two spent reading Jaynes’ book can be a life-changing experience.” (p.8)

I received this book by Peter Grindrod, Mathematical underpinnings of Analytics (theory and applications), from Oxford University Press, quite a while ago. (Not that long ago since the book got published in 2015.) As a book for review for CHANCE. And let it sit on my desk and in my travel bag for the same while as it was unclear to me that it was connected with Statistics and CHANCE. What is [are?!] analytics?! I did not find much of a definition of analytics when I at last opened the book, and even less mentions of statistics or machine-learning, but Wikipedia told me the following:

“Analytics is a multidimensional discipline. There is extensive use of mathematics and statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data—data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology.”

Barring the absurdity of speaking of a “multidimensional discipline” [and even worse of linking with the mathematical notion of dimension!], this tells me analytics is a mix of data analysis and decision making. Hence relying on (some) statistics. Fine.

“Perhaps in ten years, time, the mathematics of behavioural analytics will be common place: every mathematics department will be doing some of it.”(p.10)

First, and to start with some positive words (!), a book that quotes both Friedrich Nietzsche and Patti Smith cannot get everything wrong! (Of course, including a most likely apocryphal quote from the now late Yogi Berra does not partake from this category!) Second, from a general perspective, I feel the book meanders its way through chapters towards a higher level of statistical consciousness, from graphs to clustering, to hidden Markov models, without precisely mentioning statistics or statistical model, while insisting very much upon Bayesian procedures and Bayesian thinking. Overall, I can relate to most items mentioned in Peter Grindrod’s book, but mostly by first reconstructing the notions behind. While I personally appreciate the distanced and often ironic tone of the book, reflecting upon the author’s experience in retail modelling, I am thus wondering at which audience Mathematical underpinnings of Analytics aims, for a practitioner would have a hard time jumping the gap between the concepts exposed therein and one’s practice, while a theoretician would require more formal and deeper entries on the topics broached by the book. I just doubt this entry will be enough to lead maths departments to adopt behavioural analytics as part of their curriculum… Continue reading

## mixtures, Heremite polynomials, and ideals

Posted in Books, Kids, Statistics, University life with tags , , , , on September 24, 2015 by xi'an

A 3 page note that got arXived today is [University of Colorado?!] Andrew Clark’s “Expanding the Computation of Mixture Models by the use of Hermite Polynomials and Ideals“. With a typo on Hermite‘s name in the pdf title. The whole point of the note is to demonstrate that mixtures of different types of distributions (like t and Gaussian) are manageable.  A truly stupendous result… As if no one had ever mixed different distributions before.

“Using Hermite polynomials and computing ideals allows the investigator to mix distributions from distinct families.”

The second point of the paper is to derive the mixture weights from an algebraic equation based on the Hermite polynomials of the components, which implies that the components and the mixture distribution itself are already known. Which thus does not seem particularly relevant for mixture estimation…

## running out of explanations

Posted in Books, Kids, Statistics with tags , , , , , on September 23, 2015 by xi'an

A few days ago, I answered a self-study question on Cross Validated about the convergence in probability of 1/X given the convergence in probability of X to a. Until I ran out of explanations… I did not see how to detail any further the connection between both properties! The reader (OP) started from a resolution of the corresponding exercise in Casella and Berger’s Statistical Inference and could not follow the steps, some of which were incorrect. But my attempts at making him uncover the necessary steps failed, presumably because he was sticking to this earlier resolution rather than starting from the definition of convergence in probability. And he could not get over the equality

$\mathbb{P}(|a/X_{i} - 1| < \epsilon)=\mathbb{P}\left(a-{{a\epsilon}\over{1 + \epsilon}} < X_{i} < a + {{a\epsilon}\over{1 - \epsilon}}\right)$

which is the central reason why one convergence transfers to the other… I know I know nothing, and even less about pedagogy, but it is (just so mildly!) frustrating to hit a wall beyond which no further explanation can help! Feel free to propose an alternative resolution.

Update: A few days later, readers of Cross Validated pointed out that the question had been answered by whuber in a magisterial way. But I wonder if my original reader appreciated this resolution, since he did not pursue the issue.