## variational approximation to empirical likelihood ABC

Posted in Statistics on October 1, 2021 by xi'an

Sanjay Chaudhuri and his colleagues from Singapore arXived last year a paper on a novel version of empirical likelihood ABC that I hadn’t yet found time to read. This proposal connects with our own, published with Kerrie Mengersen and Pierre Pudlo in 2013 in PNAS. It is presented as an attempt at approximating the posterior distribution based on a vector of (summary) statistics, the variational approximation (or information projection) appearing in the construction of the sampling distribution of the observed summary. (Along with a weird eyed-g symbol! I checked inside the original LaTeX file and it happens to be a mathbbmtt g, that is, the typewriter version of a blackboard computer modern g…) This approximation writes as an entropic correction of the true posterior distribution (in Theorem 1).

“First, the true log-joint density of the observed summary, the summaries of the i.i.d. replicates and the parameter have to be estimated. Second, we need to estimate the expectation of the above log-joint density with respect to the distribution of the data generating process. Finally, the differential entropy of the data generating density needs to be estimated from the m replicates…”

The density of the observed summary is estimated by empirical likelihood, but I do not understand the reasoning behind the moment condition used in this empirical likelihood. Indeed the moment made of the difference between the observed summaries and the simulated ones is zero iff the true value of the parameter is used in the simulation. I also fail to understand the connection with our SAME procedure (Doucet, Godsill & X, 2002), in that the empirical likelihood is based on a sample made of pairs (observed,generated) where the observed part is repeated m times, indeed, but not with the intent of approximating a marginal likelihood estimator… The notion of using the actual data instead of the true expectation (i.e. as an unbiased estimator) at the true parameter value is appealing as it avoids specifying the exact (or analytical) value of this expectation (as in our approach), but I am missing the justification for the extension to any parameter value. Unless one uses an ancillary statistic, which does not sound pertinent… The differential entropy is estimated by a Kozachenko-Leonenko estimator involving k-nearest neighbours.
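For the record, here is a minimal R sketch of what a Kozachenko-Leonenko entropy estimate looks like. It assumes Euclidean distances, a brute-force neighbour search, and no ties in the sample. The function name `kl_entropy` and these choices are mine, not taken from the paper.

```r
# Kozachenko-Leonenko k-nearest-neighbour estimate of the differential entropy (in nats)
# x: vector or n x d matrix of i.i.d. draws; k: number of neighbours (assumed k < n)
# hypothetical helper, assuming Euclidean distance and no duplicated points
kl_entropy <- function(x, k = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)
  dmat <- as.matrix(dist(x))                         # all pairwise distances, O(n^2) storage
  # distance to the k-th nearest neighbour (the first sorted entry is the point itself)
  rho <- apply(dmat, 1, function(r) sort(r)[k + 1])
  log_vd <- (d / 2) * log(pi) - lgamma(d / 2 + 1)    # log-volume of the unit ball in R^d
  digamma(n) - digamma(k) + log_vd + d * mean(log(rho))
}

set.seed(1)
z <- rnorm(2000)
kl_entropy(z, k = 5)           # estimate for a standard Normal sample
0.5 * log(2 * pi * exp(1))     # exact entropy of the standard Normal, for comparison
```

The quadratic cost of `dist` is harmless for a few thousand replicates, and a tree-based neighbour search would be the obvious replacement for larger m.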

“The proposed empirical likelihood estimates weights by matching the moments of g(X₁), …, g(Xₘ) with that of g(Xₒ), without requiring a direct relationship with the parameter. (…) the constraints used in the construction of the empirical likelihood are based on the identity in (7), which can only be satisfied when θ = θ₀.”

Although I feel like I am missing one argument, the latter part of the paper seems to support my impression, as quoted above. Meaning that the approximation will fare well only in the vicinity of the true parameter. Which makes it untrustworthy for model choice purposes, I believe. (The paper uses the g-and-k benchmark without exploiting Pierre Jacob’s package that allows for exact MCMC implementation.)

## Nature tidbits

Posted in Books, Statistics, University life on September 18, 2018 by xi'an

In the Nature issue of July 19 that I read on the plane to Singapore, there was a whole lot of interesting entries, from various calls expressing deep concern about the anti-scientific stance of the Trump administration, like cutting funds for environmental regulation and restricting freedom of communication (EPA) or naming a non-scientist at the head of NASA and other agencies, or again restricting the protection of species, to a testimony of an Argentinian biologist in front of a congressional committee about the legalisation of abortion (which failed at the level of the Argentinian senate later this month), to a DNA-like version of a neural network, to Louis Chen from NUS being mentioned in a career article about the importance of planning well in advance one’s retirement to preserve academia links and manage a new position or even career. Which is what happened to Louis as he stayed head of the Institute for Mathematical Sciences at NUS after the mandatory retirement age and is now emeritus and still engaged in research. (The article made me wonder however how the cases therein had been selected.) It is actually most revealing to see how different countries approach the question of retirements of academics: in France, for instance, one is essentially forced to retire and, while there exist emeritus positions, it is extremely difficult to find funding.

“Louis Chen was technically meant to retire in 2005. The mathematician at the National University of Singapore was turning 65, the university’s official retirement age. But he was only five years into his tenure as director of the university’s new Institute for Mathematical Sciences, and the university wanted him to stay on. So he remained for seven more years, stepping down in 2012. Over the next 18 months, he travelled and had knee surgery, before returning in summer 2014 to teach graduate courses for a year.”

And [yet] another piece on the biases of AIs. Reproducing earlier papers discussed here, with one obvious reason being that the learning corpus is not representative of the whole population. Maybe survey sampling should become compulsory in machine learning training degrees. And yet another piece on why protectionism is (also) bad for the environment.

## Le Monde puzzle [#1650]

Posted in Books, Kids, R on September 5, 2018 by xi'an

A penultimate Le Monde mathematical puzzle before the new competition starts [again!]

For a game opposing 40 players over 12 questions, anyone answering correctly a question gets as reward the number of people who failed to answer. Alice is the single winner: what is her minimal score? In another round, Bob is the only lowest grade: what is his maximum score?

For each player, the score S is the sum δ₁s₁+…+δ₁₂s₁₂ over the 12 questions, where the first term in each product is an indicator for a correct answer and the second term is the sum over all other players of their complementary indicator, which can be replaced with the sum over all players since δ₁(1-δ₁)=0. Leading to the vector of scores

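```r
# ansz: 0/1 matrix of answers, one row per player and one column per question
worz <- function(ansz){
  scor <- colSums(1 - ansz)   # reward of a question = number of players who failed it
  colSums(t(ansz) * scor)     # score of a player = sum of rewards over correct answers
}
```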

Now, running by brute-force a massive number of simulations confirmed my intuition that the minimal winning score is 39, the number of players minus one [achieved by Alice giving a single good answer and the others none at all], while the maximum losing score appeared to be 34, for which I had much less of an intuition! I would have rather guessed something in the vicinity of 80 (being half of the answers replied correctly by half of the players)… Indeed, while in Singapore, I ran in the wee hours a quick simulated annealing code starting from this solution and moved up to 77.
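A rough sketch of such a brute-force search, for what it is worth. The helper `puzzle_search` and the way answer matrices are drawn (i.i.d. Bernoulli answers with a random success rate per round) are assumptions of mine rather than the code actually run for this post, and the scoring function is restated so that the snippet runs on its own.

```r
# scoring function from the post, restated here for a self-contained snippet
worz <- function(ansz) {
  scor <- colSums(1 - ansz)
  colSums(t(ansz) * scor)
}

# Alice alone answers a single question, everyone else fails everything
ansz <- matrix(0, nrow = 40, ncol = 12)
ansz[1, 1] <- 1
worz(ansz)[1]   # Alice's score, 39

# crude random search (hypothetical helper): smallest score of a single winner
# and largest score of a single loser over nsim random rounds
puzzle_search <- function(nsim = 1e3, n = 40, q = 12) {
  min_win <- Inf
  max_lose <- -Inf
  for (t in seq_len(nsim)) {
    p <- runif(1)                                   # random overall success rate
    s <- sort(worz(matrix(rbinom(n * q, 1, p), n, q)))
    if (s[n] > s[n - 1]) min_win <- min(min_win, s[n])    # unique winner only
    if (s[1] < s[2]) max_lose <- max(max_lose, s[1])      # unique loser only
  }
  c(min_win = min_win, max_lose = max_lose)
}

set.seed(101)
puzzle_search(1e3)
```

Such a blind search is a poor explorer of the extremes, which is presumably why simulated annealing makes such a difference for Bob's score.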

And the 2018 version of Le Monde maths puzzle competition starts today!, for a total of eight double questions, starting with an optimisation problem where the adjacent X table is filled with zeros and ones, trying to optimise (max and min) the number of positive entries [out of 45] for which an even number of neighbours is equal to one. On the represented configuration, green stands for one (16 ones) and P for the positive entries (31 of them). This should be amenable to an R resolution [R solution], by, once again!, simulated annealing. Deadline for the reply on the competition website is next Tuesday, midnight [UTC+1].

## IMS workshop [day 4]

Posted in pictures, Statistics, Travel, University life on August 31, 2018 by xi'an

While I did not repeat the mistake of yesterday morning, just as well because the sun was unbearably strong!, I managed this time to board a bus headed in the wrong direction and as a result went through several remote NUS campi! Missing the first talk of the day as a result. By Youssef Marzouk, with a connection between sequential Monte Carlo and optimal transport. Transport for sampling, that is. The following talk by Tiangang Cui was however related, with Marzouk a co-author, as it aimed at finding linear transforms towards creating Normal approximations to the target to be used as proposals in Metropolis algorithms. Which may sound like something already tried a zillion times in the MCMC literature, except that the setting was rather specific to some inverse problems, imposing a generalised Normal structure on the transform, then optimised by transport arguments. It is unclear to me [from just attending the talk] how complex this derivation is and how dimension steps in, but the produced illustrations were quite robust to an increase in dimension.

The remaining talks for the day were mostly particular, from Anthony Lee introducing a new and almost costless way of producing variance estimates in particle filters, exploiting only the ancestry of particles, to Mike Pitt discussing the correlated pseudo-marginal algorithm developed with George Deligiannidis and Arnaud Doucet. Which somewhat paradoxically managed to fight the degeneracy [i.e., the need for a number of terms increasing like the time index T] found in independent pseudo-marginal resolutions, moving down to almost log(T)… With an interesting connection to the quasi SMC approach of Mathieu and Nicolas. And Sebastian Reich also stressed the links with optimal transport in a talk about data assimilation that was way beyond my reach. The day concluded with fireworks, through a magistral lecture by Professeur Del Moral on a continuous time version of PMCMC using the Feynman-Kac terminology. Pierre did a superb job during his lecture towards leading the whole room to the conclusion.

## IMS workshop [day 3]

Posted in pictures, R, Statistics, Travel, University life on August 30, 2018 by xi'an

I made the “capital” mistake of walking across the entire NUS campus this morning, which is quite green and pretty, but which almost enjoys an additional dimension brought by such an intense humidity that one feels like having to get around this humidity!, a feature I had managed to completely erase from my memory of my previous visit there. Anyway, nothing of any relevance. One talk in the morning was by Markus Eisenbach on tools used by physicists to speed up Monte Carlo methods, like the Wang-Landau flat histogram, towards computing the partition function, or the distribution of the energy levels, definitely addressing issues close to my interest, but somewhat beyond my reach for using a different language and stress, as often in physics. (I mean, as often in physics talks I attend.) An idea that came out clear to me was to bypass a (flat) histogram target and aim directly at a constant slope cdf for the energy levels. (But I got scared away by the Fourier transforms!)

Lawrence Murray then discussed some features of the Birch probabilistic programming language he is currently developing, especially a fairly fascinating concept of delayed sampling, which connects with locally-optimal proposals and Rao-Blackwellisation. Which I plan to get back to later [and hopefully sooner than later!].

In the afternoon, Maria de Iorio gave a talk about the construction of nonparametric priors that create dependence between a sequence of functions, a notion I had not thought of before, with an array of possibilities when using the stick breaking construction of Dirichlet processes.
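As a reminder of the basic building block, here is a minimal sketch of the (truncated) stick-breaking construction for a single Dirichlet process. It makes no attempt at reproducing the dependent priors of the talk. The truncation level, the N(0,1) base measure, and the helper name `stick_breaking` are arbitrary choices of mine.

```r
# truncated stick-breaking weights of a Dirichlet process with mass parameter alpha
# hypothetical helper: n_atoms is the truncation level, an approximation device
stick_breaking <- function(n_atoms, alpha) {
  v <- rbeta(n_atoms, 1, alpha)              # proportions broken off the remaining stick
  v * cumprod(c(1, 1 - v[-n_atoms]))         # w_k = v_k * prod_{j<k} (1 - v_j)
}

set.seed(42)
w <- stick_breaking(200, alpha = 2)
atoms <- rnorm(200)                          # atom locations from an assumed N(0,1) base measure
sum(w)                                       # slightly below one because of the truncation
draws <- sample(atoms, 10, replace = TRUE, prob = w)   # draws from the truncated random measure
```

Dependence between a sequence of such random measures can then be induced through the proportions, through the atoms, or through both, which is where the array of possibilities comes from.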

And Christophe Andrieu gave a very smooth and helpful entry to piecewise deterministic Markov processes (PDMP) in preparation for talks he is giving next week for the continuation of the workshop at IMS. Starting with the guided random walk of Gustafson (1998), which extended a bit later into the non-reversible paper of Diaconis, Holmes, and Neal (2000). Although I had a vague idea of the contents of these papers, the role of the velocity ν became much clearer. And premonitory of the advances made by the more recent PDMP proposals. There is obviously a continuation with the equally pedagogical talk Christophe gave at MCqMC in Rennes two months [and half the globe] ago, but the focus being somewhat different, it really felt like a new talk [my short term memory may also play some role in this feeling!, as I now remember the discussion of Hildebrand (2002) for non-reversible processes]. An introduction to the topic I would recommend to anyone interested in this new branch of Monte Carlo simulation! To be followed by the most recently arXived hypocoercivity paper by Christophe and co-authors.
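To make the role of the velocity ν concrete, here is a small R sketch of a guided random walk in the spirit of Gustafson (1998) for a one-dimensional target. It reflects my understanding of the algorithm rather than anything taken from the talk, and the function name `guided_rw`, the Normal increments, and the scale are assumptions.

```r
# guided random walk on the extended state (x, nu), with nu in {-1, +1}
# log_target: log-density of the target, up to an additive constant
guided_rw <- function(n_iter, log_target, x0 = 0, sigma = 1) {
  x <- numeric(n_iter)
  cur <- x0
  nu <- 1                                            # current direction (velocity)
  for (t in seq_len(n_iter)) {
    prop <- cur + nu * abs(rnorm(1, 0, sigma))       # always propose in the current direction
    if (log(runif(1)) < log_target(prop) - log_target(cur)) {
      cur <- prop                                    # acceptance: move and keep the direction
    } else {
      nu <- -nu                                      # rejection: stay put and flip the direction
    }
    x[t] <- cur
  }
  x
}

set.seed(3)
out <- guided_rw(1e4, function(x) dnorm(x, log = TRUE))
c(mean(out), var(out))                               # to be compared with 0 and 1
```

The chain on x alone is no longer reversible, since it keeps moving the same way until a rejection occurs, which is what suppresses part of the random walk behaviour.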