## Archive for arXiv

## neural summaries

Posted in Statistics, University life with tags arXiv, automated document analysis, language processing, linguistics, neuronal network, paper, score function on September 27, 2019 by xi'an

## MCMC importance samplers for intractable likelihoods

Posted in Books, pictures, Statistics with tags ABC, ABC-MCMC, approximate likelihood, arXiv, delayed acceptance, Finland, hidden Markov models, importance sampling, MCMC, PhD thesis, reversibility, University of Jyväskylä on May 3, 2019 by xi'an

**J**ordan Franks just posted on arXiv his PhD dissertation at the University of Jyväskylä, where he discusses several of his works:

- M. Vihola, J. Helske, and J. Franks. Importance sampling type estimators based on approximate marginal MCMC. Preprint arXiv:1609.02541v5, 2016.
- J. Franks and M. Vihola. Importance sampling correction versus standard averages of reversible MCMCs in terms of the asymptotic variance. Preprint arXiv:1706.09873v4, 2017.
- J. Franks, A. Jasra, K. J. H. Law and M. Vihola. Unbiased inference for discretely observed hidden Markov model diffusions. Preprint arXiv:1807.10259v4, 2018.
- M. Vihola and J. Franks. On the use of ABC-MCMC with inflated tolerance and post-correction. Preprint arXiv:1902.00412, 2019.

focusing on accelerated approximate MCMC (in the sense of pseudo-marginal MCMC) and delayed acceptance (as in our recently accepted paper). Comparing delayed acceptance with MCMC importance sampling to the advantage of the latter. And discussing the choice of the tolerance sequence for ABC-MCMC. (Although I did not get from the thesis itself the target of the improvement discussed.)

## automatic adaptation of MCMC algorithms

Posted in pictures, Statistics with tags adaptive MCMC methods, arXiv, asynchronous algorithms, calibration, convergence of Gibbs samplers, Gibbs sampling, MCMC, parallelisation on March 4, 2019 by xi'an

“A typical adaptive MCMC sampler will approximately optimize performance given the kind of sampler chosen in the first place, but it will not optimize among the variety of samplers that could have been chosen.”

**L**ast February (2018), Dao Nguyen and five co-authors arXived a paper that I missed. On a new version of adaptive MCMC that aims at selecting a wider range of proposal kernels. Still requiring a by-hand selection of this collection of kernels… Among the points addressed, beyond the theoretical guarantees that the adaptive scheme does not jeopardize convergence to the proper target, are a meta-exploration of the set of combinations of samplers and integration of the computational speed in the assessment of each sampler. Including the very difficulty of assessing mixing. One could deem the index of the proposal as an extra (cyber-)parameter to its generic parameter (like the scale in the random walk), but the discreteness of this index makes the extension more delicate than expected. And justifies the distinction between internal and external parameters. The notion of a worst-mixing dimension is quite appealing and connects with the long-term hope that one could spend the maximum fraction of the sampler runtime over the directions that are poorly mixing, while still keeping the target as should be. The adaptive scheme is illustrated on several realistic models with rather convincing gains in efficiency and time.

The convergence tools are inspired from Roberts and Rosenthal (2007), with an assumption of uniform ergodicity over all kernels considered therein which is both strong and delicate to assess in practical settings. Efficiency is rather unfortunately defined in terms of effective sample size, which is a measure of correlation or lack thereof, but which does not relate to the speed of escape from the basin of attraction of the starting point. I also wonder at the pertinence of the estimation of the effective sample size when the chain is based on different successive kernels, since the lack of correlation may be due to another kernel. Another calibration issue is the internal clock that relates to the average number of iterations required to tune properly a specific kernel, which again may be difficult to assess in a realistic situation. A last query is whether or not this scheme could be compared with an asynchronous (and valid) MCMC approach that exploits parallel capacities of the computer.
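
To make the remark on effective sample size concrete, here is a minimal sketch of one common autocorrelation-based ESS estimate for a single chain. It is not the estimator used in the paper under discussion: the function name `mcmc_ess`, the truncation rule (stop at the first non-positive autocorrelation) and the AR(1) toy chain are all assumptions made for illustration.

```python
import numpy as np

def mcmc_ess(chain, max_lag=None):
    """Crude ESS estimate n / (1 + 2 * sum of positive-lag autocorrelations).

    Illustrative only: the sum is truncated at the first non-positive
    empirical autocorrelation, which is one of several possible rules.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)
    max_lag = n // 2 if max_lag is None else max_lag
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, max_lag + 1):
        rho = np.dot(x[:-k], x[k:]) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(1)
n = 5000

# Toy chains (assumptions): i.i.d. draws versus a sticky AR(1) chain.
iid_chain = rng.normal(size=n)
ar_chain = np.empty(n)
ar_chain[0] = 0.0
for t in range(1, n):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.normal()

ess_iid = mcmc_ess(iid_chain)
ess_ar = mcmc_ess(ar_chain)
print(ess_iid, ess_ar)
```

The estimate only reflects the correlation of the values actually visited: a chain that never leaves the basin of attraction of its starting point can still return a comfortable ESS, which is the limitation pointed out above.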

## rethinking the ESS

Posted in Statistics with tags arXiv, delta method, effective sample size, efficiency measures, efficient importance sampling, ESS, importance sampling, MCMC, Monte Carlo Statistical Methods, simulation on September 14, 2018 by xi'an

**F**ollowing Victor Elvira’s visit to Dauphine, a year and a half ago, where we discussed the many defects of ESS as a default measure of efficiency for importance sampling estimators, and then some more efforts (mostly from Victor!) to formalise these criticisms, Victor, Luca Martino and I wrote a paper on this notion, now arXived. (Victor most kindly attributes the origin of the paper to a 2010 ‘Og post on the topic!) The starting thread of the (re?)analysis of this tool introduced by Kong (1992) is that the ESS used in the literature is an *approximation* to the “true” ESS, generally unavailable. An approximation that is pretty crude and hence impacts the relevance of using it as *the* assessment tool for comparing importance sampling methods. In the paper, we re-derive (with the utmost precision) the resulting approximation and list the many assumptions that [would] validate this approximation. The resulting drawbacks are many, from the absurd property of always being worse than direct sampling, to being independent from the target function and from the sample *per se*. Since only importance weights matter. This list of issues is not exactly brand new, but we think it is worth signaling given the fact that this approximation has been widely used in the last 25 years, due to its simplicity, as a practical rule of thumb [!] in a wide variety of importance sampling methods. In continuation of the directions drafted in Martino et al. (2017), we also indicate some alternative notions of importance efficiency. Note that this paper does not cover the use of ESS for MCMC algorithms, where it is somewhat more legit, if still too rudimentary to really catch convergence or lack thereof!
*[Note: I refrained from the post title resinking the ESS…]*
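
For readers who have not met it, the rule-of-thumb approximation usually meant in this literature is the inverse of the sum of squared normalised importance weights. The snippet below is a minimal sketch under my own assumptions (a N(0,1) target, a N(0,2²) proposal, and the helper name `approx_ess`); it is not taken from the paper.

```python
import numpy as np

def approx_ess(log_weights):
    """Usual ESS approximation: 1 / sum of squared normalised weights."""
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())   # subtract the max for numerical stability
    w_bar = w / w.sum()         # self-normalised weights
    return 1.0 / np.sum(w_bar ** 2)

rng = np.random.default_rng(0)
n = 1000

# Hypothetical example: target N(0, 1), proposal N(0, 2^2).
x = rng.normal(0.0, 2.0, size=n)
log_target = -0.5 * x ** 2                        # up to a constant
log_proposal = -0.5 * (x / 2.0) ** 2 - np.log(2.0)  # up to the same constant
ess = approx_ess(log_target - log_proposal)
print(ess)
```

Two of the drawbacks listed above can be read directly off the code: the value can never exceed `n` (equal weights give exactly `n`), and neither the integrand nor the simulated values `x` enter the computation other than through the weights.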

## coordinate sampler as a non-reversible Gibbs-like MCMC sampler

Posted in Books, Kids, Statistics, University life with tags arXiv, Cox process, MCqMC 2018, NIPS 2018, PDMP, PhD students, Rennes, Université Paris Dauphine, Zig-Zag on September 12, 2018 by xi'an

**I**n connection with the talk I gave last July in Rennes for MCqMC 2018, I posted yesterday a preprint on arXiv of the work that my [soon to defend!] Dauphine PhD student Changye Wu and I did on an alternative PDMP. This novel avatar of the zig-zag sampler is a non-reversible, continuous-time MCMC sampler, which we called the Coordinate Sampler, based on a piecewise deterministic Markov process. In addition to establishing the theoretical validity of this new sampling algorithm, we show in the same line as Deligiannidis et al. (2018) that the Markov chain it induces exhibits geometric ergodicity for distributions whose tails decay at least as fast as an exponential distribution and at most as fast as a Gaussian distribution. A few numerical examples (a 2D banana shaped distribution à la Haario et al., 1999, strongly correlated high-dimensional normals, a log-Gaussian Cox process) highlight that our coordinate sampler is more efficient than the zig-zag sampler, in terms of effective sample size.

Actually, we had sent this paper before the summer as a NIPS [2018] submission, but it did not make it through [the 4900 submissions this year and] the final review process, being eventually rated above the acceptance bar but not that above!

## troubling trends in machine learning

Posted in Books, pictures, Running, Statistics, University life with tags academic research, arXiv, Coventry, Crayfield Grange, ICML, Kenilworth, machine learning, mathiness, NIPS, PCI Evol Biol, proceedings, sunrise, University of Warwick, Warwickshire on July 25, 2018 by xi'an

**T**his morning, in Coventry, while having an n-th cup of tea after a very early morning run (light comes early at this time of the year!), I spotted an intriguing title in the arXivals of the day, by Zachary Lipton and Jacob Steinhardt. Addressing the academic shortcomings of machine learning papers. While I first thought little of the attempt to address poor scholarship in the machine learning literature, I read it with growing interest and, although I am pessimistic about the chances of inverting the trend, considering the relentless pace and massive production of the community, I consider the exercise worth conducting, if only to launch a debate on the excesses found in the literature.

“…desirable characteristics: (i) provide intuition to aid the reader’s understanding, but clearly distinguish it from stronger conclusions supported by evidence; (ii) describe empirical investigations that consider and rule out alternative hypotheses; (iii) make clear the relationship between theoretical analysis and intuitive or empirical claims; and (iv) use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts”

The points made by the authors are (p.1)

1. *Failure to distinguish between explanation and speculation*
2. *Failure to identify the sources of empirical gains*
3. *Mathiness*
4. *Misuse of language*

Again, I had misgivings about point 3, but this is not an anti-maths argument, rather one about the recourse to vaguely connected or oversold mathematical results as a way to support a method.

Most interestingly (and living dangerously!), the authors select specific papers to illustrate their point, picking from well-established authors *and from their own papers*, rather than from junior authors. And also include counter-examples of papers going the(ir) right way. Among the recommendations for emerging from the morass of poor scholarship papers, they suggest favouring critical writing and retrospective surveys (provided authors can be found for these!). And mention open reviews before I can mention these myself. One would think that published anonymous reviews are a step in the right direction; I would actually say that this should be the norm (plus or minus anonymity) for all journals or successors of journals (PCIs coming strongly to mind). But requiring more work from the referees implies rewards for said referees, as done in some biology and hydrology journals I refereed for (and PCIs of course).