Le Monde puzzle [#1085]

Posted in Books, Kids, R on February 18, 2019 by xi'an

A new Le Monde mathematical puzzle in the digit category:

Given 13 arbitrary (signed) integers chosen by Bo, Abigail can select any subset of them to be shifted by plus or minus one by Bo, repeatedly, until Abigail reaches the largest possible number N of multiples of 5. What is the minimal possible value of N under the assumption that Bo tries to minimise it?

I got stuck on that one, as building a recursive function led me nowhere: the potential for an infinite loop (add one, subtract one, add one, …), rather than memory issues, forced me into a finite horizon for the R function, which then did not return anything substantial in a manageable time. Over the week and the swimming sessions, I thought of simplifying the steps, like (a) working modulo 5, (b) biasing moves towards 1 or 4, away from 2 and 3, by keeping only one entry in 2 and 3, and all but one at 1 and 4, but could only produce five 0’s upon a sequence of attempts… With the intuition that only 3 entries should remain in the end, which was confirmed by the Le Monde solution the week after.

research outreach wants to improve my public image [ltd]

Posted in Books, University life on February 17, 2019 by xi'an

Dear Dr. Christian Robert,

Please excuse the direct nature of this contact, however I would like to speak with you regarding your work on the Accelerating MCMC algorithms study.

Research Outreach work in collaboration with research teams assisting with their Public Outreach activity, through means of a professionally produced series of publications and articles aimed at a broader audience. I understand Public Outreach is a very important issue within the research community – where it is often difficult to obtain large numbers of views and interactions.

We are currently working on our April 2019 edition. This publication will feature a wide variety of science disciplines, including approximately 28 research articles. We would like you to be one of those featured. My suggestion is that we might create a new article, possibly covering some basic details of the published paper I have seen online, or indeed, covering new ground or the wider scope of your work. This would be an entirely new article written and developed by Research Outreach, in close collaboration with you.

Research Outreach distributes all content, publications and your individual article, across major Social Media platforms, as well as our own website. Your work will be seen by a large and quantifiable global audience. We are a non-profit community company, all content we produce is free to share and can be downloaded by any reader. Our website is entirely transparent and answers any questions you may have.

I have briefly looked over the details of your work and believe Research Outreach to be an effective platform to communicate your outreach requirements. We publish under the Creative Commons Licence and you will own the copyright to your published material, and we can send you a separate PDF download, to be used on your web-page, at events, conferences, or as outreach material for students etc.

Please have a look at our website for examples of the publications we produce and further detail on our objectives and services. I can assure you the entire process requires very little work on your part, less than 1 hour, over a 10-week process. Some of the services we provide are paid for, this means there is a cost to you if you decide to take part. I would be delighted to explain this and our editorial process. I understand your time is limited.

Rather than writing any further detail, could we please find 5 minutes to discuss? During our call, I can answer any questions you may have and explain in detail the requirements and cost for taking part and of course the benefits of doing so.

This sounds rather predatory to me, since I would have to pay this company to produce a paper I (co?)authored, and a paper of dubious academic worth if I spend less than one hour on the project. When looking at the issues on-line, the contents of the articles seem quite light, with glossy images not always related to the topic of the journal article. If not predatory, then a sort of advertising agency for academics?! Weird times…

take a random integer

Posted in Books, Statistics on February 16, 2019 by xi'an

A weird puzzle from FiveThirtyEight: what is the probability that the product of three random integers is a multiple of 100? Ehrrrr…, what is a random integer?! The solution provided by the Riddler is quite stunning:

Reading the question charitably (since “random integer” has no specific meaning), there will be an answer if there is a limit for a uniform distribution of positive integers up to some number N. But we can ignore that technicality, and make do with the idealization that since every second, fourth, fifth, and twenty-fifth integer are divisible by 2, 4, 5, and 25, the chances of getting a random integer divisible by those numbers are 1/2, 1/4, 1/5, and 1/25.

as it acknowledges that the question is meaningless, then dismisses this as a “technicality” and still handles a Uniform random integer on {1,2,…,N} as N grows to infinity! Since all that matters is the remainder of the “random variable” modulo 100, this remainder will see its distribution vary as N moves to infinity, even though it indeed stabilises for N large enough…
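To see how the answer depends on N, here is a minimal Python sketch (the function name `prob_product_multiple_of_100` is mine, not the Riddler's) returning the exact probability that the product of three independent Uniform draws on {1,…,N} is a multiple of 100, working only with remainders modulo 100:

```python
from fractions import Fraction


def prob_product_multiple_of_100(N):
    """Exact P(X1*X2*X3 is a multiple of 100) for X1, X2, X3 i.i.d. Uniform{1,...,N}."""
    # counts[r] = number of integers k in {1,...,N} with k % 100 == r
    full_blocks, leftover = divmod(N, 100)
    counts = [full_blocks + (1 if 0 < r <= leftover else 0) for r in range(100)]

    # pairs[r] = number of ordered pairs (a, b) in {1,...,N}^2 with (a * b) % 100 == r
    pairs = [0] * 100
    for a in range(100):
        for b in range(100):
            pairs[(a * b) % 100] += counts[a] * counts[b]

    # number of ordered triples whose product is 0 modulo 100
    favourable = sum(
        pairs[r] * counts[c]
        for r in range(100)
        for c in range(100)
        if (r * c) % 100 == 0
    )
    return Fraction(favourable, N ** 3)


if __name__ == "__main__":
    for N in (10, 50, 100, 150, 1000, 1005):
        p = prob_product_multiple_of_100(N)
        print(N, p, float(p))
```

When N is a multiple of 100, the remainder is exactly Uniform on {0,…,99} and the function returns 1243/10000, which is also the limiting value as N grows; for other values of N the remainder is not uniform and the exact probability need not coincide with this limit.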

undecidable learnability

Posted in Books, Statistics, Travel, University life on February 15, 2019 by xi'an

“There is an unknown probability distribution P over some finite subset of the interval [0,1]. We get to see m i.i.d. samples from P for m of our choice. We then need to find a finite subset of [0,1] whose P-measure is at least 2/3. The theorem says that the standard axioms of mathematics cannot be used to prove that we can solve this problem, nor can they be used to prove that we cannot solve this problem.”

In the first issue of the (controversial) Nature Machine Intelligence journal, Ben-David et al. wrote a paper they present as the machine learning equivalent to Gödel’s incompleteness theorem. The result is somewhat surprising from my layman perspective and it seems to only relate to a formal representation of statistical problems. Formal as in the Vapnik-Chervonenkis (PAC) theory. It sounds like, given a finite learning dataset, there are always features that cannot be learned if the size of the population grows to infinity, but this is hardly exciting…

The above quote actually makes me think of the Robbins-Wasserman counter-example for censored data and Bayesian tail prediction, but I am unsure the connection is anything more than sheer fantasy..!

a pen for ABC

Posted in Books, pictures, Statistics, Travel, University life on February 13, 2019 by xi'an

Among the flurry of papers arXived around the ICML 2019 deadline, I read on my way back from Oxford a paper by Wiqvist et al. on learning summary statistics for ABC by neural nets. Pointing to another recent paper by Jiang et al. (2017, Statistica Sinica) which constructed a neural network for predicting each component of the parameter vector based on the input (raw) data, as an automated non-parametric regression of sorts. Creel (2017) does the same but with summary statistics. The current paper builds on Jiang et al. (2017), by adding the constraint that exchangeability and partial exchangeability features should be reflected by the neural net prediction function. With applications to Markovian models. Due to a factorisation theorem for d-block invariant models, the authors impose partial exchangeability for order d Markov models by combining two neural networks that end up satisfying this factorisation. The concept is exemplified for one-dimensional g-and-k distributions, alpha-stable distributions, both of which are made of independent observations, and the AR(2) and MA(2) models, as in our 2012 ABC survey paper. Since the latter is not Markovian, the authors experiment with different orders and reach the conclusion that an order of 10 is most appropriate, although this may be impacted by being able to handle the true likelihood.
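To make the combination of two networks more concrete, here is a toy Python sketch, assuming the factorisation takes the form ρ(x₁,…,x_d, Σᵢ φ(xᵢ,…,x_{i+d})), with an inner map φ applied to every block of d+1 consecutive observations and an outer map ρ fed with the first d observations and the pooled sum. The functions `phi`, `rho` and `pen_summary` are hypothetical hand-written stand-ins of mine, not the trained networks of the paper:

```python
import math


def phi(window):
    """Hypothetical inner map, applied to d+1 consecutive observations."""
    return [math.tanh(sum(window)), math.tanh(window[0] * window[-1])]


def rho(head, pooled):
    """Hypothetical outer map, combining the first d observations and the pooled features."""
    return sum(head) + 2.0 * pooled[0] - pooled[1]


def pen_summary(x, d):
    """Toy summary rho(x[:d], sum_i phi(x[i:i+d+1])) for a series x and a Markov order d."""
    windows = [x[i:i + d + 1] for i in range(len(x) - d)]
    features = [phi(w) for w in windows]
    # summing over the blocks is what makes the output blind to the order of the blocks
    pooled = [math.fsum(f[k] for f in features) for k in range(2)]
    return rho(x[:d], pooled)


if __name__ == "__main__":
    x = [1.0, 2.0, 1.0, 3.0, 1.0]
    y = [1.0, 3.0, 1.0, 2.0, 1.0]  # same first value and same set of consecutive pairs as x
    z = [1.0, 1.0, 1.0, 2.0, 3.0]  # a permutation of x with different consecutive pairs
    print(pen_summary(x, 1), pen_summary(y, 1), pen_summary(z, 1))
    print(pen_summary(x, 0), pen_summary(z, 0))
```

With d=0 the blocks are single observations and the summary is invariant under any permutation (the i.i.d. case), while with d=1 it is only invariant under rearrangements that preserve the first value and the collection of consecutive pairs.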

scalable Metropolis-Hastings

Posted in Books, Statistics, Travel on February 12, 2019 by xi'an

Among the flurry of arXived papers of last week (414!), including a fair chunk of papers submitted to ICML 2019, I spotted one entry by Cornish et al. on scalable Metropolis-Hastings, which Arnaud Doucet had mentioned to me yesterday when in Oxford. The paper builds on the delayed acceptance paper we wrote with Marco Banterlé, Clara Grazian and Anthony Lee, itself relying on a factorisation decomposition of the likelihood, combined with control variate accelerating techniques. The factorisation of both the target and the proposal allows for a (less efficient) Metropolis-Hastings acceptance ratio that is the product

$\prod_{i=1}^m \alpha_i(\theta,\theta')$

of individual Metropolis-Hastings acceptance ratios, but which allows for quicker rejection if one of the probabilities in the product is small, because the corresponding Bernoulli draw is zero with high probability. One advance made in Michel et al. (2017) [which I doubly missed] is that subsampling is achievable by thinning (as in PDMPs, where these authors have been quite active) through an algorithm of Shanthikumar (1985) [described in Devroye’s bible]. Provided each Metropolis-Hastings probability can be lower bounded:

$\alpha_i(\theta,\theta') \ge \exp\{-\psi_i \phi(\theta,\theta')\}$

by a term where the transition φ does not depend on the index i in the product. The computing cost of the thinning process thus depends on the efficiency of the subsampling, namely whether or not the (Poisson) number of terms is much smaller than m, the number of terms in the product. A neat trick in the current paper that extends the Fukui-Todo procedure is to switch to the original Metropolis-Hastings when the overall lower bound is too small, recovering the geometric ergodicity of this original if it holds (Theorem 2.1). Another neat remark is that when using the naïve factorisation as the product of the n individual likelihoods, the resulting algorithm is sort of doomed as n grows, even with an optimal scaling of the proposals. To achieve scalability, the authors introduce a Taylor (i.e., Gaussian) approximation to each local target in the product and start the acceptance decomposition by using the resulting overall Gaussian approximation. Meaning that the remaining product is now made of ratios of targets over their local Taylor approximations, hence most likely close to one. And potentially lower-bounded by the remainder term in the Taylor expansion. Leading to the conclusion that, when everything goes well, meaning that the Taylor expansions can be conducted and the bounds derived for the appropriate expansion, the order of the Poisson scale is O(1/√n)..! The proposal for the Metropolis-Hastings move is actually tuned to the Gaussian approximation, appearing as a variant of the Langevin move or more exactly a discretization of a Hamiltonian move. Obviously, I cannot judge the complexity of implementing this new scheme from just reading the paper, but this development on the split target is definitely an exciting prospect for handling huge datasets and their friends!
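As an illustration of the two ways of simulating the acceptance Bernoulli discussed above, here is a minimal Python sketch, which is my own toy rendering and not the authors' implementation. The names `factorised_accept`, `poisson` and `thinned_accept` are hypothetical, as is the parametrisation α_i = exp{-λ_i} with 0 ≤ λ_i ≤ ψ_i φ, which is just the lower bound above rewritten. The first function draws the m Bernoullis one at a time and stops at the first failure; the second one uses Poisson thinning and only looks at a (Poisson) random number of factors:

```python
import math
import random


def factorised_accept(log_ratios, rng):
    """Accept with probability prod_i min(1, exp(log_ratios[i])), stopping at the first failure.

    Returns (accepted, number of factors actually evaluated).
    """
    evaluated = 0
    for log_r in log_ratios:  # may be a lazy generator, so unused factors are never computed
        evaluated += 1
        if rng.random() >= math.exp(min(0.0, log_r)):
            return False, evaluated
    return True, evaluated


def poisson(mean, rng):
    """Poisson draw, counting the unit-rate exponential arrivals falling in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def thinned_accept(lam, psi, phi, rng):
    """Accept with probability exp(-sum_i lam(i)) = prod_i alpha_i, by Poisson thinning.

    Assumes 0 <= lam(i) <= phi * psi[i], i.e. alpha_i = exp(-lam(i)) >= exp(-psi[i] * phi).
    Returns (accepted, number of factors actually evaluated).
    """
    n_events = poisson(phi * sum(psi), rng)  # sum(psi) would be precomputed in practice
    if n_events == 0:
        return True, 0
    evaluated = 0
    # each candidate event picks factor i with probability proportional to psi[i]
    for i in rng.choices(range(len(psi)), weights=psi, k=n_events):
        evaluated += 1
        # the event is kept with probability lam(i) / (phi * psi[i]); one kept event rejects
        if rng.random() < lam(i) / (phi * psi[i]):
            return False, evaluated
    return True, evaluated


if __name__ == "__main__":
    rng = random.Random(2019)
    m = 1000
    psi = [1.0] * m
    phi = 0.002          # Poisson mean phi * sum(psi) = 2, far below m
    lam = lambda i: 0.0005   # so that prod_i alpha_i = exp(-0.5)
    draws = [thinned_accept(lam, psi, phi, rng) for _ in range(20000)]
    print("acceptance frequency:", sum(a for a, _ in draws) / len(draws))
    print("target probability:  ", math.exp(-0.5))
    print("mean number of factors evaluated:", sum(k for _, k in draws) / len(draws))
```

The thinned version is exact because the candidate events that are kept form a Poisson variate with mean Σλ_i, hence there is no kept event with probability exp{-Σλ_i}, the product of the α_i's. Its cost is driven by the Poisson mean φΣψ_i rather than by m, which only pays off when the bounds are tight.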

Fisher’s lost information

Posted in Books, Kids, pictures, Statistics, Travel on February 11, 2019 by xi'an

After a post on X validated and a good discussion at work, I came to the conclusion [after many years of sweeping the puzzle under the carpet] that the (a?) Fisher information obtained for the Uniform distribution U(0,θ) as θ⁻² is meaningless. Indeed, there are many arguments:

1. The lack of differentiability of the indicator function for x=θ is a non-issue since the derivative is defined almost everywhere.
2. In many textbooks, the Fisher information θ⁻² is derived from the Fréchet-Darmois-Cramér-Rao inequality, which does not apply to the Uniform U(0,θ) distribution.
3. One connected argument for the expression of the Fisher information as the expectation of the squared score is that it is the variance of the score, since its expectation is zero. Except that it is not zero for the Uniform U(0,θ) distribution.
4. For the same reason, the opposite of the second derivative of the log-likelihood is not equal to the expectation of the squared score. It is actually -θ⁻²!
5. Looking at the Taylor expansion justification of the (observed) Fisher information, expanding the log-likelihood around the maximum likelihood estimator does not work since the maximum likelihood estimator does not cancel the score.
6. When computing the Fisher information for an n-sample rather than a 1-sample, the information is n²θ⁻², rather than nθ⁻².
7. Since the speed of convergence of the maximum likelihood estimator is of order n⁻², the central limit theorem does not apply and the limiting variance of the maximum likelihood estimator is not the Fisher information.