## Bayesian intelligence in Warwick

Posted in pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , on February 18, 2019 by xi'an

This is an announcement for an exciting CRiSM Day in Warwick on 20 March 2019: with speakers

10:00-11:00 Xiao-Li Meng (Harvard): “Artificial Bayesian Monte Carlo Integration: A Practical Resolution to the Bayesian (Normalizing Constant) Paradox”

11:00-12:00 Julien Stoehr (Dauphine): “Gibbs sampling and ABC”

14:00-15:00 Arthur Ulysse Jacot-Guillarmod (École Polytechnique Fedérale de Lausanne): “Neural Tangent Kernel: Convergence and Generalization of Deep Neural Networks”

15:00-16:00 Antonietta Mira (Università della Svizzera italiana e Università degli studi dell’Insubria): “Bayesian identifications of the data intrinsic dimensions”

[whose abstracts are on the workshop webpage] and free attendance. The title for the workshop mentions Bayesian Intelligence: this obviously includes human intelligence and not just AI!

## undecidable learnability

Posted in Books, Statistics, Travel, University life with tags , , , , , , on February 15, 2019 by xi'an

“There is an unknown probability distribution P over some finite subset of the interval [0,1]. We get to see m i.i.d. samples from P for m of our choice. We then need to find a finite subset of [0,1] whose P-measure is at least 2/3. The theorem says that the standard axioms of mathematics cannot be used to prove that we can solve this problem, nor can they be used to prove that we cannot solve this problem.”

In the first issue of the (controversial) nature machine intelligence journal, Ben-David et al. wrote a paper they present a s the machine learning equivalent to Gödel’s incompleteness theorem. The result is somewhat surprising from my layman perspective and it seems to only relate to a formal representation of statistical problems. Formal as in the Vapnik-Chervonenkis (PAC) theory. It sounds like, given a finite learning dataset, there are always features that cannot be learned if the size of the population grows to infinity, but this is hardly exciting…

The above quote actually makes me think of the Robbins-Wasserman counter-example for censored data and Bayesian tail prediction, but I am unsure the connection is anything more than sheer fantasy..!
~

## O’Bayes 19: registration and travel support

Posted in pictures, Running, Travel, University life with tags , , , , , , , on February 14, 2019 by xi'an

An update about the O’Bayes 19 conference next June-July:  the early registration period has now opened. And there should be funds for supporting early-career researchers, thanks to Google and CNRS sponsorships, as detailed below:

Early-career researchers less than four years from PhD, are invited to apply for early-career scholarships. If you are a graduate student, postdoctoral researcher or early-career academic and you are giving a poster, you are eligible to apply. Female researchers and underrepresented minorities are especially encouraged to apply. Selected applicants will receive up to £450, which can be used for any combination of fees, travel and accommodation costs, subject to receipts.

The deadline for applying is the 15th of March (which is also the deadline to submit the abstract for the poster) and it has to be done at the registration phase via the dedicated page. Those who have submitted an abstract before this information on scholarships was made available (11 Feb.) and applying for travel support should contact the organisers.

## a pen for ABC

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , on February 13, 2019 by xi'an

Among the flury of papers arXived around the ICML 2019 deadline, I read on my way back from Oxford a paper by Wiqvist et al. on learning summary statistics for ABC by neural nets. Pointing out at another recent paper by Jiang et al. (2017, Statistica Sinica) which constructed a neural network for predicting each component of the parameter vector based on the input (raw) data, as an automated non-parametric regression of sorts. Creel (2017) does the same but with summary statistics. The current paper builds up from Jiang et al. (2017), by adding the constraint that exchangeability and partial exchangeability features should be reflected by the neural net prediction function. With applications to Markovian models. Due to a factorisation theorem for d-block invariant models, the authors impose partial exchangeability for order d Markov models by combining two neural networks that end up satisfying this factorisation. The concept is exemplified for one-dimension g-and-k distributions, alpha-stable distributions, both of which are made of independent observations, and the AR(2) and MA(2) models, as in our 2012 ABC survey paper. Since the later is not Markovian the authors experiment with different orders and reach the conclusion that an order of 10 is most appropriate, although this may be impacted by being a ble to handle the true likelihood.

## scalable Metropolis-Hastings

Posted in Books, Statistics, Travel with tags , , , , , , , , , on February 12, 2019 by xi'an

Among the flury of arXived papers of last week (414!), including a fair chunk of papers submitted to ICML 2019, I spotted one entry by Cornish et al. on scalable Metropolis-Hastings, which Arnaud Doucet had mentioned to me yesterday when in Oxford. The paper builds on the delayed acceptance paper we wrote with Marco Banterlé, Clara Grazian and Anthony Lee, itself relying on a factorisation decomposition of the likelihood, combined with control variate accelerating techniques. The factorisation of both the target and the proposal allows for a (less efficient) Metropolis-Hastings acceptance ratio that is the product

$\prod_{i=1}^m \alpha_i(\theta,\theta')$

of individual Metropolis-Hastings acceptance ratios, but which allows for quicker rejection if one of the probabilities in the product is small, because the corresponding Bernoulli draw is zero with high probability. One advance made in Michel et al. (2017) [which I doubly missed] is that subsampling is achievable by thinning (as in PDMPs, where these authors have been quite active) through an algorithm of Shantikumar (1985) [described in Devroye’s bible]. Provided each Metropolis-Hastings probability can be lower bounded:

$\alpha_i(\theta,\theta') \ge \exp\{-\psi_i \phi(\theta,\theta')\}$

by a term where the transition φ does not depend on the index i in the product. The computing cost of the thinning process thus depends on the efficiency of the subsampling, namely whether or not the (Poisson) number of terms is much smaller than m, number of terms in the product. A neat trick in the current paper that extends the the Fukui-Todo procedure is to switch to the original Metropolis-Hastings when the overall lower bound is too small, recovering the geometric ergodicity of this original if it holds (Theorem 2.1). Another neat remark is that when using the naïve factorisation as the product of the n individual likelihoods, the resulting algorithm is sort of doomed as n grows, even with an optimal scaling of the proposals. To achieve scalability, the authors introduce a Taylor (i.e., Gaussian) approximation to each local target in the product and start the acceptance decomposition by using the resulting overall Gaussian approximation. Meaning that the remaining product is now made of ratios of targets over their local Taylor approximations, hence most likely close to one. And potentially lower-bounded by the remainder term in the Taylor expansion. Leading to the conclusion that, when everything goes well, meaning that the Taylor expansions can be conducted and the bounds derived for the appropriate expansion, the order of the Poisson scale is O(1/√n)..! The proposal for the Metropolis-Hastings move is actually tuned to the Gaussian approximation, appearing as a variant of the Langevin move or more exactly a discretization of an Hamiltonian move. Obviously, I cannot judge of the complexity in implementing this new scheme from just reading the paper, but this development on the split target is definitely an exciting prospect for handling huge datasets and their friends!

## Fisher’s lost information

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on February 11, 2019 by xi'an

After a post on X validated and a good discussion at work, I came to the conclusion [after many years of sweeping the puzzle under the carpet] that the (a?) Fisher information obtained for the Uniform distribution U(0,θ) as θ⁻¹ is meaningless. Indeed, there are many arguments:

1. The lack of derivability of the indicator function for x=θ is a non-issue since the derivative is defined almost everywhere.
2. In many textbooks, the Fisher information θ⁻² is derived from the Fréchet-Darmois-Cramèr-Rao inequality, which does not apply for the Uniform U(0,θ) distribution.
3. One connected argument for the expression of the Fisher information as the expectation of the squared score is that it is the variance of the score, since its expectation is zero. Except that it is not zero for the Uniform U(0,θ) distribution.
4. For the same reason, the opposite of the second derivative of the log-likelihood is not equal to the expectation of the squared score. It is actually -θ⁻²!
5. Looking at the Taylor expansion justification of the (observed) Fisher information, expanding the log-likelihood around the maximum likelihood estimator does not work since the maximum likelihood estimator does not cancel the score.
6. When computing the Fisher information for an n-sample rather than a 1-sample, the information is n²θ⁻², rather than nθ⁻².
7. Since the speed of convergence of the maximum likelihood estimator is of order n⁻², the central limit theorem does not apply and the limiting variance of the maximum likelihood estimator is not the Fisher information.

## Fate & Fortune [book review]

Posted in Books, Travel with tags , , , , , , , on February 10, 2019 by xi'an

After enjoying very much the first book, Hue & Cry, in the Hew Cullan series by Shirley McKay, I bought the following ones and read Fate & Fortune over the vacation break. If anything, I enjoyed this one even more, as it disclosed other aspects of 16th Century Scotland, still with the oppressive domination of the Kirk, the highly puritan Church of Scotland, over all aspects of everyday life, but also a more rational form of Law, plus the first instances of caitch, imported from France jeu de paume. And the medical approach of the time against an epidemics of syphilis. And the dangerous life of printers at the time, always in danger of arrest and worse. As usual with historical whodunits, it is hard to guess what is genuinely from 1580’s and what has been imported from the present era, but this is a most pleasant (light and short) book to read!