## integral theorems for Monte Carlo

Posted in Books, pictures, Statistics with tags , , , , , , , on August 12, 2021 by xi'an

Nhat Ho and Stephen G. Walker have just arXived a paper on the use of (Fourier) integral theorems for Monte Carlo estimators, following the earlier entry of Parzen: namely that for any integrable function,

$m(y)=\frac{1}{(2\pi)^d}\int_{\mathbb R^d}\int_{\mathbb R^d}\cos(s^\text{T}(y-x))m(x)\text dx\text ds$

which can be turned into an estimator of a density m based on a sample from m. This identity can be rewritten as

$m(y)=\lim_{R\to\infty}\frac{1}{\pi^d}\int_{\mathbb R^d}\prod_{i=1}^d\dfrac{\sin(R(y_i-x_i))}{y_i-x_i}\;m(x)\,\text dx$

and the paper generalises this identity to all cyclic functions. Even though it establishes that sin is the optimal choice. After reading this neat result, I however remain uncertain on how this could help with Monte Carlo integration.

## what makes variables randoms [book review]

Posted in Books, Mountains, Statistics with tags , , , , , , on July 19, 2017 by xi'an

When the goal of a book is to make measure theoretic probability available to applied researchers for conducting their research, I cannot but applaud! Peter Veazie’s goal of writing “a brief text that provides a basic conceptual introduction to measure theory” (p.4) is hence most commendable. Before reading What makes variables random, I was uncertain how this could be achieved with a limited calculus background, given the difficulties met by our third year maths students. After reading the book, I am even less certain this is feasible!

“…it is the data generating process that makes the variables random and not the data.”

Chapter 2 is about basic notions of set theory. Chapter 3 defines measurable sets and measurable functions and integrals against a given measure μ as

$\sup_\pi \sum_{A\in\pi}\inf_{\omega\in A} f(\omega)\mu(A)$

which I find particularly unnatural compared with the definition through simple functions (esp. because it does not tell how to handle 0x∞). The ensuing discussion shows the limitation of the exercise in that the definition is only explained for finite sets (since the notion of a partition achieving the supremum on page 29 is otherwise meaningless). A generic problem with the book, in that most examples in the probability section relate to discrete settings (see the discussion of the power set p.66). I also did not see a justification as to why measurable functions enjoy well-defined integrals in the above sense. All in all, to see less than ten pages allocated to measure theory per se is rather staggering! For instance,

$\int_A f\text{d}\mu$

does not appear to be defined at all.

“…the mathematical probability theory underlying our analyses is just mathematics…”

Chapter 4 moves to probability measures. It distinguishes between objective (or frequentist) and subjective measures, which is of course open to diverse interpretations. And the definition of a conditional measure is the traditional one, conditional on a set rather than on a σ-algebra. Surprisingly as this is in my opinion one major reason for using measures in probability theory. And avoids unpleasant issues such as Bertrand’s paradox. While random variables are defined in the standard sense of real valued measurable functions, I did not see a definition of a continuous random variables or of the Lebesgue measure. And there are only a few lines (p.48) about the notion of expectation, which is so central to measure-theoretic probability as to provide a way of entry into measure theory! Progressing further, the σ-algebra induced by a random variable is defined as a partition (p.52), a particularly obscure notion for continuous rv’s. When the conditional density of one random variable given the realisation of another is finally introduced (p.63), as an expectation reconciling with the set-wise definition of conditional probabilities, it is in a fairly convoluted way that I fear will scare newcomers out of their wit. Since it relies on a sequence of nested sets with positive measure, implying an underlying topology and the like, which somewhat shows the impossibility of the overall task…

“In the Bayesian analysis, the likelihood provides meaning to the posterior.”

Statistics is hurriedly introduced in a short section at the end of Chapter 4, assuming the notion of likelihood is already known by the readers. But nitpicking (p.65) at the representation of the terms in the log-likelihood as depending on an unspecified parameter value θ [not to be confused with the data-generating value of θ, which does not appear clearly in this section]. Section that manages to include arcane remarks distinguishing maximum likelihood estimation from Bayesian analysis, all this within a page! (Nowhere is the Bayesian perspective clearly defined.)

“We should no more perform an analysis clustered by state than we would cluster by age, income, or other random variable.”

The last part of the book is about probabilistic models, drawing a distinction between data generating process models and data models (p.89), by which the author means the hypothesised probabilistic model versus the empirical or bootstrap distribution. An interesting way to relate to the main thread, except that the convergence of the data distribution to the data generating process model cannot be established at this level. And hence that the very nature of bootstrap may be lost on the reader. A second and final chapter covers some common or vexing problems and the author’s approach to them. Revolving around standard errors, fixed and random effects. The distinction between standard deviation (“a mathematical property of a probability distribution”) and standard error (“representation of variation due to a data generating process”) that is followed for several pages seems to boil down to a possible (and likely) model mis-specification. The chapter also contains an extensive discussion of notations, like indexes (or indicators), which seems a strange focus esp. at this location in the book. Over 15 pages! (Furthermore, I find quite confusing that a set of indices is denoted there by the double barred I, usually employed for the indicator function.)

“…the reader will probably observe the conspicuous absence of a time-honoured topic in calculus courses, the “Riemann integral”… Only the stubborn conservatism of academic tradition could freeze it into a regular part of the curriculum, long after it had outlived its historical importance.” Jean Dieudonné, Foundations of Modern Analysis

In conclusion, I do not see the point of this book, from its insistence on measure theory that never concretises for lack of mathematical material to an absence of convincing examples as to why this is useful for the applied researcher, to the intended audience which is expected to already quite a lot about probability and statistics, to a final meandering around linear models that seems at odds with the remainder of What makes variables random, without providing an answer to this question. Or to the more relevant one of why Lebesgue integration is preferable to Riemann integration. (Not that there does not exist convincing replies to this question!)

## complexity of the von Neumann algorithm

Posted in Statistics with tags , , , , , , , , , on April 3, 2017 by xi'an

“Without the possibility of computing infimum and supremum of the density f over compact subintervals of the domain of f, sampling absolutely continuous distribution using the rejection method seems to be impossible in total generality.”

The von Neumann algorithm is another name for the rejection method introduced by von Neumann circa 1951. It was thus most exciting to spot a paper by Luc Devroye and Claude Gravel appearing in the latest Statistics and Computing. Assessing the method in terms of random bits and precision. Specifically, assuming that the only available random generator is one of random bits, which necessarily leads to an approximation when the target is a continuous density. The authors first propose a bisection algorithm for distributions defined on a compact interval, which compares random bits with recursive bisections of the unit interval and stops when the interval is small enough.

In higher dimension, for densities f over the unit hypercube, they recall that the original algorithm consisted in simulating uniforms x and u over the hypercube and [0,1], using the uniform as the proposal distribution and comparing the density at x, f(x), with the rescaled uniform. When using only random bits, the proposed method is based on a quadtree that subdivides the unit hypercube into smaller and smaller hypercubes until the selected hypercube is entirely above or below the density. And is small enough for the desired precision. This obviously requires for the computation of the upper and lower bound of the density over the hypercubes to be feasible, with Devroye and Gravel considering that this is a necessary property as shown by the above quote. Densities with non-compact support can be re-expressed as densities on the unit hypercube thanks to the cdf transform. (Actually, this is equivalent to the general accept-reject algorithm, based on the associated proposal.)

“With the oracles introduced in our modification of von Neumann’s method, we believe that it is impossible to design a rejection algorithm for densities that are not Riemann-integrable, so the question of the design of a universally valid rejection algorithm under the random bit model remains open.”

In conclusion, I enjoyed very much reading this paper, especially the reflection it proposes on the connection between Riemann integrability and rejection algorithms. (Actually, I cannot think straight away of a simulation algorithm that would handle non-Riemann-integrable densities, apart from nested sampling. Or of significant non-Riemann-integrable densities.)

## a concise introduction to statistical inference [book review]

Posted in Statistics with tags , , , , , , , , , , on February 16, 2017 by xi'an

[Just to warn readers and avoid emails about Xi’an plagiarising Christian!, this book was sent to me by CRC Press for a review. To be published in CHANCE.]

This is an introduction to statistical inference. And with 180 pages, it indeed is concise! I could actually stop the review at this point as a concise review of a concise introduction to statistical inference, as I do not find much originality in this introduction, intended for “mathematically sophisticated first-time student of statistics”. Although sophistication is in the eye of the sophist, of course, as this book has margin symbols in the guise of integrals to warn of section using “differential or integral calculus” and a remark that the book is still accessible without calculus… (Integral calculus as in Riemann integrals, not Lebesgue integrals, mind you!) It even includes appendices with the Greek alphabet, summation notations, and exponential/logarithms.

“In statistics we often bypass the probability model altogether and simply specify the random variable directly. In fact, there is a result (that we won’t cover in detail) that tells us that, for any random variable, we can find an appropriate probability model.” (p.17)

Given its limited mathematical requirements, the book does not get very far in the probabilistic background of statistics methods, which makes the corresponding chapter not particularly helpful as opposed to a prerequisite on probability basics. Since not much can be proven without “all that complicated stuff about for any ε>0” (p.29). And makes defining correctly notions like the Central Limit Theorem impossible. For instance, Chebychev’s inequality comes within a list of admitted results. There is no major mistake in the chapter, even though mentioning that two correlated Normal variables are jointly Normal (p.27) is inexact.

“The power of a test is the probability that you do not reject a null that is in fact correct.” (p.120)

Most of the book follows the same pattern as other textbooks at that level, covering inference on a mean and a probability, confidence intervals, hypothesis testing, p-values, and linear regression. With some words of caution about the interpretation of p-values. (And the unfortunate inversion of the interpretation of power above.) Even mentioning the Cult [of Significance] I reviewed a while ago.

Given all that, the final chapter comes as a surprise, being about Bayesian inference! Which should make me rejoice, obviously, but I remain skeptical of introducing the concept to readers with so little mathematical background. And hence a very shaky understanding of a notion like conditional distributions. (Which reminds me of repeated occurrences on X validated when newcomers hope to bypass textbooks and courses to grasp the meaning of posteriors and such. Like when asking why Bayes Theorem does not apply for expectations.) I can feel the enthusiasm of the author for this perspective and it may diffuse to some readers, but apart from being aware of the approach, I wonder how much they carry away from this brief (decent) exposure. The chapter borrows from Lee (2012, 4th edition) and from Berger (1985) for the decision-theoretic part. The limitations of the exercise are shown for hypothesis testing (or comparison) by the need to restrict the parameter space to two possible values. And for decision making. Similarly, introducing improper priors and the likelihood principle [distinguished there from the law of likelihood] is likely to get over the head of most readers and clashes with the level of the previous chapters. (And I do not think this is the most efficient way to argue in favour of a Bayesian approach to the problem of statistical inference: I have now dropped all references to the likelihood principle from my lectures. Not because of the controversy, but simply because the students do not get it.) By the end of the chapter, it is unclear a neophyte would be able to spell out how one could specify a prior for one of the problems processed in the earlier chapters. The appendix on de Finetti’s formalism on personal probabilities is very much unlikely to help in this regard. While it sounds so far beyond the level of the remainder of the book.

## an email exchange about integral representations

Posted in Books, R, Statistics, University life with tags , , , , on April 8, 2015 by xi'an

I had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R in the past days, as he had difficulties with the validation of the accept-reject algorithm via the integral

$\mathbb{P}(Y\in \mathcal{A},U\le f(Y)/Mg(Y)) = \int_\mathcal{A} \int_0^{f(y)/Mg(y)}\,\text{d}u\,g(y)\,\text{d}y\,,$

in that it took me several iterations [as shown in the above] to realise the issue was with the notation

$\int_0^a \,\text{d}u\,,$

which seemed to be missing a density term or, in other words, be different from

$\int_0^1 \,\mathbb{I}_{(0,a)}(u)\,\text{d}u\,,$

What is surprising for me is that the integral

$\int_0^a \,\text{d}u$

has a clear meaning as a Riemann integral, hence should be more intuitive….