## ABC for repulsive point processes

Posted in Books, pictures, Statistics, University life on May 5, 2016 by xi'an

Shinichiro Shirota and Alan Gelfand arXived a paper on the use of ABC for analysing some repulsive point processes, more exactly Gibbs point processes, for which ABC requires a perfect sampler to operate (unless one is okay with stopping an MCMC chain before it converges), and the determinantal point processes studied by Lavancier et al. (2015) [a paper I wanted to review and could not find time to!]. Determinantal point processes have joint intensity functions given by determinants of a covariance kernel, hence are repulsive. Simulating a determinantal process is itself not straightforward and involves approximations. But the likelihood is unavailable and Lavancier et al. (2015) resort to approximate versions based on fast Fourier transforms, which means MCMC is challenging even with those approximate steps.

“The main computational cost of our algorithm is simulation of x for each iteration of the ABC-MCMC.”

The authors propose here to use ABC instead, with an extra approximation step for simulating the determinantal process itself. Interestingly, the Gibbs point process allows for a sufficient statistic, the number of pairs of points within distance R of one another (although I fail to see how the radius R is determined by the model), while the determinantal process does not. The summary statistics end up being a collection of point frequencies within spheres of various radii. These statistics are then processed through Fearnhead's and Prangle's proposal, namely to use an approximation of E[θ|y], obtained by regression over the original summaries, as the natural summary. Another layer of complexity stems from using an ABC-MCMC approach, and from including a Lasso step in the regression towards excluding less relevant radii. The paper also considers Bayesian model validation for such point processes, implementing prior predictive tests with a ranked probability score rather than a Bayes factor.
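As a toy illustration of the Fearnhead and Prangle step (everything below is hypothetical and unrelated to the point-process models of the paper): draw (θ, s) pairs from the prior and the model, regress θ on the raw summaries, and use the fitted regression as the scalar summary approximating E[θ|y]. A minimal sketch in Python:

```python
import random
import statistics

random.seed(1)

# Pilot stage on a made-up scalar model: theta from a N(0,1) prior and a
# single raw summary s that is informative about theta.
n = 20_000
theta = [random.gauss(0, 1) for _ in range(n)]
s = [2 * t + 0.1 * random.gauss(0, 1) for t in theta]

# Least-squares regression of theta on s: the fitted values approximate
# E[theta | s] and serve as the one-dimensional ABC summary.
s_bar, t_bar = statistics.fmean(s), statistics.fmean(theta)
slope = sum((si - s_bar) * (ti - t_bar) for si, ti in zip(s, theta)) / \
        sum((si - s_bar) ** 2 for si in s)
intercept = t_bar - slope * s_bar

def semi_auto_summary(s_obs):
    """Plug-in approximation of E[theta | s_obs], used as the ABC summary."""
    return intercept + slope * s_obs
```

In the paper the raw summaries are the sphere frequencies at the various radii, and the regression includes a Lasso penalty to drop the less relevant radii; the plain least-squares fit above is only the simplest instance.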

As point processes have always been somewhat mysterious to me, I do not have any intuition about the strength of the distributional assumptions there, or about the relevance of picking a determinantal process against, say, a Strauss process. The model comparisons operated in the paper do not strongly support one repulsive model over the others, with the authors concluding that many points are needed to discriminate between models. I also wonder about the possibility of including other summaries than Ripley's K-functions, which somewhat imply a discretisation of the space by concentric rings. Maybe using other point processes to derive summary statistics, as MLEs or Bayes estimators for those models, would help. (Or maybe not.)

## CRiSM workshop on estimating constants [slides]

Posted in Books, pictures, Statistics, Travel, University life on May 4, 2016 by xi'an

A short announcement that the slides of almost all talks at the CRiSM workshop on estimating constants, held last April 20-22, are now available. Enjoy (and discuss)!

## global-local mixtures

Posted in Books, pictures, Running, Statistics, Travel on May 4, 2016 by xi'an

Anindya Bhadra, Jyotishka Datta, Nick Polson and Brandon Willard have arXived this morning a short paper on global-local mixtures. Although the definition given in the paper (p.1) is rather unclear, those mixtures are distributions of a sample obtained as marginals over component-wise (local) and common (global) parameters. The observations of the sample are then (marginally) exchangeable, if not independent.

“The Cauchy-Schlömilch transformation not only guarantees an ‘astonishingly simple’ normalizing constant for f(·), it also establishes the wide class of unimodal densities as global-local scale mixtures.”

The paper relies on the Cauchy-Schlömilch identity

$\int_0^\infty f(\{x-g(x)\}^2)\text{d}x=\int_0^\infty f(y^2)\text{d}y\qquad \text{with}\quad g(x)=g^{-1}(x)$

a self-inverse function. This generic result proves helpful in deriving demarginalisations of a Gaussian distribution for densities outside the exponential family, like Laplace's, and for logistic regression. (This is getting very local for me, as Cauchy's house is up the hill, while Laplace lived two train stations away. Before trains were invented, of course.) The paper also briefly mentions Étienne Halphen for his introduction of generalised inverse Gaussian distributions. Halphen was one of the rare French Bayesians, worked for the State Electricity Company (EDF), and briefly with Lucien Le Cam (before the latter left for the USA). He introduced some families of distributions during the early 1940s, including the generalised inverse Gaussian family, which were first presented by his friend Daniel Dugué to the Académie des Sciences, maybe because of the Vichy racial laws… A second result of interest in the paper is that, given a density g and a transform s on the positive real numbers that is decreasing and self-inverse, the function f(x)=2g(x-s(x)) is again a density, which can in turn be represented as a global-local mixture. [I wonder if these representations could be useful in studying the Cauchy conjecture solved last year by Natesh and Xiao-Li.]
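As a quick numerical sanity check of the identity (a throwaway sketch, not taken from the paper), one can take the self-inverse choice g(x)=1/x and f(u)=e⁻ᵘ, for which both sides equal √π/2:

```python
import math

def simpson(f, a, b, n=200_000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    odd = sum(f(a + h * i) for i in range(1, n, 2))
    even = sum(f(a + h * i) for i in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

g = lambda x: 1 / x            # self-inverse on (0, inf): g(g(x)) = x
f = lambda u: math.exp(-u)     # any integrable f works

# Truncating [0, inf) at 12 is harmless here: both integrands decay like
# exp(-x^2) in the tail, and the left one vanishes like exp(-1/x^2) at 0.
lhs = simpson(lambda x: f((x - g(x)) ** 2), 1e-9, 12.0)
rhs = simpson(lambda y: f(y ** 2), 0.0, 12.0)
# lhs and rhs agree with sqrt(pi)/2 up to quadrature error
```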

## visite à l’Agro

Posted in pictures, Statistics, University life on May 3, 2016 by xi'an

## contemporary issues in hypothesis testing

Posted in pictures, Statistics, Travel, University life on May 3, 2016 by xi'an

Next Fall, on 15-16 September, I will take part in a CRiSM workshop on hypothesis testing, in our department in Warwick. The registration is now open [until Sept 2] with a moderate registration fee of £40 and a call for posters. Jim Berger and Joris Mulder will both deliver a plenary talk there, while Andrew Gelman will alas give a remote talk from New York. (A terrific poster by the way!)

## auxiliary likelihood-based approximate Bayesian computation in state-space models

Posted in Books, pictures, Statistics, University life on May 2, 2016 by xi'an

With Gael Martin, Brendan McCabe, David T. Frazier, and Worapree Maneesoonthorn, we arXived (and submitted) a strongly revised version of our earlier paper. We begin by demonstrating that reduction to a set of sufficient statistics of dimension small relative to the sample size is infeasible for most state-space models, hence calling for the use of partial posteriors in such settings. Then we give conditions [like parameter identification] under which ABC methods are Bayesian consistent when using an auxiliary model to produce summaries, either as MLEs or [more efficiently] scores. Indeed, at the order of accuracy required by the ABC perspective, scores are equivalent to MLEs but are computed much faster. Those conditions happen to be weaker than those found in the recent papers of Li and Fearnhead (2016) and Creel et al. (2015); in particular, we make no assumption about the limiting distributions of the summary statistics. We also tackle the dimensionality curse that plagues ABC techniques by numerically exhibiting the improved accuracy brought by looking at marginal rather than joint modes, that is, by matching individual parameters via the corresponding scalar score of the integrated auxiliary likelihood rather than matching on the multi-dimensional score statistic. The approach is illustrated on realistically complex models: a (latent) Ornstein-Uhlenbeck process, for which a discrete-time linear Gaussian approximation and a Kalman filter auxiliary likelihood are adopted, and a square-root volatility process, with an auxiliary likelihood based on an Euler discretisation and the augmented unscented Kalman filter. In our experiments, we compared our auxiliary-based technique to the two-step approach of Fearnhead and Prangle (in their 2012 Read Paper), exhibiting improvement for the examples analysed therein.
Somewhat predictably, an important challenge in this approach, one it shares with the related techniques of indirect inference and efficient method of moments, is the choice of a computationally efficient and accurate auxiliary model. But most of the current ABC literature discusses the role and choice of the summary statistics, which amounts to the same challenge, while missing the regularity provided by the score functions of our auxiliary models.
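To convey the score-matching idea on a deliberately minimal toy (an AR(1) model acting as its own auxiliary model, far simpler than the state-space examples of the paper, and with hypothetical tuning constants throughout): fit the auxiliary MLE on the observed series once, then keep the parameter draws whose simulated series produce an auxiliary score close to zero at that MLE.

```python
import random
import statistics

random.seed(42)

def simulate(rho, T=200):
    """Toy data generator: an AR(1) series with unit-variance Gaussian noise."""
    x = [0.0]
    for _ in range(T - 1):
        x.append(rho * x[-1] + random.gauss(0, 1))
    return x

obs = simulate(0.6)  # "observed" data, true rho = 0.6

# Auxiliary model: Gaussian AR(1); its MLE for rho is the least-squares slope.
def ar1_mle(x):
    return sum(a * b for a, b in zip(x, x[1:])) / sum(a * a for a in x[:-1])

rho_hat = ar1_mle(obs)

def score(x, rho):
    """Derivative of the auxiliary log-likelihood in rho; zero at the MLE of x."""
    return sum((b - rho * a) * a for a, b in zip(x, x[1:]))

# ABC rejection: the observed score at rho_hat is zero by construction, so
# accept draws whose simulated score at rho_hat is near zero (threshold 20
# is an arbitrary illustrative choice).
kept = [rho for rho in (random.uniform(0, 1) for _ in range(3000))
        if abs(score(simulate(rho), rho_hat)) < 20]
post_mean = statistics.fmean(kept)
```

The appeal is that the observed score at the auxiliary MLE vanishes by construction, so the ABC distance reduces to the norm of the simulated score and no optimisation is needed inside the ABC loop.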

## gap frequencies [& e]

Posted in Kids, R on April 29, 2016 by xi'an

A riddle from The Riddler where brute-force simulation does not pay:

For a given integer N, pick integers at random without replacement between 1 and N, prohibiting integers consecutive to those already selected, until all eligible entries are exhausted. What is the frequency of selected integers as N grows to infinity?

A simple implementation of the random experiment is as follows

generope=function(N){
  frei=rep(1,N)  #frei[i]=1 when integer i is still available
  hus=0          #number of integers selected so far
  while (max(frei)==1){
    #rep(.,2) protects against sample()'s scalar behaviour when one entry remains
    i=sample(rep((1:N)[frei==1],2),1)
    frei[i]=frei[i+1]=frei[i-1]=0  #selecting i bars i-1 and i+1 as well
    hus=hus+1}
  return(hus)}


It is however quite slow and does not exploit the recursive feature of the sampling, namely that the first draw i breaks the set {1,…,N} into two independent subsets of sizes i-2 and N-i-1:

generipe=function(N){
  #one realisation of the count, by recursion on the two subsets left by drawing i
  if (N<2){ return((N>0))}else{
    i=sample(1:N,1)
    return(1+generipe(i-2)+generipe(N-i-1))}}


But even this faster solution takes a while to run for large values of N:

frqns=function(N){
  #Monte Carlo estimate of q_N, the expected frequency of selected integers
  space=0
  for (t in 1:1e3) space=space+generipe(N)
  return(space/(N*1e3))}


as for instance

>  microbenchmark(frqns(100),time=10)
Unit: nanoseconds
expr       min        lq         mean    median        uq       max
frqns(100) 178720117 185020903 212212355.77 188710872 205865816 471395620
time         4         8        26.13        32        37       102


Hence this limits simulations to, say, N=10⁴. Thinking further about the recursive aspect of the process however leads to a recursion on the frequencies q_N, as it is straightforward to prove that

$q_N=\frac{1}{N}+\frac{2}{N^2}\,\sum_{i=1}^{N-2} iq_i$

with q1=1 and q2=1/2. This recursion can be further simplified into

$q_N=\frac{1}{N^2}+\frac{2(N-2)}{N^2}\,q_{N-2}+\frac{(N-1)^2}{N^2}q_{N-1}\qquad(1)$
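For the record, here is the conditioning argument behind these recursions (my reconstruction): writing h_N=Nq_N for the expected number of selected integers, the first draw i bars i-1, i, and i+1, leaving two independent subproblems of sizes i-2 and N-i-1, hence

$h_N=1+\frac{1}{N}\sum_{i=1}^N\left(h_{i-2}+h_{N-i-1}\right)=1+\frac{2}{N}\sum_{j=1}^{N-2}h_j\qquad\text{with}\quad h_{-1}=h_0=0$

Dividing by N recovers the recursion on q_N, and differencing, N²q_N-(N-1)²q_{N-1}=1+2(N-2)q_{N-2}, produces (1).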

which allows for a much faster computation

s=rep(1,1e7) #s[n]=n*q[n], initialised at s[1]=1*q[1]=1 and s[2]=2*q[2]=1
for (n in 3:1e7) s[n]=(1+2*s[n-2]+(n-1)*s[n-1])/n
s[1e7]/1e7 #approximation of the limiting frequency


and a limiting value of 0.4323324… Since storing the whole of s is unnecessary, a sliding update works even better:

a=b=1 #a and b hold s[n-2] and s[n-1], with s[1]=s[2]=1
for (n in 3:1e8){ c=(1+2*a+(n-1)*b)/n;a=b;b=c}
b/1e8 #limiting frequency


still producing a final value of 0.4323324, which may mean we have reached the limit of the numerical precision.

As I could not figure out a way to find the limit of the sequence (1) above, I put it on the maths forum of Stack Exchange and very quickly got the answer (obtained by a power series representation) that the limit is (rather amazingly!)

$\dfrac{1 - e^{-2}}{2}$

which is 0.432332358…, hence very close to the numerical value obtained for n=3×10⁸. (Which does not change from n=10⁸, once again for precision reasons.) Now I wonder whether an easier derivation of the above is feasible, but we should learn about it in a few hours on The Riddler. [Update: the solution published by The Riddler is exactly that one, using a power series expansion to figure out the limit of the series, unfortunately. I was hoping for a de Montmort trick of sorts…]
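As a last cross-check, a direct Python transcription of the sliding update above (run to a smaller horizon) settles at the closed-form limit:

```python
import math

# Sliding recursion on s_n = n * q_n, with s_1 = 1*q_1 = 1 and s_2 = 2*q_2 = 1.
a, b = 1.0, 1.0
N = 100_000
for n in range(3, N + 1):
    a, b = b, (1 + 2 * a + (n - 1) * b) / n

q_N = b / N                      # frequency of selected integers at horizon N
limit = (1 - math.exp(-2)) / 2   # closed form from Stack Exchange
```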