Archive for SAMSI

more air [&q’s] for MCMC [comments]

Posted in Books, pictures, Statistics on June 11, 2021 by xi'an

[A rich set of comments by Tom Loredo about convergence assessments for MCMC that I feel needs reposting:]

Two quick points:

  • By coincidence (and for a different problem), I’ve just been looking at the work of Gorham & Mackey that I believe Pierre is referring to. This is probably the relevant paper: “Measuring Sample Quality with Kernels”.
  • Besides their new rank-based R-hat, bloggers on Gelman’s blog have also pointed to another R-hat replacement, R*, developed by some Stan team members; it is “based on how well a machine learning classifier model can successfully discriminate the individual chains.” See: “R*: A robust MCMC convergence diagnostic with uncertainty using decision tree classifiers”.
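
[For concreteness, a toy sketch of the R* idea in R — assuming my reading of the statistic as classifier accuracy rescaled by the chance level 1/M, substituting a plain rpart tree for the gradient-boosted trees used in the paper, and with entirely made-up chains:]

  library(rpart)   # plain classification tree as a stand-in classifier
  set.seed(1)
  M <- 4; n <- 500
  draws <- data.frame(
    theta = c(rnorm(3 * n, 0), rnorm(n, 3)),   # three well-mixed chains, one stray
    chain = factor(rep(1:M, each = n)))
  fit <- rpart(chain ~ theta, data = draws)
  acc <- mean(predict(fit, type = "class") == draws$chain)  # in-sample, for brevity
  acc * M   # ~1 if chains are indistinguishable, here ~2 because of the stray chain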

In addition, here’s an anecdote regarding your comment, “I remain perplexed and frustrated by the fact that, 30 years later, the computed values of the visited likelihoods are not better exploited.”

That has long bothered me, too. During a SAMSI program around 2006, I spent time working on one approach that tried to use the prior*likelihood (I call it q(θ), for “quasiposterior” and because it’s next to “p”!) to compute the marginal likelihood. It would take posterior samples (from MCMC or another approach) and find their Delaunay triangulation. Then, using q(θ) on the nodes of the simplices comprising the triangulation, it used a simplicial cubature rule to approximate the integral of q(θ) over the volume spanned by the samples.
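
[A minimal sketch of that adaptive simplicial cubature in R, under assumptions of my own — a toy bivariate normal in the role of q(θ) and a first-order vertex-average rule on each simplex — rather than Tom’s actual code:]

  library(geometry)   # delaunayn() for the Delaunay triangulation
  library(mvtnorm)    # toy target playing the role of q(θ)

  d <- 2
  theta <- rmvnorm(2000, sigma = diag(d))   # stand-in for posterior draws
  qs <- dmvnorm(theta)                      # q evaluated once per sample

  tri <- delaunayn(theta)                   # simplices as rows of vertex indices
  simplex_vol <- function(V) {              # V: (d+1) x d matrix of vertices
    E <- sweep(V[-1, , drop = FALSE], 2, V[1, ])  # edge vectors from vertex 1
    abs(det(E)) / factorial(ncol(V))
  }
  ## first-order cubature: simplex volume times the mean of q over its nodes
  Z_hat <- sum(apply(tri, 1, function(idx)
    simplex_vol(theta[idx, ]) * mean(qs[idx])))
  Z_hat   # just below 1 here, as the hull of the draws misses the tails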

As I recall, I only explored it with multivariate normal and Student-t targets. It failed, but in an interesting way. It worked well in low dimensions, but gave increasingly poor estimates as dimension grew. The problem appeared related to concentration of measure (or the location of the typical set), with the points not sufficiently covering the center or the large volume in the tails (or both; I can’t remember what diagnostics said exactly).

Another problem is that Delaunay triangulation gets expensive quickly with growing dimension. This method doesn’t need an optimal triangulation, so I wondered if there was a faster sub-optimal triangulation algorithm, but I couldn’t find one.

An interesting aspect of this approach is that it doesn’t matter that the points are drawn from the posterior. Any set of points is a valid set of points for approximating the integral (in the spanned volume). I just used posterior samples because I presumed those would be available from MCMC. I briefly did some experiments taking the samples and reweighting them to draw a subset for the cubature that was either over- or under-dispersed vs. the target. And one could improve things this way (I can’t remember which choice was better). This suggests that points drawn from q(θ) aren’t optimal for such cubature, but I never tried looking formally for the optimal choice.
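
[A hedged sketch of that reweighting experiment, with a made-up dispersion exponent a rather than whatever scheme was actually used: subsampling with weights q^(a−1) makes the subset roughly q^a-distributed, over-dispersed for a < 1 and under-dispersed for a > 1. It reuses theta and qs from the sketch above:]

  subsample_dispersed <- function(theta, qs, m, a) {
    w <- qs^(a - 1)                     # a = 1 recovers plain uniform subsampling
    theta[sample(nrow(theta), m, prob = w / sum(w)), , drop = FALSE]
  }
  pts <- subsample_dispersed(theta, qs, 1000, a = 0.5)   # an over-dispersed subset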

I called the approach “adaptive simplicial cubature,” adaptive in the sense that the points are chosen in a way that depends on the integrand.

The only related work I could find at the time was work by you and Anne Philippe on Riemann sums with MCMC (https://doi.org/10.1023/A:1008926514119). I later stumbled upon a paper on “random Riemann sum estimators” as an alternative to Monte Carlo that seems related but that I didn’t explore further (https://doi.org/10.1016/j.csda.2006.09.041).
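
[In one dimension the Riemann-sum idea reduces to sorting the draws and weighting q by the spacings; a toy sketch in the spirit of, rather than a reproduction of, the Philippe & Robert estimator:]

  riemann_Z <- function(x, qfun) {      # left-endpoint Riemann sum over sorted draws
    xs <- sort(x)
    sum(diff(xs) * qfun(xs[-length(xs)]))
  }
  riemann_Z(rnorm(1e4), dnorm)          # close to 1 for a normalised q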

I still find it hard to believe that the q values aren’t useful. Admittedly, in an n-dimensional distribution, it’s just 1 more quantity available beyond the n that comprise the sample location. But it’s a qualitatively different type of information from the sample location, and I can’t help but think there’s some clever way to use it (besides emulating the response surface).

sudoku break

Posted in pictures, R, Statistics on December 13, 2013 by xi'an

While in Warwick last week, one evening after having exhausted my laptop battery, I tried the following Sudoku (from Libération):

>   printSudoku(readSudoku("libe.dk"))
  +-------+-------+-------+
  | 4   6 |   2   | 3   9 |
  |   3   |       |   2   |
  | 7   2 |       | 5   6 |
  +-------+-------+-------+
  |       | 9 4 5 |       |
  | 5     | 7 6 2 |     1 |
  |       | 3 1 8 |       |
  +-------+-------+-------+
  | 6   9 |       | 1   3 |
  |   7   |       |   9   |
  | 3   1 |   9   | 4   7 |
  +-------+-------+-------+

and could not even start. As it happened, this was a setting with no deterministic move, i.e., all free/empty entries had multiple possible values. So after trying for a while and following trees to no obvious contradiction (!), I gave up and, on the next day (with power), called my “old” sudoku solver (built while at SAMSI), which uses simulated annealing, and got the result after a few thousand iterations. The detail of the exploration is represented above, the two colours coding for the two different moves on the Sudoku table. Leading to the solution

  +-------+-------+-------+
  | 4 8 6 | 5 2 1 | 3 7 9 |
  | 1 3 5 | 6 7 9 | 8 2 4 |
  | 7 9 2 | 8 3 4 | 5 1 6 |
  +-------+-------+-------+
  | 2 1 3 | 9 4 5 | 7 6 8 |
  | 5 4 8 | 7 6 2 | 9 3 1 |
  | 9 6 7 | 3 1 8 | 2 4 5 |
  +-------+-------+-------+
  | 6 2 9 | 4 8 7 | 1 5 3 |
  | 8 7 4 | 1 5 3 | 6 9 2 |
  | 3 5 1 | 2 9 6 | 4 8 7 |
  +-------+-------+-------+

I then tried a variant with more proposals (hence more colours) at each iteration, which ended up being stuck at a penalty of 4 (instead of 0) in the final thousand iterations. Although this is a single-occurrence experiment, I find it interesting that having more move proposals may get the algorithm stuck faster in a local minimum. Nothing very deep there, of course…!
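
[Below is a minimal sketch of such a simulated-annealing solver, my own reconstruction rather than the original SAMSI-era code: keep each row a permutation of 1:9, propose swaps of two free cells within a row, and anneal on a duplicate-count penalty over columns and boxes.]

  sa_sudoku <- function(clue, iters = 2e5, temp0 = 1) {   # clue: 9x9 matrix, 0 = empty
    grid <- clue
    for (r in 1:9) {                    # complete each row with its missing digits
      miss <- setdiff(1:9, clue[r, ])
      grid[r, clue[r, ] == 0] <- if (length(miss) > 1) sample(miss) else miss
    }
    pen <- function(g) {                # duplicates in columns and in 3x3 boxes
      p <- sum(9 - apply(g, 2, function(col) length(unique(col))))
      for (i in c(1, 4, 7)) for (j in c(1, 4, 7))
        p <- p + 9 - length(unique(g[i:(i + 2), j:(j + 2)]))
      p
    }
    e <- pen(grid)
    for (t in 1:iters) {
      if (e == 0) break                 # solved
      r <- sample(1:9, 1)
      cs <- which(clue[r, ] == 0)       # only free cells may move
      if (length(cs) < 2) next
      sw <- sample(cs, 2)
      cand <- grid
      cand[r, sw] <- cand[r, rev(sw)]   # swap two free entries of row r
      e2 <- pen(cand)
      if (runif(1) < exp((e - e2) / (temp0 * 0.999^t))) {  # Metropolis step, geometric cooling
        grid <- cand; e <- e2
      }
    }
    list(grid = grid, penalty = e)      # penalty 0 means a full solution
  }

Calling sa_sudoku(m) on the 9×9 clue matrix m of the grid above (zeros for empty cells) returns either a full solution or, when stuck, a positive final penalty like the 4 observed in the variant run.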


SAMSI workshop

Posted in Statistics, Travel on March 22, 2010 by xi'an

Taking advantage of the people gathered at Frontiers of Statistical Decision Making and Bayesian Analysis, Dongchu Sun organised a one-day SAMSI workshop on reference priors for spatio-temporal models. Talking with a small group focused on this topic was quite enjoyable and a change from the larger crowds at the conference (even though the talks there were also enjoyable!). I particularly appreciated the discussion we had around AR(p) models and the difficulty of assessing whether or not non-stationary regions should be included in the analysis. The generalisation of the Berger-Yang (1994) paradigm to general values of p seems to put too much mass on the non-stationary region, even when using a symmetrisation technique… I came out of the meeting (exhausted and) wondering whether or not it was at all meaningful to consider testing for stationarity, even though Bayes factors can be constructed in this setting.