## stratified MCMC

Posted in Books, pictures, Statistics on December 3, 2020 by xi'an

When working last week with a student, we came across [the slides of a talk at ICERM by Brian van Koten about] a stratified MCMC method whose core idea is to solve an eigenvector equation z’=z’F associated with the masses of “partition” functions Ψ evaluated at the target. (The arXived paper has also been available since 2017, but I did not check it in more detail.) The “partition” functions need to overlap, though, for the matrix not to be diagonal (actually, the only case that does not work is when these functions are truly indicator functions). As in other forms of stratified sampling, the practical difficulty is in picking the functions Ψ so that the evaluation of the terms of the matrix F is not overly impacted by the Monte Carlo error. If too much time is spent estimating these terms, there is no clear gain in switching to stratified sampling, which may be why it is not particularly developed in the MCMC literature… As an interesting aside, the illustration in this talk comes from the Mexican stamp thickness data I also used in my earlier mixture papers, concerning the 1872 Hidalgo issue that was printed on different qualities of paper. This makes the number k of components somewhat uncertain, although k=3 is sometimes used as a default. Hence a parameter and simulation space of dimension 8, even though the method is used toward approximating the marginal posteriors on the weights λ₁ and λ₂.
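As a rough illustration of the eigenvector equation z’=z’F (a sketch under assumptions that come neither from the talk nor from the paper), suppose the functions Ψ are overlapping Gaussian bumps normalised to sum to one, and let F[i,j] be the average of Ψ_j under the target reweighted by Ψ_i. Deterministic quadrature on a grid stands in for the MCMC averages here, and the bimodal target, the bump centres, and the name `stratified_weights` are all hypothetical.

```python
import numpy as np

def stratified_weights(log_target, centers, width, grid):
    """Toy version of the eigenvector equation z' = z'F (hypothetical helper)."""
    # Unnormalised target on a grid; quadrature replaces the MCMC averages.
    pi = np.exp(log_target(grid))
    # Overlapping Gaussian bumps, normalised into a partition of unity.
    bumps = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / width) ** 2)
    psi = bumps / bumps.sum(axis=0, keepdims=True)
    # F[i, j] = E_{pi_i}[psi_j], where pi_i is proportional to psi_i * pi.
    weighted = psi * pi[None, :]
    F = (weighted @ psi.T) / weighted.sum(axis=1, keepdims=True)
    # Left eigenvector of F for the eigenvalue closest to 1.
    vals, vecs = np.linalg.eig(F.T)
    z = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    z = z / z.sum()
    # Direct computation of the strata masses, for comparison only.
    z_direct = weighted.sum(axis=1) / pi.sum()
    return F, z, z_direct

# Assumed bimodal target and eight overlapping strata.
log_target = lambda x: np.logaddexp(-0.5 * (x + 2.0) ** 2,
                                    -0.5 * ((x - 2.0) / 0.5) ** 2)
grid = np.linspace(-6.0, 6.0, 2001)
centers = np.linspace(-5.0, 5.0, 8)
F, z, z_direct = stratified_weights(log_target, centers, 1.0, grid)
print(np.round(z, 4))
```

In this toy setting the eigenvector recovers the strata masses computed directly. In an actual stratified sampler each row of F would instead be estimated from a chain targeting the corresponding reweighted distribution, so the Monte Carlo error in F carries over to z.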

## too many marginals

Posted in Kids, Statistics on February 3, 2020 by xi'an

This week, the CEREMADE coffee room puzzle was about finding a joint distribution for (X,Y) such that (marginally) X and Y are both U(0,1), while X+Y is U(½,1+½). Beyond the peculiarity of the question, there is a larger scale problem, namely how many (if any) compatible marginals h₁(X,Y), h₂(X,Y), h₃(X,Y), …, one needs to impose on the distribution in order to reconstruct the joint. And whether any Gibbs-like scheme is available to simulate the joint.
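One candidate answer to the puzzle (not given in the post, so to be treated as an assumption to check) is to take X uniform and Y = X + ½ modulo 1. Then Y is U(0,1), and X+Y equals 2X+½ when X<½ and 2X−½ otherwise, both of which spread uniformly over (½,1+½). The simulation below only checks this empirically through bin frequencies; the function name `sample_joint` is hypothetical.

```python
import numpy as np

def sample_joint(n, seed=0):
    """Draw n pairs with Y = (X + 1/2) mod 1, a candidate solution to the puzzle."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    y = (x + 0.5) % 1.0
    return x, y

x, y = sample_joint(200_000)
s = x + y
# Empirical frequencies over ten equal bins; each should be close to 0.1.
freq_y = np.histogram(y, bins=10, range=(0.0, 1.0))[0] / y.size
freq_s = np.histogram(s, bins=10, range=(0.5, 1.5))[0] / s.size
print(freq_y.round(3))
print(freq_s.round(3))
```

This joint is degenerate (Y is a deterministic function of X), which says nothing about the larger question of how many marginals are needed to pin down a joint.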

## tractable Bayesian variable selection: beyond normality

Posted in R, Statistics, University life on October 17, 2016 by xi'an

David Rossell and Francisco Rubio (both from Warwick) arXived a month ago a paper on non-normal variable selection. They use two-piece error models that preserve manageable inference and allow for simple computational algorithms, but also characterise the behaviour of the resulting variable selection process under model misspecification. Interestingly, they show that the existence of asymmetries or heavy tails leads to power losses when using the Normal model. The two-piece error distribution is made of two halves of location-scale transforms of the same reference density on the two sides of the common location parameter. In this paper, the density is either Gaussian or Laplace (i.e., exponential?). In both cases the (log-)likelihood has a nice compact expression (although it does not allow for a useful sufficient statistic). The Laplace case is the L¹ version and the Gaussian case the L² version, which is the main reason for using this formalism based on only two families of parametric distributions, I presume. (As mentioned in an earlier post, I do not consider those distributions as mixtures because the component of a given observation can always be identified. And because, as shown in the current paper, maximum likelihood estimates can be easily derived.) The prior construction follows the non-local prior principles of Johnson and Rossell (2010, 2012), also discussed in earlier posts. The construction is very detailed and hence highlights how many calibration steps are needed in the process.
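To make the two-piece construction concrete, here is a minimal sketch of a generic two-piece log-density with separate left and right scales. This parameterisation (two scales `sigma_left` and `sigma_right`, normalising constant 2/(σ₁+σ₂)) is a standard one but is an assumption here; the paper may use a different one, for instance a scale together with an asymmetry parameter. The function name is hypothetical.

```python
import numpy as np

def two_piece_logpdf(y, mu, sigma_left, sigma_right, base="normal"):
    """Log-density of a two-piece distribution with location mu (illustrative)."""
    y = np.asarray(y, dtype=float)
    # The side of mu on which an observation falls fixes its scale.
    scale = np.where(y < mu, sigma_left, sigma_right)
    u = (y - mu) / scale
    if base == "normal":
        log_f = -0.5 * u ** 2 - 0.5 * np.log(2.0 * np.pi)   # L2 version
    elif base == "laplace":
        log_f = -np.abs(u) - np.log(2.0)                     # L1 version
    else:
        raise ValueError("base must be 'normal' or 'laplace'")
    # The constant 2 / (sigma_left + sigma_right) makes the two halves integrate to one.
    return np.log(2.0) - np.log(sigma_left + sigma_right) + log_f

# Example: a right-skewed two-piece Laplace evaluated at three points.
print(two_piece_logpdf([-1.0, 0.3, 4.0], mu=0.3, sigma_left=1.0,
                       sigma_right=2.5, base="laplace"))
```

The `np.where` line reflects the remark about mixtures: the half to which an observation belongs is read off from its position relative to the location, so no latent allocation is involved.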

“Bayes factor rates are the same as when the correct model is assumed [but] model misspecification often causes a decrease in the power to detect truly active variables.”

When there are too many models to compare at once, the authors propose a random walk on the finite set of models (which does not require advanced measure-theoretic tools like reversible jump MCMC). One interesting aspect is that moving away from the normal to another member of this small family is driven by the marginal densities of the data under the competing models, which means moving only to interesting alternatives, but also sticking to the normal only for adequate datasets. In a sense this is not extremely surprising given that the marginal likelihoods (model-wise) are available. It is also interesting that on real datasets, one of the four models is heavily favoured against the others, be it Normal (6.3) or Laplace (6.4), and that the four-model framework returns almost identical values when compared with a single (most likely) model. Although this is not immensely surprising when acknowledging that the frequency of the most likely model is 0.998 and 0.998, respectively.
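A random walk over a finite model set can be sketched as a Metropolis scheme that flips one inclusion indicator at a time and accepts according to the ratio of model scores. This is only an illustration of the general idea, not the authors' algorithm: the function `model_random_walk` is hypothetical, the walk covers variable inclusion only (not the choice of error family), and `toy_log_post` is a made-up additive score standing in for the log marginal likelihood plus log model prior.

```python
import numpy as np

def model_random_walk(log_post, p, n_iter, seed=0):
    """Metropolis walk over inclusion vectors in {0,1}^p (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(p, dtype=bool)
    current = log_post(gamma)
    visits = {}
    for _ in range(n_iter):
        # Symmetric proposal: flip one coordinate chosen uniformly at random.
        proposal = gamma.copy()
        j = rng.integers(p)
        proposal[j] = not proposal[j]
        cand = log_post(proposal)
        # Accept with probability min(1, exp(cand - current)).
        if np.log(rng.uniform()) < cand - current:
            gamma, current = proposal, cand
        key = tuple(int(g) for g in gamma)
        visits[key] = visits.get(key, 0) + 1
    # Visit frequencies approximate the posterior model probabilities.
    return {k: v / n_iter for k, v in visits.items()}

# Made-up score: each variable contributes independently to the log posterior.
toy_log_post = lambda g: float(np.dot(g, [2.0, -1.0, 0.5]))
freqs = model_random_walk(toy_log_post, p=3, n_iter=50_000, seed=1)
print(max(freqs, key=freqs.get))
```

Because the model space is finite and the proposal symmetric, the acceptance ratio only involves the model-wise scores, with no dimension-matching step.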

“Our framework represents a middle-ground to add flexibility in a parsimonious manner that remains analytically and computationally tractable, facilitating applications where either p is large or n is too moderate to fit more flexible models accurately.”

Overall, I find the experiment quite conclusive and do not object [much] to this choice of parametric family in that it is always more general and generic than the sempiternal Gaussian model, which we picked in our Bayesian Essentials, following tradition. In a sense, it would be natural to pick the most general possible parametric family that allows for fast computations, if this notion does make any sense…