Archive for Bayesian Analysis

X entropy

Posted in Books, Kids, pictures, Statistics, Travel, University life on November 16, 2018 by xi'an

Another discussion on X validated about maximum entropy priors and their dependence on the dominating measure μ chosen to define the entropy score. With the same electrical engineering student as previously. In the wee hours at Casa Matemática Oaxaca. I took the [counter-]example of a Lebesgue dominating measure versus a Normal density times the Lebesgue measure, both producing the same maximum entropy distribution [with obviously the same density wrt the Lebesgue measure] when the constraints involve the second moment. This confused the student, so I spent some time constructing another example with different outcomes, comparing the Lebesgue measure with the [artificial] dx/√|x| measure.
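To spell out the point [in my notation]: maximising the entropy −∫ f log f dμ under the second moment constraint ∫ x² f(x) dμ(x) = σ² leads to an exponential-form solution

f(x) ∝ exp{λx²}   [density wrt μ, with λ<0 set by the constraint]

When μ is the Lebesgue measure dx, this is the N(0,σ²) density. When μ is a Normal density φ times dx, the density wrt dx becomes exp{λx²}φ(x), again the N(0,σ²) density, hence the same maximum entropy distribution. But when μ = dx/√|x|, the density wrt dx is exp{λx²}/√|x|, which is no longer a Normal density.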

I am actually surprised at how little discussion of that point occurs in the literature (or at least in my googling attempts). Just a mention in Bayesian Analysis in Statistics and Econometrics.

optimal proposal for ABC

Posted in Statistics on October 8, 2018 by xi'an

As pointed out by Ewan Cameron in a recent c’Og’ment, Justin Alsing, Benjamin Wandelt, and Stephen Feeney arXived a paper last August in which they discuss an optimal proposal density for ABC-SMC and ABC-PMC, optimality being understood as maximising the effective sample size.

“Previous studies have sought kernels that are optimal in the (…) Kullback-Leibler divergence between the proposal KDE and the target density.”

The effective sample size for ABC-SMC is actually the regular ESS multiplied by the fraction of accepted simulations. Which surprisingly converges to the ratio

E[q(θ)/π(θ)|D]/E[π(θ)/q(θ)|D]

under the (true) posterior. (Here q(θ) is the importance density and π(θ) the prior density.) When optimised in q, this usually produces an implicit equation whose solution takes the form of a geometric mean between posterior and prior. The paper looks at approximate ways of finding this optimum. Especially through an upper bound on q. Something I do not understand from the simulations is that the starting point seems to be the plain geometric mean between posterior and prior, in a setting where the posterior is supposedly unavailable… Actually the paper is silent on how the optimum can be approximated in practice, for the very reason I just mentioned, apart from using a non-parametric or mixture estimate of the posterior after each SMC iteration, which may prove extremely costly when processed through the optimisation steps. However, an interesting side outcome of these simulations is that the above geometric mean does much better than the posterior itself when considering the effective sample size.
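As a quick sanity check (mine, with made-up numbers, not the paper's experiments), the limiting ratio can be estimated by Monte Carlo in a toy Normal-Normal setting where the prior, the posterior, and the geometric-mean proposal q ∝ √(posterior×prior) are all available in closed form:

```python
import numpy as np
from scipy.stats import norm

# illustrative values (not from the paper): prior N(0, s0²), posterior N(m1, s1²)
s0, m1, s1 = 3.0, 1.0, 0.5
prior, post = norm(0, s0), norm(m1, s1)
# the geometric mean of two Normal densities is again Normal:
prec = 0.5 * (1 / s1**2 + 1 / s0**2)
geom = norm((m1 / s1**2) / (1 / s1**2 + 1 / s0**2), prec ** -0.5)

theta = post.rvs(size=200_000, random_state=np.random.default_rng(0))

def ratio(q):
    # Monte Carlo estimate of E[q(θ)/π(θ)|D] / E[π(θ)/q(θ)|D] under the posterior
    r = q.pdf(theta) / prior.pdf(theta)
    return r.mean() / (1 / r).mean()

print(f"posterior as proposal:   {ratio(post):.2f}")  # about 4.5 with these numbers
print(f"geometric-mean proposal: {ratio(geom):.2f}")  # about 12, i.e. much larger
```

With these (arbitrary) values, the geometric-mean proposal indeed returns a much larger ratio than the posterior itself.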

off to Vancouver

Posted in Mountains, pictures, Running, Statistics, Travel, University life on July 29, 2018 by xi'an

I am off today to Vancouver for JSM2018, eight years after I visited the West Coast for another JSM! And a contender for the Summer of British Conferences, since it is in British Columbia.

And again looking forward to the city, (some of) the meeting, and getting together with long-time-no-see friends. Followed by a fortnight of vacations on Vancouver Island, where ‘Og posting may get sparse…

I hope I can take advantage of the ten hours on the plane from Paris to write from scratch my talk about priors for mixtures of distributions, based on our papers with Clara Grazian and with Kaniav Kamary and Kate Lee. Still having some leeway since my talk is on Thursday morning, on the last day of the meeting…

new estimators of evidence

Posted in Books, Statistics on June 19, 2018 by xi'an

In an incredible accumulation of coincidences, I came across yet another paper about evidence and the harmonic mean challenge, by Yu-Bo Wang, Ming-Hui Chen [same as in Chen, Shao, Ibrahim], Lynn Kuo, and Paul O. Lewis this time, published in Bayesian Analysis. (Disclaimer: I was not involved in the reviews of any of these papers!) Authors who are located in Storrs, Connecticut, in geographic and thematic connection with the original Gelfand and Dey (1994) paper! (Private joke about the Old Man of Storr in the above picture!)

“The working parameter space is essentially the constrained support considered by Robert and Wraith (2009) and Marin and Robert (2010).”

The central idea is to use a more general function than our HPD-restricted prior, but still with a known integral. Not in the sense of control variates, though. The function of choice is a weighted sum of indicators of the sets of a finite partition, which implies a compact parameter set Ω. Or a form of HPD region, although it is unclear when its volume can be derived. While the consistency of the estimator of the inverse normalising constant [based on an MCMC sample] is unsurprising, the more advanced part of the paper is about finding the optimal sequence of weights, as in control variates. But it is also unsurprising in that the weights are proportional to the inverses of the inverse posteriors over the sets in the partition. Since these are hard to derive in practice, the authors come up with a fairly interesting alternative, which is to take the value of the posterior at an arbitrary point of the relevant set.
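In schematic form [my own rendering of the reciprocal importance sampling idea, not the authors' code, with all names and signatures mine], the resulting estimate of the evidence Z reads:

```python
import numpy as np

def partition_evidence(mcmc, log_joint, indicators, volumes, weights):
    # mcmc       : (N, d) array of posterior MCMC draws
    # log_joint  : function returning log{prior × likelihood} at each draw
    # indicators : list of vectorised indicator functions of the sets A_k
    # volumes    : known volumes of the partition sets A_k
    # weights    : weights w_k, whose optimal choice is the point of the paper
    w = np.asarray(weights, dtype=float)
    f_integral = np.dot(w, volumes)  # ∫ f dθ known, with f = Σ_k w_k 1{A_k}
    f_vals = sum(wk * ind(mcmc) for wk, ind in zip(w, indicators))
    # E_post[f / (prior × lik)] = ∫f / Z, hence the estimate of Z
    # (in practice one would work on the log scale to avoid overflow):
    return f_integral / np.mean(f_vals * np.exp(-log_joint(mcmc)))
```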

The paper also contains an extension replacing the weights with functions that are integrable and with known integrals. Which is hard for most choices, even though it contains the regular harmonic mean estimator as a special case. And should also suffer from the curse of dimension when the constraint to keep the target almost constant is implemented (as in Figure 1).

The method, when properly calibrated, does much better than the harmonic mean (not a surprise) and than the Petris and Tardella (2007) alternative, but is compared with no other technique, on toy problems like the Normal, a Normal mixture, and a probit regression with three covariates (no Pima Indians this time!). As an aside, I find it hard to understand how the regular harmonic mean estimator takes longer than this more advanced version, which should require more calibration. But I also find it hard to see a general application of the principle, because the partition needs to be chosen in terms of the target. Embedded balls cannot work for every possible problem, even with ex-post standardisation.


the [not so infamous] arithmetic mean estimator

Posted in Books, Statistics on June 15, 2018 by xi'an

“Unfortunately, no perfect solution exists.” Anna Pajor

Another paper about harmonic and not-so-harmonic mean estimators that I (also) missed came out last year in Bayesian Analysis. The author is Anna Pajor, whose earlier note with Osiewalski I also spotted on the same day. The idea behind the approach [which belongs to the branch of Monte Carlo methods requiring additional simulations after an MCMC run] is to start from the corrected harmonic mean estimator on a restricted set A, so as to avoid the tails of the distributions and the connected infinite variance issues that plague the harmonic mean estimator (an old ‘Og tune!). The marginal density p(y) then satisfies an identity involving the prior expectation of the likelihood function restricted to A, divided by the posterior coverage of A. Which makes the resulting estimator unbiased only when this posterior coverage of A is known, which does not seem realistic or efficient, except if A is an HPD region, as suggested in our earlier “safe” harmonic mean paper. And efficient only when A is well-chosen in terms of the likelihood function.

In practice, the author notes that P(A|y) is to be estimated from the MCMC sequence and that the set A should be chosen to return large values of the likelihood p(y|θ), through importance sampling, hence somehow missing the double opportunity of using an HPD region. Hence using the same default choice as in Lenk (2009), an HPD region whose lower bound is derived as the minimum likelihood in the MCMC sample, the “range of the posterior sampler output”, meaning P(A|y)=1. (As an aside, the paper does not produce optimality properties or even heuristics towards efficiently choosing the various parameters to be calibrated in the algorithm, like the set A itself. As another aside, the paper concludes with a simulation study on an AR(p) model where the marginal may be obtained in closed form if stationarity is not imposed, which I first balked at, before realising that even in this setting both the posterior and the marginal do exist for a finite sample size, and hence the latter can be estimated consistently by Monte Carlo methods.) A last remark is that computing costs are not discussed in the comparison of methods.
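Spelled out [in my notation], the identity behind the estimator is

p(y) = E_π[ p(y|θ) 1_A(θ) ] / P(A|y)

since ∫_A p(y|θ) π(θ) dθ = p(y) P(A|y). This suggests the two-block estimate below, a minimal sketch assuming draws from both the prior and the posterior are available [function names mine, not the author's code]:

```python
import numpy as np

def corrected_arithmetic_mean(log_lik, in_A, prior_draws, mcmc_draws):
    # prior expectation of the likelihood restricted to A...
    num = np.mean(np.exp(log_lik(prior_draws)) * in_A(prior_draws))
    # ...divided by the posterior coverage of A, estimated from the MCMC output
    den = np.mean(in_A(mcmc_draws))
    return num / den
```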

The final experiment in the paper aims at the marginal of a mixture model posterior, operating on the galaxy benchmark used by Roeder (1990) and about every other paper on mixtures since then (incl. ours). The prior is pseudo-conjugate, as in Chib (1995). And label-switching is handled by a random permutation of the component indices at each iteration. Which may not be enough to fight the attraction of the current mode on a Gibbs sampler and hence does not automatically correct Chib’s solution, as shown in Table 7 by the divergence from Radford Neal’s (1999) computations of the marginals, which happen to be quite close to the approximation proposed by the author. (As an aside, the paper mentions poor performances of Chib’s method when centred at the posterior mean, but this is a setting where the posterior mean is meaningless because of the permutation invariance. As another, I do not understand how the RMSE can be computed in this real data situation.) The comparison is limited to Chib’s method and a few versions of arithmetic and harmonic means, missing nested sampling (Skilling, 2006; Chopin and X, 2011) and attuned importance sampling as in Berkoff et al. (2003), Marin, Mengersen and X (2005), and the most recent Lee and X (2016) in Bayesian Analysis.
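For concreteness, the random permutation device amounts to the following step after each Gibbs sweep [an illustrative sketch of the generic trick, not the author's code]:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_relabel(weights, means, variances):
    # draw a uniform permutation of the K component labels and apply it
    # to every component-specific parameter, forcing symmetry across modes
    perm = rng.permutation(len(weights))
    return weights[perm], means[perm], variances[perm]

# e.g., after each Gibbs sweep: w, mu, sig2 = random_relabel(w, mu, sig2)
```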

StanCon in Helsinki [29-31 Aug 2018]

Posted in Books, pictures, R, Statistics, Travel, University life on March 7, 2018 by xi'an

As emailed to me by Aki Vehtari, the next StanCon will take place this summer in the wonderful city of Helsinki, at the end of August, on the Aalto University Töölö Campus precisely. The list of speakers and tutorial teachers is available on the webpage. (The only “negative point” is that the conference does not include a Tuesday, the night of the transcendence 2 miles race!) Somewhat concluding this never-ending summer of Bayesian conferences!