## freedom prior

Posted in Books, Kids, Statistics on December 9, 2020 by xi'an

Another X validated question on which I spent more time than expected, because of the somewhat unusual parameterisation used in BDA for the inverse χ² distribution. The interest behind the question is in the induced distribution on the parameter associated with the degrees of freedom ν of the t-distribution (a question that coincided with my last modifications of my undergraduate mathematical statistics exam, involving a t sample). Whichever prior is chosen on ν, the posterior involves a nasty term

$\pi(\nu)\frac{\nu^{n\nu/2}}{\Gamma(\nu/2)^n}\,(v_1\cdots v_n)^{-\nu/2-1}\exp\Big\{-\nu\sigma^2\sum_{i=1}^n\frac{1}{2v_i}\Big\}$

as the Gamma function there is quickly explosive (as can be checked by Stirling’s formula), unless the prior π(ν) cancels this term, which is rather fishy as the prior would then depend on the sample size n. This is so even though the whole posterior is well-defined (and hence non-explosive). Rather than seeking a special prior π(ν) for computation purposes, I would thus favour a modelling restricted to integer-valued ν’s, as there is not much motivation in inferring about non-integer degrees of freedom.
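As a minimal sketch of the point above, the ν-dependent term can be evaluated on the log scale with `math.lgamma`, which avoids overflowing the Gamma function, and then normalised over integer values of ν. The function names, the latent variances `v`, the scale `sigma2`, the upper bound `nu_max` and the flat prior on the integers are all assumptions made for illustration, not choices taken from the original question.

```python
import math

def log_nu_term(nu, v, sigma2):
    """Log of nu^{n nu/2} / Gamma(nu/2)^n * prod(v_i)^{-nu/2-1}
    * exp{-nu sigma2 sum_i 1/(2 v_i)}, without the prior pi(nu)."""
    n = len(v)
    return (
        0.5 * n * nu * math.log(nu)
        - n * math.lgamma(0.5 * nu)                       # log Gamma, no overflow
        - (0.5 * nu + 1.0) * sum(math.log(vi) for vi in v)
        - nu * sigma2 * sum(1.0 / (2.0 * vi) for vi in v)
    )

def stirling_lgamma(x):
    """Leading-order Stirling approximation to log Gamma(x)."""
    return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2.0 * math.pi)

def posterior_on_integers(v, sigma2, nu_max):
    """Normalised weights over nu = 1, ..., nu_max under a flat prior
    (the flat prior and the truncation at nu_max are assumptions)."""
    logs = [log_nu_term(nu, v, sigma2) for nu in range(1, nu_max + 1)]
    top = max(logs)                                       # log-sum-exp shift
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical latent variances and scale, for illustration only
v = [0.8, 1.3, 2.1, 0.6, 1.7]
sigma2 = 1.0
probs = posterior_on_integers(v, sigma2, nu_max=50)
```

Working with logarithms throughout means the individual factors never need to be formed, which is where a direct evaluation with `math.gamma` would fail for large ν.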

## a null hypothesis with a 99% probability to be true…

Posted in Books, R, Statistics, University life on March 28, 2018 by xi'an

When checking the Python t distribution random generator, np.random.standard_t(), I came upon this manual page, which actually does not explain how the random generator works but instead spends the whole page recalling Gosset’s t test, illustrating its use on the energy intake of 11 women, but ending up misleading the readers by interpreting a .009 one-sided p-value as meaning “the null hypothesis [on the hypothesised mean] has a probability of about 99% of being true”! Actually, Python’s standard deviation estimator x.std() further returns by default a non-standard standard deviation, dividing by n rather than n-1…
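To make the remark on the divisor concrete, here is a small sketch contrasting the two conventions. It relies on the standard-library `statistics` module rather than NumPy so that it runs without extra packages; the data vector is made up for illustration. With NumPy, `x.std()` corresponds to the first quantity (`ddof=0`) and `x.std(ddof=1)` to the second.

```python
import math
import statistics

# Hypothetical sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
n = len(x)

# Divides the sum of squares by n (what NumPy's x.std() does by default)
sd_n = statistics.pstdev(x)

# Divides by n-1 (what x.std(ddof=1) would return), as used in the t statistic
sd_n_minus_1 = statistics.stdev(x)

# The two estimators differ by the factor sqrt(n / (n-1))
ratio = sd_n_minus_1 / sd_n

print(sd_n, sd_n_minus_1, ratio)
```

For small samples like the 11 observations of the manual page, the factor sqrt(n/(n-1)) is not negligible, so the default has to be overridden when computing a t statistic.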

## adaptive Metropolis-Hastings sampling using reversible dependent mixture proposals

Posted in Statistics on May 23, 2013 by xi'an

On the plane to Birmingham, I was reading this recent arXived paper by Minh-Ngoc Tran, Michael K. Pitt, and Robert Kohn. The adaptive structure of their ACMH algorithm is based upon two parallel Markov chains, the former (called the trial chain) feeding the proposal densities of the latter (called the main chain), bypassing the more traditional diminishing adaptation conditions. (Even though convergence actually follows from a minorisation condition.) These proposals are mixtures of t distributions fitted by variational Bayes approximations. Furthermore, the proposals are (a) reversible and (b) mixing local [dependent] and global [independent] components. One nice aspect of the reversibility is that the proposals do not have to be evaluated at each step.

The convergence results in the paper indeed assume a uniform minorisation condition on all proposal densities: although this sounded restrictive at first (but allows for straightforward proofs), I realised this could be implemented by adding a specific component to the mixture as in Corollary 3. (I checked the proof to realise that the minorisation on the proposal extends to the minorisation on the Metropolis-Hastings transition kernel.) A reversible kernel is defined as satisfying the detailed balance condition, which means that a single Gibbs step is reversible even though the Gibbs sampler as a whole is not. If a reversible Markov kernel with stationary distribution ζ is used, the acceptance probability in the Metropolis-Hastings transition is

α(x,z) = min{1, π(z)ζ(x) / [π(x)ζ(z)]}

(a result I thought was already known). The sweet deal is that the transition kernel involves Dirac masses, but the acceptance probability bypasses the difficulty. The way mixtures of t distributions can be reversible follows from the Pitt & Walker (2006) construction, with ζ a specific mixture of t distributions. This target is estimated by variational Bayes. The paper further bypasses my classical objection to the use of normal, t or mixtures thereof, distributions: this modelling assumes a sort of common Euclidean space for all components, which is (a) highly restrictive and (b) very inefficient in terms of acceptance rate. Instead, Tran et al. resort to Metropolis-within-Gibbs by constructing a partition of the components into subgroups.
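The acceptance probability above can be illustrated with a toy sketch, which is not the ACMH algorithm of the paper. It assumes a standard normal target π, takes ζ to be a N(0, 2²) density, and uses an autoregressive Gaussian kernel, which is reversible with respect to ζ. All names and numerical values (`rho`, `scale`, the seed) are hypothetical choices. Since the kernel satisfies ζ(x)q(x,z) = ζ(z)q(z,x), the proposal density q itself never has to be evaluated.

```python
import math
import random

def log_target(x):
    # Hypothetical target pi: standard normal, up to an additive constant
    return -0.5 * x * x

def log_zeta(x, scale=2.0):
    # Hypothetical zeta: N(0, scale^2), up to an additive constant
    return -0.5 * (x / scale) ** 2

def reversible_proposal(x, rng, rho=0.5, scale=2.0):
    # Autoregressive move z = rho*x + sqrt(1-rho^2)*scale*eps, eps ~ N(0,1);
    # this kernel satisfies detailed balance with respect to N(0, scale^2)
    return rho * x + math.sqrt(1.0 - rho ** 2) * scale * rng.gauss(0.0, 1.0)

def acceptance_probability(x, z):
    # min{1, pi(z) zeta(x) / [pi(x) zeta(z)]}, computed on the log scale
    log_ratio = log_target(z) + log_zeta(x) - log_target(x) - log_zeta(z)
    return math.exp(min(0.0, log_ratio))

def run_chain(n_iter, x0=0.0, seed=1):
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_iter):
        z = reversible_proposal(x, rng)
        if rng.random() < acceptance_probability(x, z):
            x = z
        chain.append(x)
    return chain

chain = run_chain(20000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

The empirical mean and variance of `chain` should be close to 0 and 1, the moments of the assumed target, even though the proposal kernel is stationary for a different distribution.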

## Typo in Example 3.6

Posted in Books, R, Statistics on September 17, 2010 by xi'an

Edward Kao pointed out the following difficulty about Example 3.6 in Chapter 3 of “Introducing Monte Carlo Methods with R”:

I have two questions that have puzzled me for a while. I hope you can shed some lights. They are all about Example 3.6 of your book.

1. On page 74, there is a term x(1-x) for m(x). This is fine. But the term disappeared from (3.5) on p.75. My impression is that this is not a typo. There must be a reason for its disappearance. Can you elaborate?

I am alas afraid this is a plain typo, where I did not carry over the x(1-x) term from one page to the next.

2. On page 75, you have the term “den=dt(normx,3)”. My impression is that you are using univariate t with 3 degrees of freedom to approximate. I thought formally you need to use a bivariate t with 3 degrees of freedom to do the importance sampling. Why would normx=sqrt(x[,1]^2+x[,2]^2) along with a univariate t work?

This is a shortcut that would require more explanation. While the two-dimensional t sample is y, a linear transform of the isotropic x, it is possible to express the density of y via the one-dimensional t density, hence the apparent confusion between univariate and bivariate t densities…

## New typos in Monte Carlo Statistical Methods

Posted in Books, Statistics on September 28, 2009 by xi'an

Three weeks ago, I got this email from Amir Alipour, an Iranian student, about typos in Monte Carlo Statistical Methods:

“I found some typos in the book which were not reported at your website. I list them below, I would appreciate if you let me know if I'm right.
1. Page 4, line 9, $(\theta_1,\ldots,\theta_n,p_1,\ldots,p_n)$, the index should not be $k$ instead of $n$?
2. Page 4, example 1.3, last line, $n>q$, should be $n\ge q$ (as we have $x_0$).
3. Page 5, the likelihood of MA(q), it seems $\sigma^{-(n+q)}$ should change to $\sigma^{-(n+q+1)}$.
4. Page 8, formula (1.10). The gradient symbol $\nabla$ is used for the first time without introducing, while it is used for the second time on page 19 with introducing.
5. Page 8, Example 1.6, the log part in $\psi(\theta)$, should change to $\log(-1/(2\theta_2))$.
6. Page 10, in modified Bessel function, $z$ should change to $t$.
7. Page 10, Example 1.9, in the likelihood function, the power of $\sigma$ should be $-n$, and the power of the function under product should be $-\frac{p+1}{2}$. (Even Figure 1.1 is not consistent with likelihood)”

and I have posted those new typos on the associated webpage. Amir Alipour has thus managed to find seven as yet undiscovered typos in the first ten pages of the book! I am quite grateful to Amir Alipour for signaling those typos. Especially the final one, which is due to an intended presentation of the $t$ density as a polynomial, with a poor wording: the likelihood of a $t$ sample is proportional to a power of a polynomial in the location parameter. (And there still is a typo since $\sigma^{n(p+1)/2}$ should be $\sigma^{2n/(p+1)}$…) Now I can only hope Amir Alipour can proceed through the whole book with the same amount of dedication!