## simulation fodder for future exams

Posted in Books, Kids, R, Statistics with tags , , , , on February 20, 2019 by xi'an

Here are two nice exercises for a future simulation exam, seen and solved on X validated.The first one is about simulating a Gibbs sampler associated with the joint target

exp{-|x|-|y|-a|y-x|}

defined over IR² for a≥0 (or possibly a>-1). The conditionals are identical and non-standard, but a simple bound on the conditional density is the corresponding standard double exponential density, which makes for a straightforward accept-reject implementation. However it is also feasible to break the full conditional into three parts, depending on the respective positions of x, y, and 0, and to obtain easily invertible cdfs on the three intervals.The second exercise is about simulating from the cdf

$F(x)=1-\exp\{-ax-bx^{p+1}/(p+1)\}$

which can be numerically inverted. It is however more fun to call for an accept-reject algorithm by bounding the density with a ½ ½ mixture of an Exponential Exp(a) and of the 1/(p+1)-th power of an Exponential Exp(b/(p+1)). Since no extra constant appears in the solution,  I suspect the (p+1) in b/(p+1) was introduced on purpose. As seen in the above fit for 10⁶ simulations (and a=1,b=2,p=3), there is no deviation from the target! There is nonetheless an even simpler and rather elegant resolution to the exercise: since the tail function (1-F(x)) appears as the product of two tail functions, exp(-ax) and the other one, the cdf is the distribution of the minimum of two random variates, one with the Exp(a) distribution and the other one being the 1/(p+1)-th power of an Exponential Exp(b/(p+1)) distribution. Which of course returns a very similar histogram fit:

## a new method to solve the transformation of calculus

Posted in Statistics with tags , , , , , , on December 23, 2018 by xi'an

An hilariously ridiculous email I just received (warning: book cover unrelated):

Good day! this is very important to the “Mathematics” and the related fields,
“The Simulator”,“Probability theory”,”Statistics”,”Numerical Analysis”,
“Cryptography”,“Data mining”,“The big data analysis”and“Artificial Intelligence”.
The transformation of random variables in Calculus is very difficult and sometimes
is impossible to be done. The simulator can get the accuracy and precise simulated data
and the database could be the probability distributution if the data size is 100,000,000
or more. The probabilistic model can be designed and getting the probability distribution
and the coefficient using the simulator.

(1)“The Simulator” has four methods,
1) the basic method is the inverse function of the distribution function,
2) the transformation method to get the simulated data,
3) the numerical analysis method to build the simulated database,
4) the simulated database and the estimated line of random variable to get the simulated data.
(2) “Probability Theory” can use the simulator to a tool.
(3) ”Statistics”, the sampling distribution of the point estimator and the test statistic
can be seen as the transformation equation and the critical point and p value is from
the sampling distribution.
(4) ”Numerical Analysis”, the simulator data can overcome the limit of numerical analysis,
the number of random variables could be more 10000.
(5) “Cryptography”, the simulator of the probabilistic model will derive the lock code
which cannot be unlocked.
(6) “Data mining”, the data set can be a specific probability distribution using
“goodness of fit” or “Curve-fitting” or “Curvilinear”.
1) “goodness of fit”, there are 45 distributions for the null hypothesis.
2) “Curve-fitting”, the estimated line of random variable and the estimated line
of the distribution function.
3) “Curvilinear”, the data set is not arithmetic series.
(7) “The big data analysis”, the number of random variables could be more 10000
about the simulator of the probabilistic model.
(8) “Artificial Intelligence”, the model after analysis can be the transformation
equation, the simulator of the probabilistic model can get the simulated data.

The first book name is “The simulator” will be public, the context contains
(1) The simulation methods,
(2)“Probability Theory”,
(3) ”Statistics” and how to write the statistical package even the population is not
Normal distribution or a special statistical model.
(4)“Cryptography”,
(5)“Explored the arithmetic data”,

## ziggurat algorithm

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on October 30, 2018 by xi'an

A ziggurat (Akkadian: ziqqurat, D-stem of zaqāru “to build on a raised area”) is a type of massive stone structure built in ancient Mesopotamia. It has the form of a terraced compound of successively receding stories or levels. Wikipedia

In a recent arXival, Jalalvand and Charsooghi revisit the ziggurat algorithm that simulates from a univariate distribution by finding horizontal strips that pile up on top of the target as in a ziggurat or a pyramid, hence the name. Which George Marsaglia introduced in 1963. When finely tuned the method is quite efficient. Maybe because it designs an accept-reject move for each strip of the ziggurat rather than globally. For instance, versions constructed for a Normal target are more efficient [3½ times faster] than the Box-Muller algorithm. The generalisation found in the paper divides the target into strips of equal area, rather than dominating rectangular strips of equal area, which requires some work when the target density is non-standard. For targets with unbounded support or unbounded values, a function g transforming the tail into (0,1) has to be selected. A further constraint is that the inverse cdf of the transformed g(X) has to be known. And a large part of the paper examines several scenarii towards simulating from the tail region. For unbounded densities, a similarly minute analysis is undertaken, again with requests about the target like its algebraic order.

“…the result of division of a random integer by its range is a fixed-point number which unlike a floating-point number does not enjoy increased precision near 0. When such random numbers are used in the tail algorithm they cause premature termination of the tail and large gaps between produced random numbers near the termination point.”

The paper further discusses the correction of an error common to earlier ziggurat algorithms, due to the  conversion from fixed-point to floating-point numbers, as indicated in the above quote. Although this had already been addressed by George Marsaglia in the early 1990’s.

“Ziggurat algorithm has a high setup time, so it’s not suitable for applications that require variates with frequently changing shape parameters.”

When testing the algorithm against different methods (in STL and Boost), and different distributions, the gains are between two and seven times faster, except for the Exponential target where the original ziggurat algorithm performs better. Interestingly, the gains (and the computing time) increase with the degrees of freedom for the Gamma target, in relation with Devroye’s (1986) remark on the absence of uniformly bounded execution times for this distribution. Same thing for the Weibull variates, obviously. Reflecting upon the usually costly computation of cdfs and inverse cdfs on machines and software, the inverse cdf method is systematically left behind! In conclusion, a good Sunday morning read if not of direct consequences for MCMC implementation, as warned by the authors.

## calibrating approximate credible sets

Posted in Books, Statistics with tags , , , , , , , on October 26, 2018 by xi'an

Earlier this week, Jeong Eun Lee, Geoff Nicholls, and Robin Ryder arXived a paper on the calibration of approximate Bayesian credible intervals. (Warning: all three authors are good friends of mine!) They start from the core observation that dates back to Monahan and Boos (1992) of exchangeability between θ being generated from the prior and φ being generated from the posterior associated with one observation generated from the prior predictive. (There is no name for this distribution, other than the prior, that is!) A setting amenable to ABC considerations! Actually, Prangle et al. (2014) relies on this property for assessing the ABC error, while pointing out that the test for exchangeability is not fool-proof since it works equally for two generations from the prior.

“The diagnostic tools we have described cannot be “fooled” in quite the same way checks based on the exchangeability can be.”

The paper thus proposes methods for computing the coverage [under the true posterior] of a credible set computed using an approximate posterior. (I had to fire up a few neurons to realise this was the right perspective, rather than the reverse!) A first solution to approximate the exact coverage of the approximate credible set is to use logistic regression, instead of the exact coverage, based on some summary statistics [not necessarily in an ABC framework]. And a simulation outcome that the parameter [simulated from the prior] at the source of the simulated data is within the credible set. Another approach is to use importance sampling when simulating from the pseudo-posterior. However this sounds dangerously close to resorting to an harmonic mean estimate, since the importance weight is the inverse of the approximate likelihood function. Not that anything unseemly transpires from the simulations.

## rethinking the ESS

Posted in Statistics with tags , , , , , , , , , on September 14, 2018 by xi'an

Following Victor Elvira‘s visit to Dauphine, one and a half year ago, where we discussed the many defects of ESS as a default measure of efficiency for importance sampling estimators, and then some more efforts (mostly from Victor!) to formalise these criticisms, Victor, Luca Martino and I wrote a paper on this notion, now arXived. (Victor most kindly attributes the origin of the paper to a 2010 ‘Og post on the topic!) The starting thread of the (re?)analysis of this tool introduced by Kong (1992) is that the ESS used in the literature is an approximation to the “true” ESS, generally unavailable. Approximation that is pretty crude and hence impacts the relevance of using it as the assessment tool for comparing importance sampling methods. In the paper, we re-derive (with the uttermost precision) the resulting approximation and list the many assumptions that [would] validate this approximation. The resulting drawbacks are many, from the absurd property of always being worse than direct sampling, to being independent from the target function and from the sample per se. Since only importance weights matter. This list of issues is not exactly brand new, but we think it is worth signaling given the fact that this approximation has been widely used in the last 25 years, due to its simplicity, as a practical rule of thumb [!] in a wide variety of importance sampling methods. In continuation of the directions drafted in Martino et al. (2017), we also indicate some alternative notions of importance efficiency. Note that this paper does not cover the use of ESS for MCMC algorithms, where it is somewhat more legit, if still too rudimentary to really catch convergence or lack thereof! [Note: I refrained from the post title resinking the ESS…]

## hitting a wall

Posted in Books, Kids, R, Statistics, University life with tags , , , , , on July 5, 2018 by xi'an

Once in a while, or a wee bit more frequently (!), it proves impossible to communicate with a contributor of a question on X validated. A recent instance was about simulating from a multivariate kernel density estimate where the kernel terms at x¹,x²,… are Gaussian kernels applied to the inverses of the norms |x-x¹|, |x-x²|,… rather than to the norms as in the usual formulation. The reason for using this type of kernel is unclear, as it certainly does not converge to an estimate of the density of the sample x¹,x²,…  as the sample size grows, since it excludes a neighbourhood of each point in the sample. Since the kernel term tends to a non-zero constant at infinity, the support of the density estimate is restricted to the hypercube [0,1]x…x[0,1], again with unclear motivations. No mention being made of the bandwidth adopted for this kernel. If one takes this exotic density as a given, the question is rather straightforward as the support is compact, the density bounded and a vanilla accept-reject can be implemented. As illustrated by the massive number of comments on that entry, it did not work as the contributor adopted a fairly bellicose attitude about suggestions from moderators on that site and could not see the point in our requests for clarification, despite plotting a version of the kernel that had its maximum [and not its minimum] at x¹… After a few attempts, including writing a complete answer, from which the above graph is taken (based on an initial understanding of the support being for (x-x¹), …), I gave up and deleted all my entries.On that question.

## best unbiased estimator of θ² for a Poisson model

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , on May 23, 2018 by xi'an

A mostly traditional question on X validated about the “best” [minimum variance] unbiased estimator of θ² from a Poisson P(θ) sample leads to the Rao-Blackwell solution

$\mathbb{E}[X_1X_2|\underbrace{\sum_{i=1}^n X_i}_S=s] = -\frac{s}{n^2}+\frac{s^2}{n^2}=\frac{s(s-1)}{n^2}$

and a similar estimator could be constructed for θ³, θ⁴, … With the interesting limitation that this procedure stops at the power equal to the number of observations (minus one?). But,  since the expectation of a power of the sufficient statistics S [with distribution P(nθ)] is a polynomial in θ, there is de facto no limitation. More interestingly, there is no unbiased estimator of negative powers of θ in this context, while this neat comparison on Wikipedia (borrowed from the great book of counter-examples by Romano and Siegel, 1986, selling for a mere \$180 on amazon!) shows why looking for an unbiased estimator of exp(-2θ) is particularly foolish: the only solution is (-1) to the power S [for a single observation]. (There is however a first way to circumvent the difficulty if having access to an arbitrary number of generations from the Poisson, since the Forsythe – von Neuman algorithm allows for an unbiased estimation of exp(-F(x)). And, as a second way, as remarked by Juho Kokkala below, a sample of at least two Poisson observations leads to a more coherent best unbiased estimator.)