## arbitrary distributions with set correlation

Posted in Books, Kids, pictures, R, Statistics, University life on May 11, 2015 by xi'an

A question recently posted on X Validated by Antoni Parrelada: given two arbitrary cdfs F and G, how can we simulate a pair (X,Y) with marginals F and G, and with set correlation ρ? The answer posted by Antoni Parrelada was to reproduce the Gaussian copula solution: produce (X’,Y’) as a Gaussian bivariate vector with correlation ρ and then turn it into (X,Y)=(F⁻¹(Φ(X’)),G⁻¹(Φ(Y’))). Unfortunately, this does not work, because the correlation is not preserved under the double transform. The graph above is part of my answer for a χ² and a log-Normal cdf for F and G: while corr(X’,Y’)=ρ, corr(X,Y) drifts quite a lot from the diagonal! The drift can be reproduced by playing long enough with my function

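# correlation of (fx(Φ(X')),fy(Φ(Y'))) for each Gaussian correlation in rho
tacor=function(rho=0,nsim=1e4,fx=qnorm,fy=qnorm)
{
x1=rnorm(nsim);x2=rnorm(nsim) # independent N(0,1) draws, reused for every rho
coeur=rho # storage for the resulting correlations
rho2=sqrt(1-rho^2)
for (t in 1:length(rho)){
# uniform pair with a Gaussian copula of parameter rho[t]
y=pnorm(cbind(x1,rho[t]*x1+rho2[t]*x2))
coeur[t]=cor(fx(y[,1]),fy(y[,2]))}
return(coeur)
}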
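As a quick sanity check of the claim that the correlation is not preserved, here is a minimal self-contained sketch. The marginals (a χ² with 3 degrees of freedom and a standard log-Normal), the value ρ=-0.9, the sample size and the seed are all assumptions made for illustration, not necessarily the settings behind the graph.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
nsim <- 1e5
rho  <- -0.9                      # correlation of the Gaussian pair (assumed value)
x1 <- rnorm(nsim)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(nsim)

# turn each Gaussian margin into a uniform, then into the target marginal
x <- qchisq(pnorm(x1), df = 3)    # assumed F: chi-square with 3 df
y <- qlnorm(pnorm(x2))            # assumed G: standard log-Normal

cor_gauss <- cor(x1, x2)          # close to rho
cor_trans <- cor(x, y)            # noticeably closer to zero than rho
c(gaussian = cor_gauss, transformed = cor_trans)
```

The two numbers printed at the end should differ markedly, which is the drift from the diagonal described above.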


Playing further, I managed to get an almost flat correlation graph for the admittedly convoluted call

tacor(seq(-1,1,.01),
fx=function(x) qchisq(x^59,df=.01),
fy=function(x) qlogis(x^59))


Now, the most interesting question is how to produce correlated simulations. A pedestrian way is to start with a copula, e.g. the above Gaussian copula, and to twist the correlation coefficient ρ of the copula until the desired correlation is attained for the transformed pair. That is, to draw the above curve and invert it. (Note that, as clearly exhibited by the graph just above, not all desired correlations can be achieved for arbitrary cdfs F and G.) This is however very pedestrian and I wonder whether or not there is a generic and somewhat automated solution…
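The pedestrian inversion can be sketched as a one-dimensional root search. This is only an illustration under stated assumptions: `copcor` and `findrho` are hypothetical helper names, the marginals and the target 0.5 are arbitrary, and the Monte Carlo curve is only an approximation of the true one.

```r
# hypothetical helper: Monte Carlo correlation of the transformed pair
# for a Gaussian copula of parameter r, based on fixed normal draws z1, z2
copcor <- function(r, z1, z2, fx, fy)
  cor(fx(pnorm(z1)), fy(pnorm(r * z1 + sqrt(1 - r^2) * z2)))

# hypothetical helper: find the copula parameter giving the target correlation
findrho <- function(target, fx = qnorm, fy = qnorm, nsim = 1e5) {
  z1 <- rnorm(nsim); z2 <- rnorm(nsim)  # same draws for all r, so the curve is smooth
  f <- function(r) copcor(r, z1, z2, fx, fy) - target
  # the attainable range is bounded by the values at r=-1 and r=1
  if (f(-1) * f(1) > 0) stop("target correlation not attainable for these marginals")
  uniroot(f, c(-1, 1))$root
}

set.seed(42)                            # arbitrary seed
fx <- function(u) qchisq(u, df = 3)     # assumed F
fy <- function(u) qlnorm(u)             # assumed G
r  <- findrho(0.5, fx, fy)

# check on a fresh sample generated with the calibrated r
z1 <- rnorm(1e5); z2 <- rnorm(1e5)
achieved <- copcor(r, z1, z2, fx, fy)
c(copula_rho = r, achieved = achieved)
```

The `stop()` branch reflects the remark above: a target outside the range spanned by the curve has no solution, whatever the value of ρ.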

## a weird beamer feature…

Posted in Books, Kids, Linux, R, Statistics, University life on September 24, 2014 by xi'an

As I was preparing my slides for my third year undergraduate stat course, I got a weird error that took a search on the Web to unravel:

! Extra }, or forgotten \endgroup.
\endframe ->\egroup
\begingroup \def \@currenvir {frame}
l.23 \end{frame}
\begin{slide}
?


which was related to a fragile environment

\begin{frame}[fragile]
\frametitle{simulation in practice}
\begin{itemize}
\item For a given distribution $F$, call the corresponding
pseudo-random generator in an arbitrary computer language
\begin{verbatim}
> x=rnorm(10)
> x
[1] -0.021573 -1.134735  1.359812 -0.887579
[7] -0.749418  0.506298  0.835791  0.472144
\end{verbatim}
\item use the sample as a statistician would
\begin{verbatim}
> mean(x)
[1] 0.004892123
> var(x)
[1] 0.8034657
\end{verbatim}
to approximate quantities related with $F$
\end{itemize}
\end{frame}\begin{frame}


but not directly the verbatim part: the reason for the bug was that the \end{frame} command did not have a line by itself! Which is one rare occurrence where the carriage return has an impact in LaTeX, as far as I know… (The same bug appears when there is an indentation at the beginning of the line. Weird!) [Another annoying feature is wordpress turning > into &gt; in the sourcecode environment…]

## Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersmann and Ligges, from TU Dortmund, the latter being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.” (p.1)

The book is built around the above aim, first presenting the reasons why computers can produce answers different from what we want, using least squares as a means to check for (in)stability, then second establishing the ground for Monte Carlo methods by discussing (pseudo-)random generation, including MCMC algorithms, before moving in third to bootstrap and resampling techniques, and concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises.

## Random generators for parallel processing

Posted in R, Statistics on October 28, 2010 by xi'an

Given the growing interest in parallel processing through GPUs or multiple processors, there is a clear need for a proper use of (uniform) random number generators in this environment. We were discussing the issue yesterday with Jean-Michel Marin and briefly looked at a few solutions.
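The solutions discussed in the full post are not reproduced in this excerpt. Purely as an illustration of what "proper use" can look like, here is a sketch based on the `parallel` package that ships with current versions of R (it postdates this 2010 post, so it is not one of the solutions alluded to). The number of workers, the seed and the helper `draw_from_stream` are hypothetical choices; no actual cluster is started, the code only shows each worker receiving its own stream of the L'Ecuyer-CMRG generator.

```r
library(parallel)                 # part of the standard R distribution

RNGkind("L'Ecuyer-CMRG")          # generator designed to be split into streams
set.seed(2010)                    # arbitrary seed

# one starting seed per (hypothetical) worker, each on a separate stream
nworkers <- 3
seeds <- vector("list", nworkers)
seeds[[1]] <- get(".Random.seed", envir = globalenv())
for (i in seq_len(nworkers - 1)) seeds[[i + 1]] <- nextRNGStream(seeds[[i]])

# hypothetical helper: install a stream seed, then draw n uniforms from it
draw_from_stream <- function(seed, n = 5) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

draws <- lapply(seeds, draw_from_stream)
draws
```

With a real cluster, `clusterSetRNGStream()` from the same package performs this seeding on the workers; the manual version above only makes the mechanism visible.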

## StatProb [wiki]

Posted in R, Statistics on August 1, 2010 by xi'an

Via the [financial and technical] support of Springer, probability and statistics societies are launching a specialised wiki called StatProb. It operates as a wiki in that authors can submit short articles on any topic, with further co-authors joining in later to improve those articles, but with the contents guaranteed via the filter of an editorial board. The members of the board and subsequent associate editors are nominated by the statistical societies involved in the project. (For instance, I was nominated by the Royal Statistical Society, Susie Bayarri by ISBA, George Casella by the ASA, etc.) As a starting basis, StatProb will reproduce a few hundred entries from the forthcoming International Encyclopedia of Statistical Sciences edited by Miodrag Lovric (to which I contributed). Obviously, the wiki will only work if enough contributors submit their pieces and make StatProb a reference for statistics. I joined the project because, as opposed to costly encyclopedias, wikis are living things that evolve with the field (if enough activity is maintained by its members) and that can be accessed freely by all. Another good thing about StatProb is that entries are submitted in LaTeX, making the output look fairly reasonable. (To start the ball rolling, we submitted this short piece on random number generation with George Casella, extracted from an older piece that had been sitting around for a while. It is not meant to be the only piece on random number generation, nor on MCMC or Monte Carlo methods. And it can be updated and augmented as in other wikis.) Unless I am confused, I think the site will be officially launched at JSM 2010 in Vancouver this weekend.