Archive for Monte Carlo

future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , on September 29, 2014 by xi'an

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing, The breadth and quality of talks at MCMski IV is testimony to this. A second trend that is not unrelated to the first one is the development of new and the rehabilitation of older techniques to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc. With a corollary being an healthy questioning of the models themselves. As illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain confident about the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision…  A third characteristic is the emergence of new languages and meta-languages intended to handle complexity both of problems and of solutions towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestion of novel directions in computational statistics or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics

Adap’skiii [day 2]

Posted in R, Statistics, University life with tags , , , , , , , , , on January 5, 2011 by xi'an

Another exciting day at Adap’skiii!!!

Yves Atchadé presented a very recent work on the fundamental issue of estimating the asymptotic variance estimation for adaptive MCMC algorithms, with an intriguing experimental observation that a non-converging bandwidth with rate 1/n was providing better coverage than the converging rate. (I always found the issue of estimating the asymptotic variance both a tough problem and an important item in convergence assessment.) Galin Jones showed new regeneration results for componentwise MCMC samplers, with applications to quantile estimation. The iid structure produced by the regeneration mechanism allows rather naturally to introduce an adaptive improvement in those algorithms, if regeneration occurs often enough. (From the days of my Stat’Sci’ paper on convergence assessment, I  love regeneration techniques for both theoretical and methodological reasons, even though they are often difficult to efficiently implement in practice.) Matti Vihola summarised several of his recent papers on the stability and convergence of adaptive MCMC algorithms, pursuing the Finnish tradition of leadership in adaptive algorithms! One point I found particularly interesting was the possibility of separating ergodicity from the Law of Large Numbers, thus reducing the constraints imposed by the containment condition. In the afternoon, Dawn Woodard discussed the convergence rate of the Gibbs sampler used for genomic motif discovery by Liu, Lawrence and Neuwald (1995). Scott Schmidler concluded the workshop by a far-ranging talk distinguishing between exploration and exploitation in adaptive MCMC algorithms, ie mixing vs burning, with illustrations using the Wang-Landau algorithm.

Thus, as in the previous editions of Adap’ski, we have had a uniformly high quality of talks about the current research in the area of adaptive algorithms (and a wee further). This shows the field is very well active and expanding, aiming at reaching a wider audience by providing verifiable convergence conditions and semi-automated softwares (like Jeff Rosenthal’s amcmc R code we used in Introducing Monte Carlo Methods with R). Looking forward Adap’ski 4 (Adap’skiV?!), hopefully in Europe and why not in Chamonix?! Which could then lead us to call the next meeting Adap’skiX…

Random sudokus [p-values]

Posted in R, Statistics with tags , , , , , , , on May 21, 2010 by xi'an

I reran the program checking the distribution of the digits over 9 “diagonals” (obtained by acceptable permutations of rows and column) and this test again results in mostly small p-values. Over a million iterations, and the nine (dependent) diagonals, four p-values were below 0.01, three were below 0.1, and two were above (0.21 and 0.42). So I conclude in a discrepancy between my (full) sudoku generator and the hypothesised distribution of the (number of different) digits over the diagonal. Assuming my generator is a faithful reproduction of the one used in the paper by Newton and DeSalvo, this discrepancy suggests that their distribution over the sudoku grids do not agree with this diagonal distribution, either because it is actually different from uniform or, more likely, because the uniform distribution I use over the (groups of three over the) diagonal is not compatible with a uniform distribution over all sudokus…

ACM Transactions on Modeling and Computer Simulation

Posted in Books, R, Statistics, University life with tags , , , on May 21, 2010 by xi'an

Pierre Lecuyer is the new editor of the ACM Transactions on Modeling and Computer Simulation (TOMACS) and he has asked me to become an Area Editor for the new area of simulation in Statistics. I am quite excited by this new Æditor’s hat, since this is a cross-disciplinary journal:

The ACM Transactions on Modeling and Computer Simulation (TOMACS) provides a single archival source for the publication of high-quality research and developmental results in computer simulation. The subjects of emphasis are discrete event simulation, combined discrete and continuous simulation, as well as Monte Carlo methods. Papers in continuous simulation will also receive serious consideration if their contributions to modeling and simulation in general are substantial.

The use of simulation techniques is pervasive, extending to virtually all the sciences. TOMACS serves to enhance the understanding, improve the practice, and increase the utilization of computer simulation. Submissions should contribute to the realization of these objectives, and papers treating applications should stress their contributions vis-a-vis these objectives.

As an indication of this cross-disciplinarity, I note that most Area Editors and Associate Editors are unknown to me (except for Luc Devroye, of course!). In addition, I savour the irony of being associated with a journal of the Association for Computer Machinery (ACM), given my complete lack of practical skills! So, if you have relevant papers to submit in the field, please consider the ACM Transactions on Modeling and Computer Simulation (TOMACS) as a possible outlet.

Random [uniform?] sudokus [corrected]

Posted in R, Statistics with tags , , , , , on May 19, 2010 by xi'an

As the discrepancy [from 1] in the sum of the nine probabilities seemed too blatant to be attributed to numerical error given the problem scale, I went and checked my R code for the probabilities and found a choose(9,3) instead of a choose(6,3) in the last line… The fit between the true distribution and the observed frequencies is now much better

but the chi-square test remains suspicious of the uniform assumption (or again of my programming abilities):

> chisq.test(obs,p=pdiag)
Chi-squared test for given probabilities
data:  obs
X-squared = 16.378, df = 6, p-value = 0.01186

since a p-value of 1% is a bit in the far tail of the distribution.

Random [uniform?] sudokus

Posted in R, Statistics with tags , , , , , , on May 19, 2010 by xi'an

A longer run of the R code of yesterday with a million sudokus produced the following qqplot.

It does look ok but no perfect. Actually, it looks very much like the graph of yesterday, although based on a 100-fold increase in the number of simulations. Now, if I test the adequation with a basic chi-square test (!), the result is highly negative:

> chisq.test(obs,p=pdiag/sum(pdiag)) #numerical error in pdiag
Chi-squared test for given probabilities
data:  obs
X-squared = 6978.503, df = 6, p-value < 2.2e-16

(there are seven entries for both obs and pdiag, hence the six degrees of freedom). So this casts a doubt upon the uniformity of the random generator suggested in the paper by Newton and DeSalvo or rather on my programming abilities, see next post!

Random sudokus [test]

Posted in R, Statistics with tags , , , , , , on May 18, 2010 by xi'an

Robin Ryder pointed out to me that 3 is indeed the absolute minimum one could observe because of the block constraint (bon sang, mais c’est bien sûr !). The distribution of the series of 3 digits being independent over blocks, the theoretical distribution under uniformity can easily be simulated:

#uniform distribution on the block diagonal
sheik=rep(0,9)
for (t in 1:10^6){
group=length(unique(c(sample(1:9,3),sample(1:9,3),sample(1:9,3))))
sheik[group]=sheik[group]+1
}

and it produces a result that is close enough to the one observed with the random sudoku generator. Actually, the exact distribution is available as (corrected on May 19!)

pdiag=c(1, #k=3
(3*6+3*6*4), #k=4
(3*choose(6,2)+3*6*5*choose(4,2)+3*choose(5,3)*choose(6,2)), #k=5
(choose(6,3)+3*6*choose(5,2)*4+3*choose(6,2)*choose(5,2)*4+
choose(6,3)*choose(6,3)),#k=6
(choose(3,2)*6*choose(5,3)+3*choose(6,2)*choose(4,2)*5+
choose(6,3)*choose(6,2)*3), #k=7
(3*choose(6,2)*4+choose(6,3)*6*choose(3,2)), #k=8
choose(6,3))/choose(9,3)^2 #k=9
choose(9,6))/choose(9,3)^2 #k=9

hence a better qq-plot:

Follow

Get every new post delivered to your Inbox.

Join 671 other followers