## Archive for R package

## poor statistics

Posted in Books, pictures, R, Statistics, Travel, Wines with tags alcoholism, Bretagne, Brittany, cancer, drugs, empbaysmooth, epidemiology, Julian Besag, Loire, Loire countries, R, R package, Saint-Nazaire on September 24, 2019 by xi'an

**I** came across this graph over the weekend, along with the associated news that the county of Saint-Nazaire, on the southern border of Brittany, had a significantly higher rate of cancers than the rest of the Loire countries. The complete study, written by Solenne Delacour, Anne Cowppli-Bony, and Florence Molinié, is quite cautious about the reasons for this higher rate, even using a Bayesian Poisson-Gamma smoothing (through empbaysmooth in R) and citing the 1991 paper by Besag, York and Mollié, but the local and national media are quick to blame the local industries for the difference. The graph is particularly bad in that it accumulates mortality causes that are neither mutually exclusive nor independent. For instance, the much higher mortality rate due to alcohol is obviously responsible for higher rates in most other entries. It also indicates a sociological pattern that may or may not be due to the type of jobs in the area, that differs from the more rural other parts of the Loire countries (which, like Brittany, are already significantly (50%) above the national reference for alcohol-related health issues), and that may not be strongly connected to exposure to chemicals. For instance, the rates of pulmonary cancers are mostly comparable to the national average, if higher than in the rest of the Loire countries, and connect with a high smoking propensity. Lymphomas are not significantly different from the regional reference. The only type of cancer that can be directly attributed to working conditions is mesothelioma, mostly caused by exposure to asbestos, which was used in shipbuilding, a specialty of the area. Among the many possible reasons for the higher mortality of the county, the study mentions a lower exposure to medical testing (connected with the sociological composition of the area), which would point to the most effective policies for lowering these higher cancer and mortality rates.

## CRAN does not validate R packages!

Posted in pictures, R, University life with tags bayess, Cockatoo Island, CRAN, industrial ruins, mcsm, R, R package, S, sample, Sydney Harbour, Ultimixt on July 10, 2019 by xi'an

**A** friend called me the other day for advice on how to submit an R package to CRAN along with a proof that his method was mathematically sound. I replied with some items of advice taken from my (limited) experience with submitting packages. And with the remark that CRAN would not validate the mathematical contents of the associated package manual. Nor even the validity of the R code towards delivering the right outcome as stated in the manual. This shocked him quite seriously as he thought having a package accepted by CRAN was a stamp of validation of both the method and the R code. It would be nice of course but would require so much manpower that it seems unrealistic. Some middle ground is to aim at a journal or a peer community validation where both code and methods are vetted. Which happens for instance with the Journal of Computational and Graphical Statistics. Or the Journal of Statistical Software (which should revise its instructions to authors that state “The majority of software published in JSS is written in S, MATLAB, SAS/IML, C++, or Java”. S, really?!)

As for the validity of the latest release of R (currently R-3.6.1 which came out on 2019-07-05, named Action of the Toes!), I figure the bazillion R programs currently running should be able to detect any defect pretty fast, although awareness of the incredible failure of sample() reported in an earlier post took a while to appear.

## EntropyMCMC [R package]

Posted in Statistics with tags convergence assessment, CRAN, discretization, entropy, EntropyMCMC, Lecture Notes in Statistics, MCMC, MCMC convergence, Monte Carlo Statistical Methods, R package, Springer-Verlag, Université d'Orléans, untractable normalizing constant on March 26, 2019 by xi'an

**M**y colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between the current distribution and the target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which calls for a parallel implementation. While the missing normalising constant is overwhelmingly unknown, the authors argue this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. *[Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performances over large dimensions or problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]*
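To see why the unknown normalising constant only matters up to a shift, here is a short derivation in my own notation (an assumption, not taken from the package documentation): $p^t$ is the distribution of the chains at iteration $t$, $f$ the normalised target, $\tilde f = Z f$ its available unnormalised version, and $H$ the differential entropy.

$$
K(p^t, f) = \int p^t(x)\,\log\frac{p^t(x)}{f(x)}\,\mathrm{d}x = -H(p^t) - \mathbb{E}_{p^t}\left[\log f(X)\right],
\qquad H(p) = -\int p(x)\,\log p(x)\,\mathrm{d}x ,
$$

$$
f = \tilde f / Z \quad\Longrightarrow\quad -H(p^t) - \mathbb{E}_{p^t}\left[\log \tilde f(X)\right] = K(p^t, f) - \log Z .
$$

Plugging $\tilde f$ in place of $f$ thus produces the same curve in $t$, shifted by the constant $-\log Z$: it stabilises at $-\log Z$ rather than at zero, and the difference between two algorithms targeting the same $\tilde f$ is unaffected.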

## Imperial postdoc in Bayesian nonparametrics

Posted in pictures, R with tags Bayesian non-parametrics, independence, London, machine learning, mathematics, postdoctoral position, R, R package, United Kingdom on April 27, 2018 by xi'an

**H**ere is another announcement for a post-doctoral position in London (UK) to work with Sarah Filippi. In the Department of Mathematics at Imperial College London. (More details on the site or in this document. Hopefully, the salary is sufficient for staying in London, if not in South Kensington!)

The post holder will work on developing a novel Bayesian Non-Parametric Test for Conditional Independence. This is at the core of modern causal discovery, itself of paramount importance throughout the sciences and in Machine Learning. As part of this project, the post holder will derive a Bayesian non-parametric testing procedure for conditional independence, scalable to high-dimensional conditioning variable. To ensure maximum impact and allow experimenters in different fields to easily apply this new methodology, the post holder will then create an open-source software package available on the R statistical programming platform. Doing so, the post holder will investigate applying this approach to real-world data from our established partners who have a track record of informing national and international bodies such as Public Health England and the World Health Organisation.

## Le Monde puzzle [#1048]

Posted in Books, Kids, R with tags Le Monde, mathematical puzzle, prime numbers, primes, R, R package on April 1, 2018 by xi'an

**A**n arithmetic Le Monde mathematical puzzle:

A magical integer m is such that the remainder of the division of any prime number p by m is either a prime number or 1. What is the unique magical integer between 25 and 100? And is there any less than 25?

The question is dead easy to code

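    library(primes)  # assumed source of generate_primes()
    # candidate remainders: 1 and all primes up to 1e6
    primz=c(1,generate_primes(2,1e6))
    # print every y such that all primes above y leave a remainder in primz
    for (y in 25:10000)
      if (min((primz[primz>y]%%y)%in%primz)==1) print(y)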

and returns m=30 as the only solution. Bon sang but of course!, since 30=2×3×5… (Actually, the result follows by dividing the quotient of the division of a prime number by 2 by 3 and then the resulting quotient by 5: all possible cases produce a remainder that is a prime number.) For the second question, the same code run over the integers below 25 returns 2,3,4,6,8,12,18,24 as further solutions. There is no solution beyond 30.
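As a cross-check that avoids generating primes up to a million, here is a minimal sketch based on a different argument than the brute-force search: by Dirichlet's theorem, every residue class coprime to m contains primes larger than m, while a prime larger than m cannot leave a remainder sharing a factor with m. Under the same convention as the code above (only primes larger than m are tested), m is therefore magical exactly when every residue below m that is coprime to m is either 1 or a prime. The helper names `gcd`, `is_prime` and `is_magical` are mine, for illustration only.

```r
# Greatest common divisor via Euclid's algorithm
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Trial-division primality test, adequate for the small integers used here
is_prime <- function(n) n > 1 && all(n %% seq_len(floor(sqrt(n)))[-1] != 0)

# m is magical when all residues coprime to m are 1 or prime
is_magical <- function(m) {
  r <- seq_len(m - 1)
  coprime <- r[sapply(r, function(a) gcd(a, m)) == 1]
  all(coprime == 1 | vapply(coprime, is_prime, logical(1)))
}

magical <- Filter(is_magical, 2:100)
print(magical)
```

This prints 2, 3, 4, 6, 8, 12, 18, 24 and 30, in agreement with the search over actual primes.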

## weakly informative reparameterisations

Posted in Books, pictures, R, Statistics, University life with tags Bayesian modelling, Edinburgh, Gaussian mixture, JCGS, location-scale parameterisation, moments, non-informative priors, publication, R package, Ultimixt on February 14, 2018 by xi'an

**O**ur paper, weakly informative reparameterisations of location-scale mixtures, with Kaniav Kamary and Kate Lee, got accepted by JCGS! Great news, which comes with perfect timing for Kaniav as she is currently applying for positions. The paper proposes a Bayesian modelling of unidimensional mixtures based on first and second moment constraints, since these turn the remainder of the parameter space into a compact set. While we had already developed an associated R package, Ultimixt, the current editorial policy of JCGS requires the R code used to produce all results to be attached to the submission, and it took us a few more weeks than it should have to produce directly executable code, due to internal library incompatibilities. (For this entry, I was looking for a link to our special JCGS issue with my picture of Edinburgh but realised I did not have this picture.)

## bridgesampling [R package]

Posted in pictures, R, Statistics, University life with tags Amsterdam, bridge, bridge sampling, bridgesampling, JAGS, R, R package, STAN, University of Amsterdam, warped bridge sampling on November 9, 2017 by xi'an

**Q**uentin F. Gronau, Henrik Singmann and Eric-Jan Wagenmakers have arXived a detailed documentation about their *bridgesampling* R package. (No wonder that researchers from Amsterdam favour bridge sampling!)

*[The package relates to a [52 pages] tutorial on bridge sampling by Gronau et al. that I will hopefully comment soon.]*

The bridge sampling methodology for marginal likelihood approximation requires *two* Monte Carlo samples for a ratio of *two* integrals. A nice twist in this approach is to use a dummy integral that is already available, with respect to a probability density that is an approximation to the exact posterior. This means avoiding the difficulties with bridge sampling of bridging two different parameter spaces, in possibly different dimensions, with potentially very little overlap between the posterior distributions. The substitute probability density is chosen as Normal or warped Normal, rather than a t which would provide more stability in my opinion. The **bridgesampling** package also provides an error evaluation for the approximation, although based on spectral estimates derived from the *coda* package. The remainder of the document exhibits how the package can be used in conjunction with either JAGS or Stan. And concludes with the following words of caution:

“It should also be kept in mind that there may be cases in which the bridge sampling procedure may not be the ideal choice for conducting Bayesian model comparisons. For instance, when the models are nested it might be faster and easier to use the Savage-Dickey density ratio (Dickey and Lientz 1970; Wagenmakers et al. 2010). Another example is when the comparison of interest concerns a very large model space, and a separate bridge sampling based computation of marginal likelihoods may take too much time. In this scenario, Reversible Jump MCMC (Green 1995) may be more appropriate.”
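For readers who want to see the two-sample ratio at work, here is a minimal base R sketch of the iterative bridge sampling estimator of Meng and Wong. It is not the code of the bridgesampling package. The toy target is my own assumption: an unnormalised standard Normal density `q`, whose constant is known to be sqrt(2π), and a Normal proposal `g` with a larger scale. All names are illustrative.

```r
set.seed(1)

q <- function(x) exp(-x^2 / 2)       # unnormalised target, true constant sqrt(2*pi)
g <- function(x) dnorm(x, 0, 1.5)    # normalised proposal density

n1 <- n2 <- 1e4
post <- rnorm(n1)                    # first sample: draws from the normalised target
prop <- rnorm(n2, 0, 1.5)            # second sample: draws from the proposal
s1 <- n1 / (n1 + n2)
s2 <- n2 / (n1 + n2)

l1 <- q(post) / g(post)              # ratios q/g at the target draws
l2 <- q(prop) / g(prop)              # ratios q/g at the proposal draws

# Fixed-point iteration on the ratio of two Monte Carlo averages,
# using the bridge function proportional to 1 / (s1 * q + s2 * Z * g)
Z_hat <- 1
for (it in 1:50) {
  Z_hat <- mean(l2 / (s1 * l2 + s2 * Z_hat)) /
    mean(1 / (s1 * l1 + s2 * Z_hat))
}

print(c(estimate = Z_hat, truth = sqrt(2 * pi)))
```

The numerator averages over the proposal sample and the denominator over the target sample, which is the sense in which two Monte Carlo samples serve a ratio of two integrals. In a real problem the first sample would come from an MCMC run rather than from `rnorm`.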