## LaTeX issues from Vienna

Posted in Books, Statistics, University life on September 21, 2017 by xi'an

When working on the final stage of our edited handbook on mixtures, in Vienna, I came across unexpected practical difficulties! One was that, by working on Dropbox with Windows users, file and directory names suddenly switched from upper case to lower case letters, making the hard-wired paths to figures and subsections void in the numerous LaTeX files used for the book, and forcing us to switch to lower case everywhere. Having not worked under Windows since George Casella gave me my first laptop in the mid-90's, I am amazed that this inability to handle both upper and lower case names is still an issue. And that Dropbox replicates it. (And that some people see it as a plus.)

The other LaTeX issue that took a while to solve was that we opted for one bibliography per chapter, rather than a single bibliography at the end of the book, mainly because CRC Press asked for this feature in order to sell chapters individually… This was my first encounter with the issue and I found the available solutions for producing individual bibliographies incredibly heavy-handed, whether through chapterbib or bibunits, since one has to run bibtex on one .aux file per chapter. Even with a one-line bash command, this is annoying in the extreme!
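For the record, a hypothetical sketch of the kind of one-line bash loop in question (the `chap*.aux` file names are an assumption, not the book's actual layout, and the bibtex call is replaced by a stand-in so the sketch is self-contained):

```shell
# hypothetical sketch: run bibtex once per chapter .aux file
run_bibtex_all() {
  for aux in "$@"; do
    base="${aux%.aux}"   # strip the extension to get the chapter basename
    echo "$base"         # stand-in for: bibtex "$base"
  done
}
# in practice: run_bibtex_all chap*.aux
```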

## Le Monde puzzle [#1021]

Posted in Books, Kids, R on September 17, 2017 by xi'an

A puzzling Le Monde mathematical puzzle for which I could find no answer in the allotted time:

A most democratic electoral system allows every voter to have at least one representative: each of the N voters picks exactly m candidates among the M running candidates, and the size n of the representative council is set towards this goal, prior to the votes. If there are M=25 candidates, m=10 choices made by each voter, and n=10 representatives, what is the maximal possible value of N? And if N=55,555 and M=33, what is the minimum value of n for which m=n is always possible?

I tried a brute-force approach by simulating votes from N voters at random and attempting to find the minimal number of councillors for this vote, which only provides an upper bound on the minimum [for one vote], and hence a lower bound in the end [over all votes]. Something like

maxz=0
for (v in 1:1e2){ #outer loop over simulated vote patterns
  votz=matrix(0,N,m)
  for (i in 1:N) votz[i,]=sample(1:M,m) #each voter picks m candidates
  #exploration by majority: seat the most popular remaining candidate
  remz=1:N;conz=NULL
  while (length(remz)>0){
    seatz=order(-hist(votz[remz,],
      breaks=(0:M)+0.5,plot=FALSE)$density)[1]
    conz=c(conz,seatz);nuremz=NULL
    for (i in remz)
      if (!(seatz%in%votz[i,])) nuremz=c(nuremz,i)
    remz=nuremz}
  solz=length(conz)
  #exploration at random
  kandz=matrix(0,N,M)
  for (i in 1:N) kandz[i,votz[i,]]=1
  for (t in 1:1e3){
    #random subset of councillors until every voter is covered
    zz=sample(c(0,1),M,rep=TRUE)
    while (min(kandz%*%zz)!=1)
      zz=sample(c(0,1),M,rep=TRUE)
    solz=min(solz,sum(zz))
    #random choice of a councillor for the first remaining voter
    remz=1:N;conz=NULL
    while (length(remz)>0){
      seatz=sample(votz[remz[1],],1)
      conz=c(conz,seatz);nuremz=NULL
      for (i in remz)
        if (!(seatz%in%votz[i,])) nuremz=c(nuremz,i)
      remz=nuremz}
    solz=min(solz,length(conz))}
  maxz=max(solz,maxz)}


which leads to a value near N=4050 for the first question, with 0% confidence… Obviously, the problem can be rephrased as a binary integer linear programming problem of the form

$n= \max_A \min_{c;\,Ac\ge\mathbf{1}}\mathbf{1}^\text{T}c$

where A is the N×M matrix of votes and c is the binary vector selecting the councillors. But I do not see a quick way to solve it!
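As a cross-check, the inner minimisation is a set-cover problem, exactly solvable by brute force for tiny instances. A minimal sketch in Python (my own, not part of the above solution; the function name is made up):

```python
from itertools import combinations

def min_council(A):
    """Smallest number of candidates (columns of the binary vote matrix A)
    such that every voter (row) approves at least one of them."""
    n_cand = len(A[0])
    for size in range(1, n_cand + 1):
        # try every candidate subset of the current size, smallest first
        for cols in combinations(range(n_cand), size):
            if all(any(row[c] for c in cols) for row in A):
                return size
    return None  # no cover exists (some voter approves nobody)
```

This is exponential in M, so it only serves to validate the stochastic exploration on small cases, e.g. `min_council([[1,0,1],[0,1,0],[1,1,0]])` returns 2.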

## Le Chemin [featuring Randal Douc]

Posted in Books, pictures, Statistics, Travel, University life on September 17, 2017 by xi'an

My friend and co-author Randal Douc is one of the main actors in the film Le Chemin, which came out last week in French cinemas. It takes place in Cambodia and is directed by Jeanne Labrune. I have not yet seen the film but will next week, as it is scheduled in a nearby cinema (and in only six cinemas in Paris!)… (Randal was also a main actor in Rithy Panh's Un barrage contre le Pacifique, as well as the voice-over in Rithy Panh's Oscar-nominated L'image manquante.) In connection with this new movie, Randal was interviewed by Allociné, the major French website on current movies. With questions about his future film and theatre projects, but none about his on-going maths research!!!

## Le Monde puzzle [#1020]

Posted in Books, Kids, R on September 15, 2017 by xi'an

A collection of liars in this Le Monde mathematical puzzle:

1. A circle of 16 liars and truth-tellers is such that everyone states that their immediate neighbours are both liars. How many liars can there be?
2. A circle of 12 liars and truth-tellers is such that everyone states that their immediate neighbours are one liar and one truth-teller. How many liars can there be?
3. A circle of 8 liars and truth-tellers is such that four state that their immediate neighbours are one liar and one truth-teller, and four state that their immediate neighbours are both liars. How many liars can there be?

These questions can easily be solved by brute-force simulation. For the first setting, using 1 to code truth-tellers and -1 for liars, I simulate acceptable configurations as

tabz=rep(0,16) #0 means still undecided
tabz[1]=1 #at least one truth-teller
tabz[2]=tabz[16]=-1 #whose immediate neighbours must then be liars
for (i in 3:15){
  if (tabz[i-1]==1){
    tabz[i]=-1}else{
    if (tabz[i+1]==-1){
      tabz[i]=1}else{
      if (tabz[i+1]==1){
        tabz[i]=-1}else{
        if (tabz[i-2]==-1){
          tabz[i]=1}else{
          tabz[i]=sample(c(-1,1),1)
  }}}}}


which produces 8, 9, and 10 as possible (and obvious) values.
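Since 2¹⁶ configurations make a small space, the answer can also be checked exhaustively; here is a Python sketch of mine (not the R code above) enumerating all circles and keeping the consistent ones:

```python
from itertools import product

def liar_counts(n=16):
    """Possible numbers of liars in a circle of n people who all claim
    that both of their immediate neighbours are liars."""
    counts = set()
    for circle in product([True, False], repeat=n):  # True = truth-teller
        # the claim must be true for truth-tellers and false for liars
        ok = all(
            truthful == ((not circle[i-1]) and (not circle[(i+1) % n]))
            for i, truthful in enumerate(circle)
        )
        if ok:
            counts.add(sum(1 for t in circle if not t))
    return sorted(counts)
```

Running `liar_counts(16)` returns [8, 9, 10], in agreement with the simulation.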

The second puzzle is associated with similar R code

tabz=sample(c(-1,1),12,rep=TRUE)
rong=FALSE
while (!rong){
  for (i in sample(12)){
    #neighbour product is -1 when they are one liar and one truth-teller
    if (tabz[i-1+12*(i==1)]*tabz[i%%12+1]==-1){
      tabz[i]=1}else{
      tabz[i]=sample(c(-1,1),1)}
  }
  rong=TRUE
  for (i in (1:12)[tabz==1])
    rong=rong&(tabz[i-1+12*(i==1)]*tabz[i%%12+1]==-1)
  if (rong){
    for (i in (1:12)[tabz==-1])
      rong=rong&(tabz[i-1+12*(i==1)]*tabz[i%%12+1]!=-1)
  }}


with numbers of liars (-1) either 12 (obvious) or 4.
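Again, the 2¹² circles can be enumerated exhaustively; a sketch of mine in Python, where a truth-teller's claim "one liar plus one truth-teller" must be true and a liar's must be false:

```python
from itertools import product

def liar_counts_mixed(n=12):
    """Possible numbers of liars when everyone claims that their
    neighbours are one liar and one truth-teller."""
    counts = set()
    for circle in product([True, False], repeat=n):  # True = truth-teller
        # "one of each" holds exactly when the two neighbours differ
        ok = all(
            truthful == (circle[i-1] != circle[(i+1) % n])
            for i, truthful in enumerate(circle)
        )
        if ok:
            counts.add(sum(1 for t in circle if not t))
    return sorted(counts)
```

Running `liar_counts_mixed(12)` returns [4, 12], matching the simulation.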

The final puzzle is more puzzling in that figuring out the validating function (is an allocation correct?) took me a while, the ride back home plus some. I ended up with the following code that samples liars (-1) and truth-tellers (1) at random, forces wrong and right answers (coded 0, 1, 2) upon these, and checks the number of answers of each type:

rong=FALSE
while (!rong){
  tabz=sample(c(-1,1),8,rep=TRUE) #truth-tellers (1) and liars (-1)
  tabz[1]=1;tabz[sample(2:8,1)]=-1 #at least one of each
  tt=(1:8)[tabz==1];lr=(1:8)[tabz==-1]
  statz=rep(0,8) #statements: 1 (one of each) or 2 (both liars)
  #truth-tellers state the true configuration of their neighbours
  statz[tt]=(tabz[tt-1+8*(tt==1)]*tabz[tt%%8+1]==-1)+
    2*(tabz[tt-1+8*(tt==1)]+tabz[tt%%8+1]==-2)
  #answering 0 never works
  #liars state a false configuration, with a free choice when
  #both neighbours are truth-tellers (sum equal to 2)
  statz[lr]=2*(tabz[lr-1+8*(lr==1)]*tabz[lr%%8+1]==-1)+
    (tabz[lr-1+8*(lr==1)]+tabz[lr%%8+1]==-2)+
    sample(c(1,2),8,rep=TRUE)[lr]*
    (tabz[lr-1+8*(lr==1)]+tabz[lr%%8+1]==2)
  rong=(sum(statz==1)==4)&(sum(statz==2)==4)}


with solutions 3, 4, 5 and 6.
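The same exhaustive strategy works here too, if one also enumerates which four people make which statement. A Python sketch of mine (not the R code above), looping over the 2⁸ circles and the C(8,4) assignments of statements:

```python
from itertools import product, combinations

def puzzle3_liar_counts(n=8):
    """Possible numbers of liars when exactly half the circle claims
    'one liar and one truth-teller' and the other half 'both liars'."""
    counts = set()
    for circle in product([True, False], repeat=n):  # True = truth-teller
        for a_sayers in combinations(range(n), n // 2):
            ok = True
            for i, truthful in enumerate(circle):
                left, right = circle[i-1], circle[(i+1) % n]
                if i in a_sayers:   # claim: one liar and one truth-teller
                    claim = left != right
                else:               # claim: both neighbours are liars
                    claim = (not left) and (not right)
                if claim != truthful:  # statement truth must match type
                    ok = False
                    break
            if ok:
                counts.add(sum(1 for t in circle if not t))
    return sorted(counts)
```

Running `puzzle3_liar_counts(8)` returns [3, 4, 5, 6], in agreement with the simulation.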

## the “myth of the miracle machine”

Posted in Books, University life on September 13, 2017 by xi'an

In what appears to be a regular contribution of his to Nature, Daniel Sarewitz recently wrote a “personal take on events” that I find quite reactionary, all the more because it comes from an academic. And I wonder why Nature chose to publish his opinion piece. Every other month! The argument of the author is that basic science should be defunded in favour of “use-inspired” research, “mission oriented” programmes, and “societal needs and socially valuable knowledge”… The reason being that this is a better use of public money and that scientists are just another interest group that should not be left to its own devices. This is not a new tune: calls to cut funding for fundamental research emerge regularly, as it makes an easily found culprit when saving “taxpayer money”, and a lack of clear applicability is the simplest ground for rejecting a research proposal. Of course, when looking a bit wider, one can check this piece bemoaning the Democrat inclinations of most scientists. Or that one arguing that science should sometimes give way to religion. With the definitive argument that, for most people, the maths behind scientific models are so complex that they must turn to an act of faith… Yes, I do wonder at Nature providing Sarewitz with such a wide-ranging tribune.

## Texan black swan

Posted in Books, pictures on September 12, 2017 by xi'an

“Un événement improbable aux conséquences d’autant plus désastreuses que l’on ne s’y est pas préparé.”

This weekend, there was a short article in Le Monde about the Harvey storm as a Texan illustration of Taleb's black swan. An analysis that would imply every extreme event like this “once-in-a-thousand-year” event (?) can be called a black swan… “An improbable event whose consequences are all the more disastrous because no one prepared for it”, as the above quote translates. Ironically, there is another article in the same newspaper about the catastrophe being “ordinary” and “not unexpected”! While such massive floods indeed impact a huge number of people and companies, because the storm happened to pour an unusual amount of rain right on top of Houston, they remain within the predictable and not-so-improbable, both in terms of the amount of water deposited in the area and in terms of damages, given the amount and style of construction over flood plains. For instance, Houston is less than 50 feet above sea level, has fairly old drainage and pipe systems, and lacks a zoning code. With mostly one- or two-story buildings rather than higher rises. (Incidentally, I appreciated the juxtaposition of the article with the ad for Le Monde des Religions and its picture of a devilish black goat!)

## priors without likelihoods are like sloths without…

Posted in Books, Statistics on September 11, 2017 by xi'an

“The idea of building priors that generate reasonable data may seem like an unusual idea…”

Andrew, Dan, and Michael arXived an opinion piece last week entitled “The prior can generally only be understood in the context of the likelihood”, which connects to the earlier Read Paper of Gelman and Hennig I discussed last year. I cannot state strong disagreement with the positions taken in this piece, actually, in that I do not think prior distributions ever occur as a given, but are rather chosen as a reference measure to probabilise the parameter space and eventually prioritise some regions over others. If anything, I find myself even further along the prior-agnosticism gradation. (Of course, this lack of disagreement applies to the likelihood understood as a function of both the data and the parameter, rather than of the parameter only, conditional on the data. Priors cannot depend on the data without incurring disastrous consequences!)

“…it contradicts the conceptual principle that the prior distribution should convey only information that is available before the data have been collected.”

The first example is somewhat disappointing in that it revolves, as do so many Bayesian textbooks (since Laplace!), around the [sex-ratio] Binomial probability parameter, and concludes on the strong or long-lasting impact of the Uniform prior. I do not see much of a contradiction between the use of a Uniform prior and the collection of prior information, if only because there is no standardised way to transfer prior information into prior construction. And more fundamentally because a parameter rarely makes sense by itself, alone, without a model that relates it to potential data, as for instance in a regression model. Moreover, following my epiphany of last semester about the relativity of the prior, I see no damage in the prior being relevant, as I only attach a relative meaning to statements based on the posterior. Rather than trying to limit the impact of a prior, we should build assessment tools to measure this impact, for instance by prior predictive simulations. And this is where I come to quite agree with the authors.
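For concreteness, a generic prior predictive simulation for the Binomial example might look like the following Python sketch (my own, not the paper's code; the sample sizes are arbitrary): draw the probability from the Uniform prior, then data from the model, and inspect the data sets the prior generates.

```python
import random

random.seed(1)
n = 200        # hypothetical number of observations per replication
reps = 2000    # number of prior predictive replications
props = []
for _ in range(reps):
    p = random.random()                             # p ~ Uniform(0, 1)
    y = sum(random.random() < p for _ in range(n))  # y ~ Binomial(n, p)
    props.append(y / n)
mean_prop = sum(props) / reps
```

Plotting `props` shows that the Uniform prior spreads the predictive proportions over the whole of (0,1), which one can then judge against whatever is known about the data-generating process.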

“…non-identifiabilities, and near non-identifiabilities, of complex models can lead to unexpected amounts of weight being given to certain aspects of the prior.”

Another rather straightforward remark is that non-identifiable models see the impact of the prior remain as the sample size grows. And I still see no issue with this fact in a relative approach. When the authors mention (p.7) that purely mathematical priors perform more poorly than weakly informative priors, it is hard to see what they mean by this “performance”.

“…judge a prior by examining the data generating processes it favors and disfavors.”

Besides those points, I completely agree with them about the fundamental relevance of the prior as a generative process, only when the likelihood becomes available. And simulatable. (This point is found in many references, including our response, with Kaniav Kamary, to the American Statistician paper Hidden dangers of specifying noninformative priors, with the same illustration on a logistic regression.) I also agree with their criticism of the marginal likelihood and Bayes factors as being so strongly impacted by the choice of a prior, if treated as absolute quantities. I also, if more reluctantly and somewhat heretically, see a point in using the posterior predictive for assessing whether a prior is relevant for the data at hand, at least at a conceptual level. I am however less certain about how to handle improper priors based on their recommendations. In conclusion, it would be great to see one [or more] of the authors at O-Bayes 2017 in Austin, as I am sure it would spark nice discussions there! (And by the way, I have no prior idea on how to conclude the comparison in the title!)