## psycho-history [Hari Seldon to the rescue!]

Posted in Books, Kids, pictures, Statistics, Travel on December 13, 2019 by xi'an

A “long read” article in the Guardian a few weeks ago sounds like the core concept of Isaac Asimov’s Foundation, namely psychohistory, turning into a real academic discipline! In the books of this fantastic series, the father of this new science of predictive mathematical (or statistical) sociology, Hari Seldon, makes predictions that extend so far into the future that, at every major crisis of Asimov’s galactic empire, he delivers a pre-recorded message that indicates how to cope with the crisis to save the empire. Or so it seems! (As a teenager, I enjoyed the Foundation books very much, reading the first three volumes several times, to the point that I now wonder if they were influential in my choice of a statistics major…! Presumably not, but it makes a nice story!!! Actually, Paul Krugman blames Asimov for his choice of economics as being the closest to psychohistory.)

“I assumed that the time would come when there would be a science in which things could be predicted on a probabilistic or statistical basis (…) can’t help but think it would be good, except that in my stories, I always have opposing views. In other words, people argue all possible… all possible… ways of looking at psychohistory and deciding whether it is good or bad. So you can’t really tell. I happen to feel sort of on the optimistic side. I think if we can somehow get across some of the problems that face us now, humanity has a glorious future, and that if we could use the tenets of psychohistory to guide ourselves we might avoid a great many troubles. But on the other hand, it might create troubles. It’s impossible to tell in advance.” I. Asimov

The Guardian entry is about Peter Turchin, a biologist who had “by the late 1990s answered all the ecological questions that interested him” and then turned his attention to history, creating a new field called cliodynamics. Which bears eerie similarities to Seldon’s psychohistory! Using massive databases of historical events (what is a non-historical event, by the way?!) to predict the future. And relying on a premise of quasi-periodic cycles to fit such predictions, with a whiff of Soviet-era theories… I did not read the entire paper in depth (it’s a “long read”, remember?!), and even less the background theory, but I did not spot there massive support from a large academic community for Turchin’s approach (mentioned in the psychohistory entry in Wikipedia). And, while this is not a major argument from Feyerabend’s perspective (of fundamental scientific advances resulting from breaks from consensus), it seems hard to think of a predictive approach that is not negatively impacted by singularity events, from the emergence of The Mule in Foundation, to the new scale of challenges posed by the acceleration of the climate collapse or the societal globalisation cum communitarian fragmentation caused by social media. And as a last warning, a previous entry in the same column wanted to warn readers “how statistics lost their power and big data controlled by private companies is taking over”, hence going in the opposite direction.

## what if what???

Posted in Books, Statistics on October 7, 2019 by xi'an

[Here is a section of the Wikipedia page on Monte Carlo methods which makes little sense to me. What if it was not part of this page?!]

### Monte Carlo simulation versus “what if” scenarios

There are ways of using probabilities that are definitely not Monte Carlo simulations – for example, deterministic modeling using single-point estimates. Each uncertain variable within a model is assigned a “best guess” estimate. Scenarios (such as best, worst, or most likely case) for each input variable are chosen and the results recorded.[55]

By contrast, Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring.[56] For example, a comparison of a spreadsheet cost construction model run using traditional “what if” scenarios, and then running the comparison again with Monte Carlo simulation and triangular probability distributions shows that the Monte Carlo analysis has a narrower range than the “what if” analysis. This is because the “what if” analysis gives equal weight to all scenarios (see quantifying uncertainty in corporate finance), while the Monte Carlo method hardly samples in the very low probability regions. The samples in such regions are called “rare events”.
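To make the quoted comparison concrete, here is a minimal R sketch, assuming a toy cost model made of three items with triangular distributions. The cost figures and the helper `rtri` are hypothetical, introduced only for illustration, and none of this comes from the Wikipedia page.

```r
set.seed(1)

# Draw n values from a triangular distribution on [lo, hi] with mode md
# (inverse-CDF method, since base R has no triangular sampler)
rtri <- function(n, lo, md, hi) {
  u <- runif(n)
  p_mode <- (md - lo) / (hi - lo)          # CDF value at the mode
  ifelse(u < p_mode,
         lo + sqrt(u * (hi - lo) * (md - lo)),
         hi - sqrt((1 - u) * (hi - lo) * (hi - md)))
}

# Hypothetical cost items: best case, most likely case, worst case
lo <- c(80, 40, 20)
md <- c(100, 50, 30)
hi <- c(150, 70, 50)

# "What if" analysis: only three scenarios, all items moving together
what_if <- c(best = sum(lo), likely = sum(md), worst = sum(hi))

# Monte Carlo analysis: simulate each item separately, then add them up
n <- 1e4
total <- rowSums(mapply(rtri, lo = lo, md = md, hi = hi,
                        MoreArgs = list(n = n)))
mc_range <- quantile(total, c(0.025, 0.975))

print(what_if)    # scenario totals
print(mc_range)   # central 95% range of the simulated totals
```

Under these assumptions the central Monte Carlo range sits strictly inside the best-to-worst interval, simply because all three items hitting their extreme values at once is a rare event under independent sampling.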

## Le Monde puzzle [#1053]

Posted in Books, Kids, R on June 21, 2018 by xi'an

An easy arithmetic Le Monde mathematical puzzle again:

1. If coins come in units of 1, x, and y, what is the optimal value of (x,y) that minimises the number of coins representing an arbitrary price between 1 and 149?
2.  If the number of units is now four, what is the optimal choice?

The first question is fairly easy to code

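```r
# Greedy count: as many coins of the largest unit as possible, then the next one
coinz <- function(x, y) {
  z <- 1:149                                # all prices to represent
  if (y < x) { xx <- x; x <- y; y <- xx }   # make sure that x <= y
  ny <- z %/% y                             # number of y-coins
  nx <- (z %% y) %/% x                      # number of x-coins for the remainder
  no <- z - ny * y - nx * x                 # what is left is paid in unit coins
  max(no + nx + ny)                         # worst case over all prices
}
```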

and returns M=12 as the maximal number of coins for the best choice of units, x=4 and y=22, reached for a price tag of 129. For the second question, one unit is necessarily 1 (!) and there is just an extra loop to the above, which returns M=8, with other units taking several possible values:

```
[1] 40 11  3
[1] 41 11  3
[1] 55 15  4
[1] 56 15  4
```


A quick search revealed that this problem (or a variant) is solved in many places, from stackexchange (for an average—why average?, as it does not make sense when looking at real prices—number of coins, rather than maximal), to a paper by Shalit calling for the 18¢ coin, to Freakonomics, to Wikipedia, although this is about finding the minimum number of coins summing up to a given value, using fixed currency denominations (a knapsack problem). This Wikipedia page made me realise that my solution is not necessarily optimal, as I use the remainders from the larger denominations in my code, while there may be more efficient divisions. For instance, running the following dynamic programming code

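```r
# Dynamic programming count: minco[i] is the minimal number of coins for price i
coz <- function(x, y) {
  minco <- 1:149                            # start from unit coins only
  if (x < y) { xx <- x; x <- y; y <- xx }   # make sure that x >= y
  for (i in 2:149) {
    # exact multiples of a single unit
    if (i %% x == 0)
      minco[i] <- i %/% x
    if (i %% y == 0)
      minco[i] <- min(minco[i], i %/% y)
    # otherwise, best split of i into two smaller prices j and i-j
    for (j in seq_len(i %/% 2))
      minco[i] <- min(minco[i], minco[j] + minco[i - j])
  }
  max(minco)                                # worst case over all prices
}
```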


returns the lower value of M=11 (with x=7,y=23) in the first case and M=7 in the second one.
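For readers who want to rerun the search, here is a hedged sketch of the same idea written for an arbitrary set of units, so that the three-unit and four-unit questions share one function. The names `min_coins`, `worst_case` and `best_pair` are mine and purely illustrative. It uses the textbook coin-change recursion rather than the splitting loop above, and assumes that 1 is among the units and that the upper price is at least 3.

```r
# Minimal number of coins for every price 1..top, given the units in `denoms`
# (standard coin-change dynamic programme; `denoms` must contain 1)
min_coins <- function(denoms, top = 149) {
  best <- c(0, rep(Inf, top))          # best[p + 1] = fewest coins summing to p
  for (p in 1:top) {
    usable <- denoms[denoms <= p]      # units that fit into price p
    best[p + 1] <- 1 + min(best[p - usable + 1])
  }
  best[-1]
}

# Worst case over all prices, i.e. the criterion minimised in the puzzle
worst_case <- function(denoms, top = 149) max(min_coins(denoms, top))

# Exhaustive search over pairs 1 < x < y <= top (requires top >= 3)
best_pair <- function(top = 149) {
  best <- list(M = Inf, x = NA, y = NA)
  for (x in 2:(top - 1)) for (y in (x + 1):top) {
    M <- worst_case(c(1, x, y), top)
    if (M < best$M) best <- list(M = M, x = x, y = y)
  }
  best
}
```

Calling `best_pair()` scans all pairs for prices up to 149, while `worst_case(c(1, x, y, z))` can be wrapped in one more loop for the four-unit question. Ties between equally good pairs are resolved in favour of the first pair met, which is an arbitrary choice.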

## are there a frequentist and a Bayesian likelihoods?

Posted in Statistics on June 7, 2018 by xi'an

A question that came up on X validated and led me to spot rather poor entries in Wikipedia about both the likelihood function and Bayes’ Theorem. Where unnecessary and confusing distinctions are made between the frequentist and Bayesian versions of these notions. I have already discussed the latter (Bayes’ theorem) a fair amount here. The discussion about the likelihood is quite bemusing, in that the likelihood function is the … function of the parameter equal to the density indexed by this parameter at the observed value.
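In symbols, with notation assumed here for illustration only (an observation x and a family of densities f_θ with respect to a fixed dominating measure), this single definition reads

$$
L(\theta \mid x) = f_\theta(x), \qquad \theta \in \Theta,
$$

with x held at its observed value and θ free to vary, whichever analysis, frequentist or Bayesian, is conducted afterwards.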

“What we can find from a sample is the likelihood of any particular value of r, if we define the likelihood as a quantity proportional to the probability that, from a population having the particular value of r, a sample having the observed value of r, should be obtained.” R.A. Fisher, On the “probable error” of a coefficient of correlation deduced from a small sample. Metron 1, 1921, p.24

By mentioning an informal side to likelihood (rather than to likelihood function), and then stating that the likelihood is not a probability in the frequentist version but a probability in the Bayesian version, the W page makes a complete and unnecessary mess. Whoever is ready to rewrite this introduction is more than welcome! (Which reminded me of an earlier question also on X validated asking why a common reference measure was needed to define a likelihood function.)

This also led me to read a recent paper by Alexander Etz, whom I met at E.J. Wagenmakers’ lab in Amsterdam a few years ago. Following Fisher, as Jeffreys complained about

“..likelihood, a convenient term introduced by Professor R.A. Fisher, though in his usage it is sometimes multiplied by a constant factor. This is the probability of the observations given the original information and the hypothesis under discussion.” H. Jeffreys, Theory of Probability, 1939, p.28

Alexander defines the likelihood up to a constant, which causes extra confusion, for free!, as there is no foundational reason to introduce this degree of freedom rather than imposing an exact equality with the density of the data (albeit with an arbitrary choice of dominating measure, never neglect the dominating measure!). The paper also repeats the message that the likelihood is not a probability (density, missing in the paper). And provides intuitions about maximum likelihood, likelihood ratio and Wald tests. But does not venture into a separate definition of the likelihood, being satisfied with the fundamental notion to be plugged into the magical formula

posterior ∝ prior × likelihood

## golden Bayesian!

Posted in Statistics on November 11, 2017 by xi'an

## same data – different models – different answers

Posted in Books, Kids, Statistics, University life on June 1, 2016 by xi'an

An interesting question from a reader of the Bayesian Choice came out on X validated last week. It was about Laplace’s succession rule, which I found somewhat over-used, but it was nonetheless interesting because the question was about the discrepancy between the “non-informative” answers derived from two models applied to the data: a Hypergeometric distribution in the Bayesian Choice and a Binomial on Wikipedia. The originator of the question had trouble with the difference between those two “non-informative” answers as she or he believed that there was a single non-informative principle that should lead to a unique answer. This does not hold, even when following a reference prior principle like Jeffreys’ invariant rule or Jaynes’ maximum entropy tenets. For instance, the Jeffreys priors associated with the Binomial and the Negative Binomial distributions differ. And even less so when considering that there is no unity in reaching those reference priors. (Not even mentioning the issue of the reference dominating measure for the definition of the entropy.) This led to an informative debate, which is the point of X validated.
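As a reminder of why the two Jeffreys priors differ, here is the standard computation, in notation assumed for this illustration (success probability p, n Binomial trials, and x failures before the r-th success in the Negative Binomial case). Both likelihoods are proportional to a power of p times a power of 1−p, but the Fisher informations differ:

$$
\begin{aligned}
\text{Binomial:}\quad & I(p) = \frac{n}{p(1-p)}, & \pi_J(p) &\propto p^{-1/2}(1-p)^{-1/2},\\
\text{Negative Binomial:}\quad & I(p) = \frac{r}{p^{2}} + \frac{\mathbb{E}[x]}{(1-p)^{2}} = \frac{r}{p^{2}(1-p)}, & \pi_J(p) &\propto p^{-1}(1-p)^{-1/2},
\end{aligned}
$$

using E[x] = r(1−p)/p in the second line. The same sequence of successes and failures thus leads to two different “non-informative” posteriors, depending on the sampling model.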

On a completely unrelated topic, the survey ship looking for the black boxes of the crashed EgyptAir plane is called the Laplace.

## back from CIRM

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life on March 20, 2016 by xi'an

As should be clear from earlier posts, I tremendously enjoyed this past week at CIRM, Marseille, and not only for providing a handy retreat from where I could go running and climbing at least twice a day! The programme (with slides and films soon to be available on the CIRM website) was very well-designed, with mini-courses and talks of appropriate length and frequency. Thanks to Nicolas Chopin (ENSAE ParisTech) and Gilles Celeux (Inria Paris) for constructing this programme so efficiently and to the local organisers Thibaut Le Gouic (Ecole Centrale de Marseille), Denys Pommeret (Aix-Marseille Université), and Thomas Willer (Aix-Marseille Université) for handling the practical side of inviting and accommodating close to a hundred participants on this rather secluded campus. I hope we can reproduce the experiment a few years from now. Maybe in 2018 if we manage to squeeze it between BayesComp 2018 [ex-MCMski] and ISBA 2018 in Edinburgh.

One of the bonuses of staying at CIRM is indeed that it is fairly isolated and far from the fury of downtown Marseille, which may sound like a drag, but actually helps with concentration and interactions. Actually, the whole Aix-Marseille University campus of Luminy on which CIRM is located is surprisingly quiet: we were there in the very middle of the teaching semester and saw very few students around (although even fewer boars!). It is a bit of a mystery that a campus built in such a beautiful location, with the Mont Puget as its background and the song of cicadas as the only source of “noise”, is not better exploited towards attracting more researchers and students. However, remoteness and lack of efficient public transportation may explain a lot about this low occupation of the campus. As may the poor quality of most buildings on the campus, which must be unbearable during the summer months…

In a potential planning for the future Bayesian week at CIRM, I think we could have some sort of poster sessions after dinner (with maybe a cash bar operated by some of the invited students since there is no bar at CIRM or around). Or trail-running under moonlight, trying to avoid tripping over rummaging boars… A sort of Kaggle challenge would be nice but presumably too hard to organise. As a simpler joint activity, we could collectively contribute to some Wikipedia pages related to Bayesian and computational statistics.