Archive for wikipedia

babbage in, babbage out?!

Posted in Books, Kids, Statistics on May 25, 2020 by xi'an

When checking for the origin of “garbage in, garbage out” on Wikipedia, I came upon this citation from Charles Babbage:

“On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”

This follows earlier quotes from him on this ‘Og.

support Wikipedia

Posted in Books, University life on March 15, 2020 by xi'an

psycho-history [Hari Seldon to the rescue!]

Posted in Books, Kids, pictures, Statistics, Travel on December 13, 2019 by xi'an

A “long read” article in the Guardian a few weeks ago reads like the core concept of Isaac Asimov’s Foundation, namely psychohistory, turning into a real academic discipline! In the books of this fantastic series, the father of this new science of predictive mathematical (or statistical) sociology, Hari Seldon, makes predictions that extend so far into the future that, at every major crisis of Asimov’s galactic empire, he delivers a pre-recorded message that indicates how to cope with the crisis to save the empire. Or so it seems! (As a teenager, I enjoyed the Foundation books very much, reading the first three volumes several times, to the point that I now wonder whether they influenced my choice of a statistics major…! Presumably not, but it makes a nice story!!! Actually, Paul Krugman blames Asimov for his choice of economics, as the field closest to psychohistory.)

“I assumed that the time would come when there would be a science in which things could be predicted on a probabilistic or statistical basis (…) can’t help but think it would be good, except that in my stories, I always have opposing views. In other words, people argue all possible… all possible… ways of looking at psychohistory and deciding whether it is good or bad. So you can’t really tell. I happen to feel sort of on the optimistic side. I think if we can somehow get across some of the problems that face us now, humanity has a glorious future, and that if we could use the tenets of psychohistory to guide ourselves we might avoid a great many troubles. But on the other hand, it might create troubles. It’s impossible to tell in advance.” I. Asimov

The Guardian entry is about Peter Turchin, a biologist who had “by the late 1990s answered all the ecological questions that interested him” and then turned his attention to history, creating a new field called cliodynamics. Which bears eerie similarities to Seldon’s psychohistory! It uses massive databases of historical events (what is a non-historical event, by the way?!) to predict the future, relying on a premise of quasi-periodic cycles to fit such predictions, with a whiff of Soviet-era theories… I did not read the entire paper in depth (it’s a “long read”, remember?!) and even less the background theory, but I did not spot any massive support from a large academic community for Turchin’s approach (which is mentioned in the psychohistory entry on Wikipedia). And, while this is not a major argument from Feyerabend’s perspective (of fundamental scientific advances resulting from breaks from consensus), it seems hard to think of a predictive approach that is not negatively impacted by singularity events, from the emergence of The Mule in Foundation, to the new scale of challenges posed by the accelerating climate collapse, or the societal globalisation cum communitarian fragmentation caused by social media. And, as a last warning, a previous entry in the same column warned readers about “how statistics lost their power and big data controlled by private companies is taking over”, hence going in the opposite direction.

what if what???

Posted in Books, Statistics on October 7, 2019 by xi'an

[Here is a section of the Wikipedia page on Monte Carlo methods which makes little sense to me. What if it were not part of this page?!]

Monte Carlo simulation versus “what if” scenarios

There are ways of using probabilities that are definitely not Monte Carlo simulations – for example, deterministic modeling using single-point estimates. Each uncertain variable within a model is assigned a “best guess” estimate. Scenarios (such as best, worst, or most likely case) for each input variable are chosen and the results recorded.[55]

By contrast, Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring.[56] For example, a comparison of a spreadsheet cost construction model run using traditional “what if” scenarios, and then running the comparison again with Monte Carlo simulation and triangular probability distributions shows that the Monte Carlo analysis has a narrower range than the “what if” analysis. This is because the “what if” analysis gives equal weight to all scenarios (see quantifying uncertainty in corporate finance), while the Monte Carlo method hardly samples in the very low probability regions. The samples in such regions are called “rare events”.
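To make the quoted comparison concrete, here is a minimal R sketch of what it seems to describe, under made-up assumptions: three cost items with hypothetical (minimum, most likely, maximum) values, each given a triangular distribution and sampled independently. The “what if” scenarios simply add up the best, most likely, or worst values, while the Monte Carlo totals are sums of random draws. The helper rtri() is a hypothetical name; base R has no triangular sampler, so it inverts the triangular CDF directly.

```r
# Hypothetical cost items, made up for illustration: (min, most likely, max)
lo <- c(10, 20, 5)
ml <- c(15, 25, 10)
hi <- c(30, 40, 20)

# Triangular(a, m, b) draws by inverting its CDF (a = min, m = mode, b = max)
rtri <- function(n, a, m, b) {
  u <- runif(n)
  fm <- (m - a) / (b - a)               # CDF value at the mode m
  ifelse(u < fm,
         a + sqrt(u * (b - a) * (m - a)),
         b - sqrt((1 - u) * (b - a) * (b - m)))
}

# "What if" analysis: all items at their best, most likely, or worst value
scenarios <- c(best = sum(lo), likely = sum(ml), worst = sum(hi))

# Monte Carlo: independent triangular draws per item, summed per replicate
set.seed(1)
n <- 1e4
total <- rowSums(sapply(seq_along(lo), function(k) rtri(n, lo[k], ml[k], hi[k])))

scenarios
quantile(total, c(0.01, 0.5, 0.99))   # central 98% range of simulated totals
```

With these made-up figures the scenarios span 35 to 90, while the simulated totals concentrate well inside that range: reaching either end requires all three items to be extreme simultaneously, which is the low-probability region the quoted text calls “rare events”.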

Le Monde puzzle [#1053]

Posted in Books, Kids, R on June 21, 2018 by xi'an

An easy arithmetic Le Monde mathematical puzzle again:

  1. If coins come in units of 1, x, and y, what is the optimal value of (x,y) that minimises the maximal number of coins needed to represent an arbitrary price between 1 and 149?
  2. If the number of units is now four, what is the optimal choice?

The first question is fairly easy to code

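coinz <- function(x,y){
  # greedy coin count for every price 1..149 with units 1, x, y
  z=(1:149)
  if (y<x){xx=x;x=y;y=xx}  # ensure x < y
  ny=z%/%y                 # as many y-units as possible
  nx=(z%%y)%/%x            # then as many x-units from the remainder
  no=z-ny*y-nx*x           # and unit coins for whatever is left
  return(max(no+nx+ny))    # worst-case number of coins over all prices
}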

and, when scanned over all pairs (x,y), returns M=12 as the smallest possible maximal number of coins, attained for x=4 and y=22, with a price tag of 129 requiring all 12 coins (5×22+4×4+3×1). For the second question, one unit is necessarily 1 (!) and there is just an extra loop to the above, which returns M=8, with the other units taking several possible values:

[1] 40 11  3
[1] 41 11  3
[1] 55 15  4
[1] 56 15  4
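The scans over the units are not shown above; a minimal sketch, assuming a plain exhaustive search over all units up to 149 that keeps the first minimiser found, could look as follows. The helper greedy_counts() and the objects best3 and best4 are hypothetical names: greedy_counts() applies the same largest-first rule as coinz() to any set of non-unit coins and returns the count for every price, so that the worst price can be located too.

```r
# Greedy count for every price, given the non-unit coin values
# (largest used first, as in coinz(); unit coins pay whatever is left)
greedy_counts <- function(units, prices = 1:149) {
  left <- prices
  n <- 0
  for (u in sort(units, decreasing = TRUE)) {
    n <- n + left %/% u
    left <- left %% u
  }
  n + left
}

# Three units 1 < x < y: exhaustive scan of the worst-case count
best3 <- list(M = Inf)
for (x in 2:148) for (y in (x + 1):149) {
  M <- max(greedy_counts(c(x, y)))
  if (M < best3$M) best3 <- list(M = M, units = c(y, x))
}

# Four units 1 < w < x < y: for fixed (x, y), all w are handled at once
best4 <- list(M = Inf)
p <- 1:149
for (x in 3:148) for (y in (x + 1):149) {
  base <- p %/% y + (p %% y) %/% x       # coins of value y, then x
  r <- (p %% y) %% x                     # amount left for w and 1
  w <- 2:(x - 1)
  cnt <- base + outer(r, w, `%/%`) + outer(r, w, `%%`)  # prices by w
  worst <- apply(cnt, 2, max)            # worst-case count for each w
  if (min(worst) < best4$M)
    best4 <- list(M = min(worst), units = c(y, x, w[which.min(worst)]))
}
best3
best4
```

Ties are broken by the first candidate found, so the units printed need not be the ones listed above, even though those lists (e.g. greedy_counts(c(40, 11, 3))) reach the same worst case.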

A quick search revealed that this problem (or a variant) is solved in many places: on stackexchange (for the average number of coins rather than the maximal one; why the average, when it makes little sense for real prices?), in a paper by Shalit calling for an 18¢ coin, on Freakonomics, and on Wikipedia, although the latter is about finding the minimum number of coins summing up to a given value with fixed currency denominations (a knapsack problem). This Wikipedia page made me realise that my solution is not necessarily optimal, as my code greedily takes as many of the larger denominations as possible before dealing with the remainders, while there may be more efficient divisions. For instance, running the following dynamic programming code

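coz=function(x,y){
  # minco[i]: fewest coins with units 1, x, y summing to price i
  minco=1:149               # start from the all-unit-coin representation
  if (x<y){ xx=x;x=y;y=xx}  # ensure x > y (order does not matter here)
  for (i in 2:149){
    if (i%%x==0)            # price is a multiple of x
      minco[i]=i%/%x
    if (i%%y==0)            # price is a multiple of y
      minco[i]=min(minco[i],i%/%y)
    # best split of price i into two smaller prices j and i-j
    # (splits are symmetric, so j up to about i/2 suffices)
    for (j in 1:max(1,trunc((i+1)/2)))
      minco[i]=min(minco[i],minco[j]+minco[i-j])
  }
  return(max(minco))}       # worst-case number of coins over all prices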

returns, once minimised over the units, the lower value of M=11 (with x=7, y=23) in the first case and M=7 in the second one.
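To see where the greedy rule loses, here is a minimal R sketch comparing, price by price, the greedy count with the true minimum for the units 1, 7, 23. It uses the classical coin-change recurrence rather than the pairwise splits of coz(), and the function name greedy_vs_dp() is hypothetical. For instance, a price of 28 takes six coins greedily (23+1+1+1+1+1) but only four as 7+7+7+7.

```r
# Greedy count (largest unit first, as in coinz) versus the minimal count
# from the classical coin-change recurrence, for units 1, x, y
greedy_vs_dp <- function(x, y, top = 149) {
  p <- 1:top
  big <- max(x, y); small <- min(x, y)
  greedy <- p %/% big + (p %% big) %/% small + (p %% big) %% small
  best <- c(0, rep(Inf, top))          # best[k + 1]: fewest coins for price k
  for (k in 1:top)
    for (u in c(1, small, big))
      if (u <= k) best[k + 1] <- min(best[k + 1], best[k - u + 1] + 1)
  data.frame(price = p, greedy = greedy, dp = best[-1])
}

res <- greedy_vs_dp(7, 23)
subset(res, greedy > dp)    # prices where the greedy rule is not optimal
```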