Archive for Salzburg

Korean trip

Posted in Mountains, Running, Statistics, Travel, University life on November 24, 2019 by xi'an

A fairly short but exciting trip to Seoul and to the Fall meeting of the Korean Statistical Society there. Plus giving a seminar at Seoul National University, where I stayed and enjoyed its beautiful campus surrounded by hills painted in the flamboyant reds and yellows of trees. Running to the top of Gwanaksan in the early morning, with some scrambling moments, was a fantastic beginning to the day! Although quite unintentional, Sacha Tsybakov from CREST happened to be another invited speaker at the meeting (along with Regina Liu from Rutgers, whom I also met in Salzburg two months ago) and we had a nice stroll together on the University of Seoul campus during a break in the sessions, gaining another view of the city from the top of Bukhansan. The talk I gave there on the asymptotics of ABC happened to be better attended than my tutorial lecture delivered at the beginning of JSM in Denver this summer. I am thus quite grateful to the organisers for their invitation and this opportunity to meet Korean statisticians and to get a glimpse of Korean culture and cuisine!


Prussian blue [book review]

Posted in Books, Travel on September 28, 2019 by xi'an

This is the one-before-last volume in Philip Kerr's Bernie Gunther series (one-before-last since the author passed away last year). Which I picked up in a local bookstore for taking place in Berchtesgaden, which stands a few kilometers west of Salzburg and which I passed on my way there (and back) last week. Very good title, full of double meanings!

“When you’re working for people who are mostly thieves and murderers, a little of it comes off on your hands now and then.”

Two time-lines run in parallel in Prussian Blue, from 1939 Nazi Germany to 1956 France, from (mostly) hunter to hunted. Plenty of wisecracks worth quoting throughout the book, mostly à la Marlowe, but also singling out Berlin(ers) from the rest of Germany. An anti-hero if any, in that Bernie Gunther works there as a policeman for the Nazi State, aiming at making the law respected in a lawless era and at catching murderers at a time when the highest placed were all murderers, and about to upscale this qualification to levels never envisioned before. Still working under Heydrich's orders to solve a murder despite the attempts of other arch-villains like Martin Bormann and Ernst Kaltenbrunner, as well as of a helpful (if Hitler-supporting!) Gerdy Troost. Among the Gunther novels I have read so far, this one is the closest he gets to the ultimate evil, Hitler himself, who considered the Berghof in Berchtesgaden his favourite place, without ever meeting him. The gratuitous violence and bottomless corruption inherent to the fascist regime are most realistically rendered in the thriller, to the point of making the very possibility of a Bernie Gunther debatable!

‘Making a nuisance of yourself is what being a policeman is all about and suspecting people who were completely above suspicion was about the only thing that made doing the job such fun in Nazi Germany.’

As I kept reading the book I could not but draw a connection with the imperfect but nonetheless impressive pre-War novel Rogue Male, where an English "sport" hunter travels to Berchtesgaden to shoot (or aim at) Hitler, only to get spotted by soldiers before committing the act and to become hunted in his turn throughout Europe, ending up [spoiler!] in a burrow, trapped by Nazi secret services [well, this is not exactly the end!]. This connection has been pointed out in some reviews, but the role of the burrows and oppressive underground and the complicity of the local police forces are strongly present in both books, which somewhat decreases the appeal of this novel. Especially since the 1956 thread therein is a much less convincing plot than the 1939 one, despite involving conveniently forgotten old colleagues, the East German Stasi, hopeless French policemen and clergymen, the Saar referendum, and [much maligned!] andouillettes and oignons.

email footprint

Posted in Travel, University life on September 14, 2019 by xi'an

While I was wondering (in Salzburg) at the carbon impact of sending emails with an endless cascade of the past history of exchanges and replies, I found this (rather rudimentary) assessment that, while standard emails have an average impact of 4g, those with long attachments could cost 50g, quoting from Berners-Lee, leading to the fairly astounding figure of an evaluated impact of 1.6kg a day, or more than half a ton per year! Quite amazing when considering that a Paris-Birmingham round trip flight produces 80kg. Hence justifying a posteriori my habit of removing earlier emails when replying to them. (It takes little effort to do so, especially in mailers where this feature can be set as the default option.)
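To make the back-of-the-envelope arithmetic explicit, here is a minimal sketch; the daily mix of standard and attachment-heavy emails is a hypothetical assumption of mine, chosen to reproduce the 1.6kg/day figure quoted above:

```python
# per-email CO2 footprints in grams, as quoted from Berners-Lee
STANDARD_G = 4      # plain email
ATTACHMENT_G = 50   # email with a long attachment

# hypothetical daily counts, picked to match the 1.6 kg/day figure
n_standard, n_attachment = 150, 20

daily_g = n_standard * STANDARD_G + n_attachment * ATTACHMENT_G
yearly_kg = daily_g * 365 / 1000
flight_kg = 80  # Paris-Birmingham round trip, as quoted

print(f"{daily_g} g/day, {yearly_kg:.0f} kg/year, "
      f"i.e. {yearly_kg / flight_kg:.1f} such round trips")
```

which indeed returns 1600 g/day and 584 kg/year, the equivalent of more than seven Paris-Birmingham round trips.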


9 pitfalls of data science [book review]

Posted in Books, Kids, Statistics, Travel, University life on September 11, 2019 by xi'an

I received The 9 Pitfalls of Data Science by Gary Smith [who has written a significant number of general-public books on personal investment, statistics, and AI] and Jay Cordes from OUP for review a few weeks ago and read it on my trip to Salzburg. This short book contains a lot of anecdotes and what I would qualify as small talk on job experiences and colleagues' idiosyncrasies… More fundamentally, it reads as a sequence of examples of bad or misused statistics, as many general-public books on statistics do, but with little to say on how to spot such misuses. Its title (it seems The 9 pitfalls of… is a rather common début for a book title!) however started a (short) conversation with my neighbour on the train to Salzburg, as she wanted to know whether job opportunities in data science were better in Germany than in Austria. A practically important question for which I had no clue. And I do not think the book would have helped either! (My neighbour on the earlier plane to München had a book on growing lotus, which was not particularly enticing for launching a conversation either.)

Chapter I, “Using bad data”, is made of examples of truncated or cherry-picked data, often associated with poor graphics. Only one-dimensional outcomes, and also very US-centric. Chapter II, “Data before theory”, highlights spurious correlations and post hoc predictions, with a criticism of data mining, some examples being quite standard. Chapter III, “Worshiping maths”, sounds like the perfect opposite of the previous chapter: it discusses the fact that all models are wrong, but some may be more wrong than others. And gives examples of overfitting, p-value hacking, and regression applied to longitudinal data. With the message that (mathematical) assumptions are handy and helpful but not always realistic. Chapter IV, “Worshiping computers”, is about the new golden calf and contains rather standard stuff on trusting computer output because it comes from a machine. However, the book somewhat falls foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees, since Monte Carlo also depends on a model! Computer simulations may be fine for Bingo night or poker tournaments but are much more uncertain for complex decisions like retirement investments. It is also missing the biasing aspects in constructing recidivism prediction models pointed out in Weapons of Math Destruction. Until Chapter IX at least. The chapter also mentions adversarial attacks, if not GANs (!). Chapter V, “Torturing data”, mentions famous cheaters like Wansink of the bottomless-bowl and pizza papers, and contains more about p-hacking and reproducibility. Chapter VI, “Fooling yourself”, is a rather weak chapter in my opinion. Apart from Ioannidis' take on Theranos' lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker, with boasts of significant gains possibly earned from compulsive gamblers playing their family savings, which is not particularly praiseworthy. And about Brazilian jiu-jitsu.
Chapter VII, “Correlation vs causation”, predictably mentions Judea Pearl (whose Book of Why I just could not finish after reading one rant too many about statisticians being unable to get causality right! Especially after discussing the book with Andrew.). But there is not so much to gather from the chapter, which could instead have delved into deep learning and its ways of avoiding overfitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than about turning causation around. Chapter VIII, “Regression to the mean”, sees Galton's quincunx reappearing here after Pearl's book, where I learned (and checked with Steve Stigler) that the device was indeed intended for illustrating regression to the mean. While this attractive fallacy is worth pointing out, there are much worse abuses of regression that could have been presented. CHANCE's Howard Wainer also makes an appearance, alongside SAT scores. Chapter IX, “Doing harm”, does engage with the issue that predicting social features like recidivism by a (black-box) software is highly worrying (and just plain wrong), if only because of this black-box nature. Moving predictably to chess and go, with the right comment that this does not say much about real data problems. A word of warning about DNA testing saying very little about ancestry, if only because of the companies' limited and biased databases. With further calls for data privacy and a rather useless entry on North Korea. Chapter X, “The Great Recession”, which discusses the subprime scandal (as in Stewart's book), contains a set of (mostly superfluous) equations from Samuelson's paper (supposed to scare or impress the reader?!), leading to the rather obvious result that the expected concave utility of a weighted average of iid positive rvs is maximal when all the weights are equal, a result then criticised by laughing at the assumption of iid-ness in the case of mortgages.
Along with those who bought exotic derivatives whose construction they could not understand. The (short) chapter keeps going through all the (a posteriori) obvious ingredients for a financial disaster to link them to most of the nine pitfalls. Except the second about data before theory, because there was no data, only theory with no connection with reality. This final chapter is rather enjoyable, if coming after the facts. And containing this altogether unnecessary mathematical entry. [Usual warning: this review or a revised version of it is likely to appear in CHANCE, in my book reviews column.]
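For what it is worth, a hedged sketch of why the equal-weight result holds (my reconstruction, not the book's derivation): write $\varphi(w)=\mathbb{E}\left[u(w_1X_1+\cdots+w_nX_n)\right]$ for a concave utility $u$ and iid positive $X_i$'s, with weights on the simplex $\sum_i w_i=1$. Then $\varphi$ is concave in $w$ (expectation of concave functions of $w$) and symmetric in its arguments (by exchangeability of the $X_i$'s), hence, averaging any $w$ over all permutations $\sigma$ and applying Jensen's inequality,

$$\varphi\Big(\tfrac1n,\ldots,\tfrac1n\Big)=\varphi\Big(\frac{1}{n!}\sum_\sigma w_\sigma\Big)\;\ge\;\frac{1}{n!}\sum_\sigma \varphi(w_\sigma)=\varphi(w),$$

so the equal-weight portfolio maximises expected utility, an argument that indeed collapses as soon as the iid (or even exchangeability) assumption is dropped.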

dodging bullets, IEDs, and fingerprint detection at SimStat19

Posted in pictures, Statistics, University life on September 10, 2019 by xi'an

I attended a fairly interesting forensic science session at SimStat 2019 in Salzburg, as it concentrated on evidence and measures of evidence rather than on strict applications of Bayesian methodology to forensic problems. Even though American administrations like the FBI or various police departments were involved. It was a highly coherent session and I had a pleasant discussion with some of the speakers afterwards. For instance, my friend Alicia Carriquiry presented an approach to determine from images of bullets whether or not they have been fired from the same gun, leading to an interesting case for a point-null hypothesis where the point null makes complete sense. The work has been published in the Annals of Applied Statistics and is used in practice. The second talk, by Danica Ommen, was on fiducial forensics for IEDs, asking whether or not the copper wires used in the bombs are the same, another point-null illustration. It also raised an interesting question about the dependence of the alternative prior on the distribution of the material chosen, as it is supposed to cover all possible origins for the disputed item. But more interestingly, the talk launched into a discussion of making decisions based on finite samples and unknown parameters, not that specific to forensics, with a definitely surprising representation of the Bayes factor as an expected likelihood ratio, which first made me reminiscent of Aitkin's (1991) infamous posterior likelihood (!) before it dawned on me that this was a form of bridge sampling identity, where the likelihood ratio only involves parameters common to both models, making it an expression well-defined under both models. This identity could be generalised to the general case by considering a ratio of integrated likelihoods, the extreme case being the ratio equal to the Bayes factor itself.
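To spell out the identity as I understood it (a sketch under the assumption that the parameter θ and its prior π are common to both models): the Bayes factor writes as a posterior expectation of the likelihood ratio,

$$B_{12}=\frac{m_1(x)}{m_2(x)}=\int \frac{f_1(x\mid\theta)}{f_2(x\mid\theta)}\,\frac{f_2(x\mid\theta)\,\pi(\theta)}{m_2(x)}\,\text{d}\theta=\mathbb{E}_{\pi_2(\cdot\mid x)}\!\left[\frac{f_1(x\mid\theta)}{f_2(x\mid\theta)}\right],$$

since the middle integrand reduces to $f_1(x\mid\theta)\pi(\theta)/m_2(x)$, the ratio being well-defined under both models precisely because θ is shared.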
The following two talks, by Larry Tang and Christopher Saunders, also focused on the likelihood ratio and its statistical estimation, debating the coherence of using a score function and presenting a functional ABC algorithm where the prior is a (functional) Dirichlet prior. Thus a definitely relevant session from a Bayesian perspective!


Salzburg castle [jatp]

Posted in Mountains, pictures, Travel on September 9, 2019 by xi'an

likelihood-free inference by ratio estimation

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on September 9, 2019 by xi'an

“This approach for posterior estimation with generative models mirrors the approach of Gutmann and Hyvärinen (2012) for the estimation of unnormalised models. The main difference is that here we classify between two simulated data sets while Gutmann and Hyvärinen (2012) classified between the observed data and simulated reference data.”

A 2018 arXiv posting by Owen Thomas et al. (including my colleague at Warwick, Rito Dutta, CoI warning!) about estimating the likelihood (and the posterior) when it is intractable. Likelihood-free but not ABC, since the ratio of the likelihood to the marginal is estimated in a non- or semi-parametric (and biased) way. Following Geyer's 1994 fabulous estimation of an unknown normalising constant via logistic regression, the current paper, which I read in preparation for my discussion of the ABC optimal design session in Salzburg, uses probabilistic classification and an exponential family representation of the ratio. Opposing data from the density and data from the marginal, assuming both can be readily produced. The logistic regression minimizing the asymptotic classification error is the logistic transform of the log-ratio. For a finite (double) sample, this minimization thus leads to an empirical version of the ratio. Or to a smooth version, if the log-ratio is represented as a convex combination of summary statistics, turning the approximation into an exponential family, which is a clever way to come full circle back to ABC notions. And to synthetic likelihood. Although with a difference in estimating the exponential family parameters β(θ) by minimizing the classification error, parameters that are indeed conditional on the parameter θ. Actually, the paper introduces a further penalisation or regularisation term on those parameters β(θ), which could have been processed by a Bayesian Lasso instead. This step essentially drives the selection of the summaries, except that it happens for each value of the parameter θ, at the expense of a cross-validation step. This is quite an original approach, as far as I can tell, but I wonder at the link with more standard density estimation methods, in particular in terms of the precision of the resulting estimate (and of the speed of convergence with the sample size, if convergence there is).
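Geyer's logistic regression trick, on which the paper builds, can be sketched on a toy example (mine, not the paper's): with equal numbers of samples from p=N(0,1) and q=N(1,1), the log-odds fitted by a logistic classifier estimate log p(x)/q(x), which is here exactly ½−x, so the fitted slope and intercept should approach −1 and ½. The plain gradient-ascent fit below stands in for any logistic regression solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x_p = rng.normal(0.0, 1.0, n)  # draws from p = N(0,1), "the density"
x_q = rng.normal(1.0, 1.0, n)  # draws from q = N(1,1), "the marginal"

# classification data: label 1 for draws from p, 0 for draws from q
x = np.concatenate([x_p, x_q])
y = np.concatenate([np.ones(n), np.zeros(n)])

# fit logistic regression by gradient ascent on the mean log-likelihood;
# with balanced samples, the fitted log-odds w*x + b estimate log p(x)/q(x)
w, b = 0.0, 0.0
for _ in range(2000):
    prob = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w += 0.1 * np.mean((y - prob) * x)
    b += 0.1 * np.mean(y - prob)

print(w, b)  # close to the true slope -1 and intercept 0.5
```

The exponential-family representation in the paper then amounts to replacing the single feature x by a vector of summary statistics, with the coefficients β(θ) fitted (and regularised) separately for each value of θ.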