Filed under: Books, pictures, R, Statistics, University life Tagged: ABC model choice, abcrf, Bayesian model choice, DIYABC, France, model posterior probabilities, PNAS, R, random forests, UFOs ]]>

*“…likelihood inference is in a fundamental way more complicated than the classical method of moments.”*

**C**arlos Amendola, Mathias Drton, and Bernd Sturmfels arXived a paper this Friday on “maximum likelihood estimates for Gaussian mixtures are transcendental”. By which they mean that trying to solve the five likelihood equations for a two-component Gaussian mixture does not lead to an algebraic function of the data. (When excluding the trivial global maxima spiking at any observation.) This is not highly surprising when considering two observations, 0 and x, from a mixture of N(0,1/2) and N(μ,1/2) because the likelihood equation involves both exponential and algebraic terms. While this does not directly impact (statistical) inference, this result has the computational consequence that the number of critical points ‘and also the maximum number of local maxima, depends on the sample size and increases beyond any bound’, which means that EM faces increasing difficulties in finding a global finite maximum as the sample size increases…
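As a sketch of where the mix of exponential and algebraic terms comes from, assume equal weights 1/2 and μ as the only unknown (both assumptions are made here purely for illustration). Since the N(m,1/2) density is proportional to exp{-(y-m)²}, the likelihood of the pair (0,x) and the resulting equation in μ would then read:

```latex
\[
L(\mu) \propto \left[1 + e^{-\mu^2}\right]\left[e^{-x^2} + e^{-(x-\mu)^2}\right],
\]
\[
\frac{\partial \log L(\mu)}{\partial \mu}
 = -\frac{2\mu\, e^{-\mu^2}}{1 + e^{-\mu^2}}
 + \frac{2(x-\mu)\, e^{-(x-\mu)^2}}{e^{-x^2} + e^{-(x-\mu)^2}} = 0 .
\]
```

The polynomial factors μ and (x-μ) multiply exponentials of quadratics in μ, so there is no reason to expect a root that is an algebraic function of x.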

Filed under: Books, R, Statistics Tagged: algebraic geometry, computational statistics, EM, mixtures of distributions, transcendental equations ]]>

First, the place was surprisingly quiet, both in terms of traffic and in the interactions between people. No one burst in screaming for help or collapsed before reaching the front desk! Maybe because this was a weekday afternoon rather than a Saturday night, maybe because emergency services like firemen had their own separate entry. Since this was the walk-in entry, the dozen or so people who visited the ward that afternoon walked in, waited in line and were fairly quickly seen by a nurse or a physician to decide on a course of action. Most of them did not come back to the entry room, while I saw a few others leave by taxi or with relatives. The most dramatic entry was a man leaning heavily on his wife, who seemed to have had a fall while playing polo (!) and who recovered rather fast (but not fast enough to argue with his wife about giving up polo!). Similarly, the interactions with the administrative desk were devoid of the usual tension when dealing with French bureaucrats, who often seem eager to invent new paperwork to delay action: the staff was invariably helpful, even with patients missing documents, and the only incident was with a taxi driver refusing to take an elderly patient home because of a missing certificate no other taxi seemed to require.

Second, and again this was surprising for me, I did not see many instances of people coming to the emergency department to bypass waiting or paying for a doctor, even though some were asked why they had not seen a doctor before (not much intimacy at the entry desk…). One old man with a missing leg spent some time in the room discussing with hospital social workers where to spend the night but, as the homeless shelters around were all full, they ended up advising him to find a protected spot for the night, while agreeing to keep his bags for a day. It was raining rather heavily and the man was just out of cardiology so I found the advice a bit harsh. However, he was apparently a regular and I saw him later sitting in his wheelchair under an awning in a nearby street, begging from passers-by.

The most exciting event of the afternoon (apart from the good news that there was no deep venous thrombosis, of course!) was the expulsion of a young woman who had arrived on a kick-scooter one hour earlier, not gone to the registration desk, and was repeatedly drinking coffees and eating snacks from the vending machine while exiting now and then to smoke a cigarette and while bothering with the phone chargers in the room. A security guard arrived and told her to leave, which she did, somewhat grudgingly. For the whole time, I could not fathom what was the point of her actions, but being the Jon Snow of emergency wards, what do I know?!

Filed under: Travel Tagged: deep vein thrombosis, emergency department, French hospital, Paris ]]>

*“…for a general linear model (GLM), a single linear function is a sufficient statistic for each associated parameter…”*

The recently arXived paper “Likelihood-free inference in high-dimensional models”, by Kousathanas et al. (July 2015), proposes an ABC resolution of the dimensionality curse [arising when the dimension of the parameter and of the corresponding summary statistics grows] by turning Gibbs-like and by using a component-by-component ABC-MCMC update that allows for low-dimensional statistics. In the (rare) event there exists a conditional sufficient statistic for each component of the parameter vector, the approach is just as justified as when using a generic ABC-Gibbs method based on the whole data. Otherwise, that is, when using a non-sufficient estimator of the corresponding component (as, e.g., in a generalised [not general!] linear model), the approach is less coherent as there is no joint target associated with the Gibbs moves. One may therefore wonder at the convergence properties of the resulting algorithm. The only safe case [in dimension 2] is when one of the restricted conditionals does not depend on the other parameter. Note also that each Gibbs step a priori requires the simulation of a new pseudo-dataset, which may be a major imposition on computing time, and that setting the tolerance for each parameter is a delicate calibration issue because in principle the tolerance should depend on the other component values.

I ran a comparative experiment on a toy normal target, using either empirical mean and variance (blue), or empirical median and mad (brown), with little difference in the (above) output, especially when considering that I set the tolerance somewhat arbitrarily. This could be due to the fact that the pairs are quite similar in terms of their estimation properties. However, I then realised that the empirical variance is not sufficient for the variance conditional on the mean parameter. I looked at the limiting case (with zero tolerance), which amounts to simulating σ first and then μ given σ, and ran a (Gibbs and iid) simulation. The difference, as displayed below (red standing for the exact ABC case), is not enormous, even though it produces a fatter tail in μ. Note the interesting feature that I cannot produce the posterior distribution of the parameters given the median and mad statistics. Which is a great introduction to ABC!

    N=10
    data=rnorm(N,mean=3,sd=.5)

    #ABC with insufficient statistics
    medata=median(data)
    madata=mad(data)
    #bootstrap spread of the mad, used to scale the second tolerance
    varamad=rep(0,100)
    for (i in 1:100)
      varamad[i]=mad(sample(data,N,replace=TRUE))
    tol=c(.01*mad(data),.05*mad(varamad))
    T=1e6
    mu=rep(median(data),T)
    sig=rep(mad(data),T)
    for (t in 2:T){
      #mu proposed from N(0,1), kept only if the medians are within tolerance
      mu[t]=rnorm(1)
      psudo=rnorm(N,mean=mu[t],sd=sig[t-1])
      if (abs(medata-median(psudo))>tol[1])
        mu[t]=mu[t-1]
      #sigma proposed as 1/Exp(1), kept only if the mads are within tolerance
      sig[t]=1/rexp(1)
      psudo=rnorm(N,mean=mu[t],sd=sig[t])
      if (abs(madata-mad(psudo))>tol[2])
        sig[t]=sig[t-1]
    }

    #ABC with more sufficient statistics
    meaata=mean(data)
    sddata=sd(data)
    varamad=rep(0,100)
    for (i in 1:100)
      varamad[i]=sd(sample(data,N,replace=TRUE))
    tol=c(.1*sd(data),.1*sd(varamad))
    for (t in 2:T){
      mu[t]=rnorm(1)
      psudo=rnorm(N,mean=mu[t],sd=sig[t-1])
      if (abs(meaata-mean(psudo))>tol[1])
        mu[t]=mu[t-1]
      sig[t]=1/rexp(1)
      psudo=rnorm(N,mean=mu[t],sd=sig[t])
      if (abs(sddata-sd(psudo))>tol[2])
        sig[t]=sig[t-1]
    }

    #MCMC with false sufficient
    sig=1/sqrt(rgamma(T,shape=.5*N,rate=1+.5*var(data)))
    mu=rnorm(T,mean(data)/(1+sig^2/N),sd=1/sqrt(1+N/sig^2))
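The final `mu=rnorm(T,...)` statement can be read as a standard conjugate normal update. This is a sketch under the assumption of a N(0,1) prior on μ (as suggested by the `mu[t]=rnorm(1)` proposals in the ABC loops), with x̄ denoting `mean(data)`:

```latex
\[
\bar{x} \mid \mu,\sigma \sim N\!\left(\mu, \frac{\sigma^2}{N}\right),
\qquad \mu \sim N(0,1)
\]
\[
\Longrightarrow\quad
\mu \mid \bar{x},\sigma \sim
N\!\left(\frac{N\bar{x}/\sigma^2}{1 + N/\sigma^2},\; \frac{1}{1 + N/\sigma^2}\right)
= N\!\left(\frac{\bar{x}}{1 + \sigma^2/N},\; \frac{1}{1 + N/\sigma^2}\right)
\]
```

The mean and standard deviation match the arguments `mean(data)/(1+sig^2/N)` and `1/sqrt(1+N/sig^2)` in the code.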

Filed under: Books, R, Statistics, University life Tagged: ABC, ABC-Gibbs, compatible conditional distributions, convergence of Gibbs samplers, curse of dimensionality, exact ABC, Gibbs sampling, median, median absolute deviation, R ]]>

The paper starts with a fairly theoretical part, to follow with an application to austerity sampling *[and, in the earlier version of the paper, to the Hoeffding bounds of Bardenet et al., both discussed earlier on the ‘Og, to exponential random graphs (the paper being rather terse on the description of the subsampling mechanism), to stochastic gradient Langevin dynamics (by Max Welling and Yee-Whye Teh), and to ABC-MCMC]*. The assumptions are about the transition kernels of a reference Markov kernel and of one associated with the approximation, imposing some bounds on the Wasserstein distance between those kernels, K and K’. Results being generic, there is no constraint as to how K is chosen or on how K’ is derived from K. Except in Lemma 3.6 and in the application section, where the same proposal kernel L is used for both Metropolis-Hastings algorithms K and K’. While I understand this makes for an easier coupling of the kernels, this also sounds like a restriction to me in that modifying the target begs for a similar modification in the proposal, if only because the tails they are a-changin’…

**I**n the case of subsampling the likelihood to gain computation time (as discussed by Korattikara et al. and by Bardenet et al.), the austerity algorithm as described in Algorithm 2 is surprising as the average of the sampled data log-densities and the log-transform of the remainder of the Metropolis-Hastings probability, which seem unrelated, are compared until they are close enough. I also find it hard to derive from the different approximation theorems bounding exceedance probabilities a rule to decide on the subsampling rate as a function of the overall sample size and of the computing cost. (As a side, if general, remark, I remain somewhat reserved about the subsampling idea, given that it requires the entire dataset to be available at every iteration. This makes parallel implementations rather difficult to contemplate.)
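To make the comparison above concrete, here is a minimal R sketch of a sequential subsampling test in the spirit of Korattikara et al. It is not a transcription of Algorithm 2 in the paper under discussion. The function name `austerity_decision`, its arguments, and the toy normal model are all hypothetical. The threshold `mu0` stands for the log of the non-likelihood part of the Metropolis-Hastings ratio (uniform draw, priors, proposals) divided by the sample size, and it is compared with the average log-likelihood difference over a growing subsample.

```r
# Hypothetical helper: decides an MH accept/reject from growing subsamples.
# loglik(x, theta) must return the vector of per-observation log-densities.
austerity_decision <- function(x, theta, theta_new, loglik, mu0,
                               batch = 100, eps = 0.05) {
  stopifnot(batch >= 2)              # sd() needs at least two values
  N <- length(x)
  perm <- sample.int(N)              # random order = sampling without replacement
  n <- 0
  repeat {
    n <- min(n + batch, N)
    idx <- perm[seq_len(n)]
    # per-observation log-likelihood differences on the current subsample
    # (recomputed from scratch for brevity; a real implementation would cache)
    l <- loglik(x[idx], theta_new) - loglik(x[idx], theta)
    lbar <- mean(l)
    if (n == N) break                # whole dataset used: exact MH decision
    # standard error with finite-population correction
    s <- sd(l) / sqrt(n) * sqrt(1 - (n - 1) / (N - 1))
    tstat <- if (s > 0) abs(lbar - mu0) / s else Inf
    delta <- 1 - pt(tstat, df = n - 1)
    if (delta < eps) break           # confident enough to decide early
  }
  list(accept = lbar > mu0, used = n)
}

# Toy usage: N(theta, 1) model, flat prior, symmetric proposal (all assumed)
set.seed(1)
x <- rnorm(1e4, mean = 1)
loglik <- function(x, theta) dnorm(x, mean = theta, sd = 1, log = TRUE)
mu0 <- log(runif(1)) / length(x)     # threshold reduces to log(u)/N in this setting
res <- austerity_decision(x, theta = 0, theta_new = 1, loglik = loglik, mu0 = mu0)
```

The returned `used` field shows how many observations were needed before the t-test became decisive, which is where the computing gain (and the approximation error) comes from.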

Filed under: pictures, Statistics, Travel, University life Tagged: ABC-MCMC, accelerated ABC, Approximate Bayesian computation, austerity sampling, ergodicity, MCMC, Metropolis-Hastings algorithms, Monte Carlo Statistical Methods, Natesh Pillai, subsampling, Wasserstein distance ]]>

Filed under: Mountains, pictures, Travel Tagged: blogging, Maple Pass loop, North Cascades National Park, Og, R, Rainy Pass, Washington State ]]>

The hike was really great, staying on a high ridge for most of the time and offering 360° views of the Eastern North Cascades (as well as forest fire smoke clouds in the distance…) Walking on the trail was very smooth as it was wide enough, with a limited gradient and hardly anyone around. Actually, we felt like intruding tourists on the trail, with our light backpacks, since the few hikers we met were long-distance hikers, “doing” the trail, sometimes with backpacks that looked as heavy as Strayed’s original “Monster”. And sometimes with incredibly light ones. A great specificity of those people is that they all were more than ready to share their experiences and goals, with no complaint about the hardship of being on the trail for several months! And sounding more sorry than eager to reach the Canadian border and the end of the PCT in a few more dozen miles… For instance, a solitary female hiker told us of her plans to get back to the section near Lake Chelan she had missed the week before due to threatening forest fires. A great entry to the PCT, with the dream of walking a larger portion in an undefined future…

Filed under: Books, Kids, Mountains, pictures, Running, Travel Tagged: backpacking, North Cascades National Park, Oregon, Pacific crest trail, PCT, vacations, Washington State ]]>

*“At the level of discourse, we would like to move beyond a subjective vs. objective shouting match.” (p.30)*

**T**his paper by Andrew Gelman and Christian Hennig calls for the abandonment of the terms *objective* and *subjective* in (not solely Bayesian) statistics, and argues that there is more than mere prior information and data to the construction of a statistical analysis. The paper is articulated as the authors’ proposal, followed by four application examples, then a survey of the philosophy of science perspectives on objectivity and subjectivity in statistics and other sciences, next a study of the subjective and objective aspects of the mainstream statistical streams, concluding with a discussion on the implementation of the proposed move.

“…scientists and the general public celebrate the brilliance and inspiration of greats such as Einstein, Darwin, and the like, recognizing the roles of their personalities and individual experiences in shaping their theories and discoveries” (p.2)

I do not see the relevance of this argument, in that the myriad of factors leading, say, Marie Curie or Rosalind Franklin to their discoveries are more than subjective, as eminently personal and the result of unique circumstance, but the corresponding theories remain within a common and therefore objective corpus of scientific theories. Hence I would not equate the derivation of statistical estimators or even less the computation of statistical estimates to the extension or negation of existing scientific theories by scientists.

“We acknowledge that the “real world” is only accessible to human beings through observation, and that scientific observation and measurement cannot be independent of human preconceptions and theories.” (p.4)

The above quote reminds me very much of Poincaré’s

“It is often said that experiments should be made without preconceived ideas. That is impossible. Not only would it make every experiment fruitless, but even if we wished to do so, it could not be done. Every man has his own conception of the world, and this he cannot so easily lay aside.” Henri Poincaré, La Science et l’Hypothèse

The central proposal of the paper is to replace ‘objective’ and ‘subjective’ with less value-loaded and more descriptive terms. Given that very few categories of statisticians take pride in their subjectivity, apart from a majority of Bayesians, but rather use the term as derogatory for other categories, I fear the proposal stands little chance of resolving this situation. Even though I agree we should move beyond this distinction that does not reflect the complexity and richness of statistical practice. As the discussion in Section 2 makes clear, all procedures involve subjective choices and calibration (or tuning), either plainly acknowledged or hidden under the carpet. Which is why I would add (at least) two points to the virtues of subjectivity:

- Spelling out unverifiable assumptions about the data production;
- Awareness of calibration of tuning parameters.

while I do not see *consensus* as necessarily a virtue. The following examples in Section 3 are all worth considering as they bring more details, albeit in specific contexts, to the authors’ arguments. Most of them give the impression that the major issue stands with the statistical model itself, which may be both the most acute subjectivity entry in statistical analyses and the least discussed one. Including the current paper, where e.g. Section 3.4 wants us to believe that running a classical significance test is objective and apt to detect an unfit model. And the hasty dismissal of machine learning in Section 6 is disappointing, because one thing machine learning does well is to avoid leaning too much on the model, using predictive performances instead. Furthermore, apart from Section 5.3, I actually see little in the paper about the trial-and-error way of building a statistical model and/or analysis, while subjective inputs from the operator are found at all stages of this construction and should be spelled out rather than ignored (and rejected).

“Yes, Bayesian analysis can be expressed in terms of subjective beliefs, but it can also be applied to other settings that have nothing to do with beliefs.” (p.31)

The survey in Section 4 about what philosophy of science says about objectivity and subjectivity is quite thorough, as far as I can judge, but does not expand enough on the issue of “default” or all-inclusive statistical solutions, used through “point-and-shoot” software by innumerate practitioners in mostly inappropriate settings, with the impression of conducting “the” statistical analysis. This false feeling of “the” proper statistical analysis and its relevance for this debate also transpire through the treatment of statistical expertise by media and courts. I also think we could avoid mentioning the Heisenberg principle in this debate, as it does not really contribute anything useful. More globally, the exposition of a large range of notions of objectivity is, as is often the case in philosophy, not conclusive and I feel nothing substantial comes out of it… And that it is somehow antagonistic with the notion of a discussion paper, since every possible path has already been explored. Even forking ones. As a non-expert in philosophy, I would not feel confident in embarking upon a discussion on what realism is and is not.

“the subjectivist Bayesian point of view (…) can be defended for honestly acknowledging that prior information often does not come in ways that allow a unique formalization” (p.25)

When going through the examination of the objectivity of the major streams of statistical analysis, I get the feeling of exploring small worlds (in Lindley’s words) rather than the entire spectrum of statistical methodologies. For instance, frequentism seems to be reduced to asymptotics, while completely missing the entire (lost?) continent of non-parametrics. (Which should not be considered to be “more” objective, but has the advantage of loosening the model specification.) While the error-statistical (frequentist) proposal of Mayo (1996) seems to consume a significant portion *[longer than the one associated with the objectivist Bayesianism section]* of the discussion with respect to its quite limited diffusion within statistical circles. From a Bayesian perspective, the discussions of subjective, objective, and falsificationist Bayes do not really bring a fresh perspective to the debate between those three branches, apart from suggesting we should give up such value-loaded categorisations. As an O-Bayes card-carrying member, I find the characterisation of the objectivist branch somewhat restrictive, by focussing solely on Jaynes’ maxent solution. Hence missing the corpus of work on creating priors with guaranteed frequentist or asymptotic properties. Like matching priors. I also find the defence of the falsificationist perspective, i.e. of Gelman and Shalizi (2013), both much less critical and quite extensive, in that, again, this is not what one could call a standard approach to statistics. Resulting in an implicit (?) message that this may be the best way to proceed.

In conclusion, on the *positive* side *[for there is a positive side!]*, the paper exposes the need to spell out the various inputs (from the operator) leading to a statistical analysis, both for replicability or reproducibility, and for “objectivity” purposes, although solely conscious choices and biases can be uncovered this way. It also reinforces the call for model awareness, by which I mean a critical stance on all modelling inputs, including priors!, a disbelief that any model is true, applying to statistical procedures Popper’s critical rationalism. This has major consequences on Bayesian modelling in that, as advocated in Gelman and Shalizi (2013), as well as Evans (2015), sampling and prior models should be given the opportunity to be updated when they are inappropriate for the data at hand. On the *negative* side, I fear the proposal is far too idealistic in that most users (and some makers) of statistics cannot spell out their assumptions and choices, being unaware of those. This is in a way *[admittedly, with gross exaggeration!]* the central difficulty with statistics that almost anyone anywhere can produce an estimate or a p-value without ever being proven wrong. It is therefore difficult to perceive how the epistemological argument therein [that objective versus subjective is a meaningless opposition] is going to profit statistical methodology, even assuming the list of Section 2.3 were to be made compulsory. The eight deadly sins listed in the final section would require expert reviewers to vanish from publication (and by expert, I mean expert in statistical methodology), while it is almost never the case that journals outside our field make a call to statistics experts when refereeing a paper. Apart from banning all statistics arguments from a journal, I am afraid there is no hope for a major improvement in that corner…

All in all, the authors deserve big thanks for making me reflect upon those issues and (esp.) back their recommendation for reproducibility, meaning not only the production of all conscious choices made in the construction process, but also the posting of (true or pseudo-) data and of relevant code for all publications involving a statistical analysis.

Filed under: Books, Statistics, University life Tagged: academic journals, Basic and Applied Social Psychology, Dennis Lindley, Error-Statistical philosophy, falsification, frequentist inference, Henri Poincaré, Karl Popper, Marie Curie, objective Bayes, p-values, refereeing, reproducible research, subjective versus objective Bayes ]]>

The package abcrf consists of three functions:

- *abcrf*, which constructs a random forest from a reference table and returns an object of class ‘abc-rf’;
- *plot.abcrf*, which gives both the variable importance plot of a model choice abc-rf object and the projection of the reference table on the LDA axes;
- *predict.abcrf*, which predicts the model for new data and evaluates the posterior probability of the MAP.

An illustration from the manual:

    data(snp)
    data(snp.obs)
    mc.rf <- abcrf(snp[1:1e3, 1], snp[1:1e3, -1])
    predict(mc.rf, snp[1:1e3, -1], snp.obs)

Filed under: R, Statistics, University life Tagged: ABC, ABC model choice, abcrf, bioinformatics, CRAN, R, random forests, reference table, SNPs ]]>

Filed under: Mountains, pictures, Travel Tagged: Chelane, Goodall, Hart's Pass, Lake Ross, Methow river, Mondrian forests, Mount Rainier, North Cascades National Park, Pacific North West, Rockies, smokejumper, Twisp, Washington State, wildfire, Winthrop ]]>