Title: Automated variable selection for ABC algorithms
Abstract: We discuss here recent advances made in the selection of summaries for approximate Bayesian computation (ABC). In particular, we emphasize the appeal of using machine learning tools such as random forests to build, in an automated fashion, summary statistics of minimal dimension. Conditional on sufficient progress being made in this direction, we will also discuss why and how ABC methods have to be adapted when analyzing large molecular datasets, and will present some progress concerning Single Nucleotide Polymorphism (SNP) data.
Key words: Bayesian computation, ABC, SNP, model selection
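To fix ideas about the setting the abstract refers to, here is a minimal sketch of the vanilla ABC rejection sampler into which any summary-selection method (random-forest-based or otherwise) would plug. The toy normal-mean model, the function names, and the tolerance are illustrative choices of mine, not part of the talk:

```python
import numpy as np

def abc_rejection(observed, prior_draw, simulate, summary, eps,
                  n_draws=10_000, rng=None):
    """Plain ABC rejection: keep parameter draws whose simulated
    summaries fall within eps (Euclidean distance) of the observed
    summary. The quality of the output hinges on the summary() choice,
    which is exactly what automated selection methods aim at."""
    rng = rng or np.random.default_rng(0)
    s_obs = summary(observed)
    kept = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                      # draw from the prior
        s_sim = summary(simulate(theta, rng))        # simulate pseudo-data
        if np.linalg.norm(s_sim - s_obs) <= eps:     # accept if close enough
            kept.append(theta)
    return np.array(kept)

# Toy example: infer the mean of a N(theta, 1) sample of size 100,
# using the sample mean as summary (sufficient in this simple case).
rng = np.random.default_rng(1)
obs = rng.normal(2.0, 1.0, size=100)
post = abc_rejection(
    observed=obs,
    prior_draw=lambda r: r.normal(0.0, 5.0),           # diffuse prior
    simulate=lambda th, r: r.normal(th, 1.0, size=100),
    summary=lambda x: np.array([x.mean()]),
    eps=0.1,
)
```

In realistic (e.g., population-genetics) applications the summary is a high-dimensional vector of candidate statistics, and the point of the machine-learning step is to replace it with a low-dimensional, informative projection before running the sampler above.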
“I still feel that too much of academic statistics values complex mathematics over elegant simplicity — it is necessary for a research paper to be complicated in order to be published.” Roderick Little, JASA, p.359
Roderick Little wrote his Fisher lecture, recently published in JASA, around ten simple ideas for statistics. Its title is “In praise of simplicity not mathematistry! Ten simple powerful ideas for the statistical scientist”. While this title is rather antagonistic, blaming mathematical statistics for the rise of mathematistry in the field (a term borrowed from Fisher, who also coined the adjective ‘Bayesian’), the paper focuses on those ten ideas and says very little on why there is (or would be) too much mathematics in statistics:
- Make outcomes univariate
- Bayes rule, for inference under an assumed model
- Calibrated Bayes, to keep inference honest
- Embrace well-designed simulation experiments
- Distinguish the model/estimand, the principles of estimation, and computational methods
- Parsimony — seek a good simple model, not the “right” model
- Model the Inclusion/Assignment and try to make it ignorable
- Consider dropping parts of the likelihood to reduce the modeling part
- Potential outcomes and principal stratification for causal inference
- Statistics is basically a missing data problem
“The mathematics of problems with infinite parameters is interesting, but with finite sample sizes, I would rather have a parametric model. “Mathematistry” may eschew parametric models because the asymptotic theory is too simple, but they often work well in practice.” Roderick Little, JASA, p.365
Both those rules and the illustrations that abound in the paper reflect Little’s research focus and obviously apply to his own work in a fairly coherent way. However, while a mostly parametric model user myself, I fear the rejection of non-parametric techniques is far too radical. It is more and more my conviction that we cannot handle the full complexity of a realistic structure in a standard Bayesian manner, and that we have to give up on the coherence and completeness goals at some point… Using non-parametrics and/or machine learning on some bits and pieces then makes sense, even though it hurts elegance and simplicity.
“However, fully Bayes inference requires detailed probability modeling, which is often a daunting task. It seems worth sacrificing some Bayesian inferential purity if the task can be simplified.” Roderick Little, JASA, p.366
I will not discuss those ideas in detail, as some of them make complete sense to me (like Bayesian statistics laying its assumptions in the open) and others remain obscure (e.g., causality) or of limited applicability. It is overall a commendable Fisher lecture that focuses on methodology and the practice of statistical science, rather than on theory. I however do not see the reason why maths should be blamed for this state of the field. Nor why mathematical statistics journals like AoS would carry some responsibility for the lack of further applicability in other fields. Students of statistics do need a strong background in mathematics, and I fear we are losing ground in this respect, at least judging by the growing difficulty in finding measure theory courses abroad for our exchange undergraduates from Paris-Dauphine. (I also find the model misspecification aspects mostly missing from this list.)
While I was editing our “famous” In praise of the referee paper (well, famous for being my most rejected paper ever, with one editor not even acknowledging receipt!) for the next edition of the ISBA Bulletin, where it truly belongs, being in fine a reply to Larry’s tribune therein a while ago, Dimitris Politis wrote a column for the IMS Bulletin (March 2013 issue, page 11) on Refereeing and psychoanalysis.
Uh?! What?! Psychoanalysis?! Dimitris’ post is about referees being rude or abusive in their reports, expressing befuddlement at seeing such behaviour in a scientific review. If one sets aside cases of personal and ideological antagonisms (always likely to occur in academic circles!), a “good” reason for referees to get aggressively annoyed to the point of rudeness is sloppiness of one kind or another in the paper under review. One has to remember that refereeing is done for free and, in the overwhelming majority of cases, with no clear recognition, out of a sense of duty to the community and of fairness for having our own papers refereed. Reading a paper where typos abound, where the style is so abstruse as to hide the purpose of the work, and where the literature is so poorly referenced as to make one doubt the author(s) ever read another paper, the referee may feel vindicated in venting his/her frustration at having wasted his/her time by writing a few vitriolic remarks. Dimitris points out this can be very detrimental to young researchers. True, but what happened to the advisor at this stage?! Wasn’t she/he supposed to advise her/his PhD student not only in conducting innovative research but also in producing an intelligible outcome and in preparing papers suited to the journal they are submitted to..?! Being rude and aggressive does not contribute to improving the setting, no more than headbutting an Italian football player helps in winning the World Cup, but it may nonetheless be understood without resorting to psychoanalysis!
Most interestingly, this negative aspect of refereeing (which can be curbed by subsequent actions of AEs and editors) would vanish if some of our proposals were implemented, incl. making referees’ reports part of the referee’s publication list, making those reports public as comments on the published paper (if published), and creating repositories or report commons independent from journals…
More news about MCMSki IV! Remember, the call for contributed sessions is still open for a few more weeks, till March 20 to be precise (make sure to contact me at firstname.lastname@example.org if you are considering putting a session together). To all those who have already submitted a session, thanks a lot, please stay tuned: we will contact you very soon after March 20!
One exciting item is that there will be a satellite workshop on January 9, on Bayesian non-parametrics and semi-parametrics, organised by Judith Rousseau. BNPski, anyone?! Details are not yet available, but anyone registered for MCMSki IV and interested should be able to attend this workshop free of charge. (It will take place at the conference centre as well.)
Another item is that we managed to get a cheaper offer for the ski race, reaching an entry fee of 10 euros. Or less if we manage to find this sponsor… Not that bad when considering the high probability competitors have of winning a pair of skis!!!
Last item for today: the list and rates of the hotels available through the conference centre are as follows:
- ALPINA*** : Single Room 136€ & Twin or Double Room 106€ /pers
- PRIEURE*** : Single Room 136€ & Twin or Double Room 106€ /pers
- LOUVRE** : Single Room 85€ & Twin or Double Room 64€ /pers
- POINTE ISABELLE** : Single Room 81.90€ & Twin or Double Room 59.90€ /pers
However, many other options are available in the vicinity, from hotels to B&Bs, to rental apartments and chalets, with a wide range of prices if you pre-book early (like now!). See the links on our webpage.
Following a discussion within the IMS publication committee and the coincidental publication of a central double page in Le Monde, week-end science & techno section [not that it was particularly informative!], here are some thoughts of mine on open access and publications:
First, the EU is philosophically inclined toward Open Access and has been putting some money into the game towards that goal:
As of 2014, all articles produced with funding from Horizon 2020 will have to be accessible:
- either articles will immediately be made accessible online by the publisher (‘Gold’ open access), with up-front publication costs eligible for reimbursement by the European Commission;
- or researchers will make their articles available through an open access repository no later than six months (12 months for articles in the fields of social sciences and humanities) after publication (‘Green’ open access).
This means that putting IMS publications on arXiv or on HAL (which is compulsory for CNRS and AERES evaluations, hence for most French public researchers, contrary to what Le Monde states) is fine and sufficient for EU funded research. It seems to be the same in other countries (ok, EU is not yet a country!) like Australia…
My personal position on the issue is that I do not understand the ‘gold’ open access perspective. Since taxpayers are supporting publicly funded research, why should they also support the journals that publish this research if it is available for free on a public repository like arXiv? Simply because publication in the journals gives a validation of the scientific contents? The argument was that it would save money for public libraries subscribing to expensive journals like Elsevier’s, but paying for ‘gold’ open access is another way of redirecting taxpayers’ money towards publishers’ pockets, so this sounds like a loophole… I would thus be very much in favour of keeping the arXiv solution as is, since it is the greenest one, as long as we comply with local national regulations.