BibTool on the air

Last night, just before leaving for Coventry, I realised I had about 30 versions of my “mother of all .bib” bib file, spread over directories and with broken links to the original mother file… (I mean, I always create bib files in new directories by a hard link,

    ln ~/mother.bib

but they eventually and inexplicably end up with a life of their own!) So I decided a Spring clean-up was in order and installed BibTool on my Linux machine to gather all those versions into a new all-inclusive bib reference. I did not take advantage of the many possibilities of the program, written by Gerd Neugebauer, but it certainly solved my problem: once I realised I had to set the resources

    check.double = on
    check.double.delete = on
    pass.comments = off

all I had to do was to call

    bibtool -s -i ../*/*.bib -o mother.bib
    bibtool -d -i mother.bib -o mother.bib
    bibtool -s -i mother.bib -o mother.bib

to merge all bib files and then to get rid of the duplicated entries in mother.bib (the -d option commented out the duplicates and the second call with -s removed them), as well as the duplicated definitions in the preamble of the file. This took me very little time in the RER train from Paris-Dauphine (where I taught this morning, having a hard time making the students envision the empirical cdf as an average of Dirac masses!) to Roissy airport, in contrast with my pedestrian replacement of all stray siblings of the mother bib by new proper hard links, one by one. I am sure there is a bash command that could have done it in one line, but I instead spent my flight to Birmingham switching all existing bib files, one by one…
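For the record, here is a minimal sketch of what such a one-liner could look like, wrapped in a function for readability. It is not what I ran on the plane: the name relink_bibs and the directory in the usage line are made up for illustration. It assumes a find that supports -samefile (GNU find does) and that all copies live on the same filesystem as the mother file, since hard links cannot cross filesystems. It overwrites every *.bib it finds, so it is only safe after the merge above.

```shell
#!/usr/bin/env bash

# relink_bibs MASTER ROOT (hypothetical helper)
# Replace every *.bib file under ROOT with a hard link to MASTER.
relink_bibs() {
    local master=$1 root=$2
    # ! -samefile skips the master itself and any file that is already
    # a hard link to it; ln -f replaces each stray copy with a new hard link.
    find "$root" -type f -name '*.bib' ! -samefile "$master" \
        -exec ln -f "$master" {} \;
}

# Example call (hypothetical directory layout):
# relink_bibs ~/mother.bib ~/papers
```

Running ls -li on two of the files afterwards should show the same inode number if the relinking worked.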

echo vulnerable

Even though most people are now aware of the Shellshock security problem in the bash shell, here is a test to check whether your Unix system is at risk:

    env x='() { :;}; echo vulnerable' bash -c 'echo hello'

If the command prints vulnerable before hello, it means the system is vulnerable and needs to be upgraded with the proper security patch… For instance by running

    sudo apt-get update && sudo apt-get install --only-upgrade bash

on Debian/Ubuntu versions. Check the Apple support page for Apple OS.
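If you have several machines to check, the same test can be wrapped in a small function that prints a one-word verdict. This is only a sketch: the name check_shellshock is mine, and it covers only the original test above, not the later variants of the bug.

```shell
#!/usr/bin/env bash

# check_shellshock (hypothetical helper)
# Runs the one-line test above and prints a verdict.
check_shellshock() {
    local out
    # A vulnerable bash executes the trailing "echo vulnerable" while
    # importing the variable x; a patched bash only prints "hello".
    out=$(env x='() { :;}; echo vulnerable' bash -c 'echo hello' 2>/dev/null)
    if [[ $out == *vulnerable* ]]; then
        echo "vulnerable"
    else
        echo "not vulnerable"
    fi
}

check_shellshock
```

Some partially patched versions print a warning on standard error, which the function discards and so reports as not vulnerable.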

implementing reproducible research [short book review]

As promised, I got back to this book, Implementing reproducible research (after the pigeons had their say). I looked at it this morning while monitoring my students taking their last-chance R exam (definitely last chance as my undergraduate R course will not be renewed next year). The book is in fact an edited collection of papers on tools, principles, and platforms around the theme of reproducible research. It obviously links with other themes like open access, open data, and open software. All positive directions that need more active support from the scientific community. In particular the solutions advocated through this volume are mostly Linux-based. Among the tools described in the first chapter, knitr appears as an alternative to Sweave. I used the latter a while ago and, while I like its philosophy, it does not extend to situations where the R code within takes too long to run… (Or maybe I did not invest enough time to grasp the entire spectrum of Sweave.) Note that, even though the book is part of the R Series of CRC Press, many chapters are unrelated to R. And even more [unrelated] to statistics.

This limitation is somewhat my difficulty with [adhering to] the global message proposed by the book. It is great to construct such tools that monitor and archive successive versions of code and research, as anyone can trace back the research steps leading to the published result(s). Using some of the platforms covered by the book establishes for instance a superb documentation principle, going much further than just providing an “easy” verification tool against fraudulent experiments. The notion of a super-wiki where notes and preliminary versions and calculations (and dead ends and failures) would be preserved for open access is just as great. However this type of research processing and discipline takes time and space and human investment, i.e. resources that are scarce and costly. Complex studies may involve enormous amounts of data and, neglecting the notions of confidentiality and privacy, the cost of storing such amounts is significant. Similarly for experiments that require days and weeks of huge clusters. I thus wonder where those resources would be found (journals, universities, high tech companies, …?) for the principle to hold in full generality and how transient they could prove. One cannot expect the research team to guarantee availability of those meta-documents for remote time horizons. Just as a biased illustration, checking the available Bayes’ notebooks meant going to a remote part of London at a specific time and with a preliminary appointment. Those notebooks are now available on line for free. But for how long?

“So far, Bob has been using Charlie’s old computer, using Ubuntu 10.04. The next day, he is excited to find the new computer Alice has ordered for him has arrived. He installs Ubuntu 12.04” A. Davison et al.

Putting their principles into practice, the authors of Implementing reproducible research have made all chapters available for free on the Open Science Framework. I thus encourage anyone interested in those principles (and who would not be?!) to peruse the chapters and see how they can benefit from and contribute to open and reproducible research.

hiccups or death throes for pangolin?

Today my Ubuntu system had a strange pathology, almost freezing on any Internet connection, whether using a cable connection or wireless: even ping did not answer and there was nothing wrong in the process table… I checked the hardware by dual-booting on Windows (for the first time since I installed Linux on this laptop) and saw no such behaviour in Explorer. After a few reboots, it came back to normal. I wonder if this is an incomplete security upgrade or a signal that my cheap laptop is coming close to the end of its life cycle. A life started right after Kyoto. Since I am leaving for Pittsburgh and Toronto next week, I should try to re-install a Linux version on another machine this weekend…

Vodafone USB Stick K3773

As the issue of having to pay a lot to connect to the Internet during my jetlagged night sessions was bothering me, I bought a 3G device in a nearby Vodafone shop to do the job (at the mere cost of four hours of connection!). It is a Vodafone USB Stick K3773 and I was unsure it would work under Linux (Ubuntu 12.04). When mounting the USB stick, I first saw there was a Linux directory on the stick, with the instruction to run ./install as root. I tried that and it did not work. I then checked the stick was working under Windows, which was the case. I could not find helpful advice on the forums (fori!). So I fiddled for a few minutes and came up with the idea of installing from my own directory rather than from the (read-only) disk. Changed the attributes of the install file to executable. And ran it again. It surprisingly worked! The device is now recognised as a wired network connection (eth1) whenever I start my (Compaq) laptop.

Here are my commands, in case it helps:

 xian$ mkdir QuickStart
 xian$ cp /media/QuickStart\ 3.7/linux_mbb_install/* QuickStart/.
 xian$ cd QuickStart
 xian$ chmod +x install
 xian$ sudo ./install

It is rather slow (20 Kb/s) but this may be a good way to manage my time on line!