Archive for the Books Category
The nice logo of MCqMC 2016 was a collection of eight series of QMC dots on the unit (?) cube. The organisers set a competition to identify the principles behind those quasi-random sets and as I had no idea for most of them I entered very random sets unconnected with algorithmia, for which I got an honourable mention and a CD prize (if not the conference staff tee-shirt I was coveting!) Art Owen sent me back my entry, posted below and hopefully (or not!) readable.
Last week, Vakilzadeh, Beck and Abrahamsson arXived a paper entitled “Using Approximate Bayesian Computation by Subset Simulation for Efficient Posterior Assessment of Dynamic State-Space Model Classes”. It follows an earlier paper by Beck and co-authors on ABC by subset simulation, a paper that I did not read. The model of interest is a hidden Markov model with continuous components and covariates (input), e.g. a stochastic volatility model. There is however a catch in the definition of the model, namely that the observable part of the HMM includes an extra measurement error term linked with the tolerance level of the ABC algorithm, an error term that is dependent across time, the vector of errors being within a ball of radius ε. This reminds me of noisy ABC, obviously (and as acknowledged by the authors), but also of some ABC developments of Ajay Jasra and co-authors. Indeed, as in those papers, Vakilzadeh et al. use the raw data sequence to compute their tolerance neighbourhoods, which obviously bypasses the selection of a summary statistic [vector] but may also drown signal under noise for long enough series.
“In this study, we show that formulating a dynamical system as a general hierarchical state-space model enables us to independently estimate the model evidence for each model class.”
Subset simulation is a nested technique that produces a sequence of nested balls (and related tolerances) such that the conditional probability of being in the next ball given the previous one remains large enough, each level requiring a new round of simulation. This somewhat reminds me of nested sampling, even though the two methods differ. For subset simulation, estimating the level probabilities means that there also exists a converging (and even unbiased!) estimator for the evidence associated with different tolerance levels. Which is not a particularly natural object unless one wants to turn it into a tolerance selection principle, which would be quite a novel perspective. But not one adopted in the paper, seemingly. Given that the application section truly compares models I must have missed something there. (Blame the long flight from San Francisco to Sydney!) Interestingly, the different models as in Table 4 relate to different tolerance levels, which may be a hindrance for the overall validation of the method.
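To make the nesting concrete, here is a minimal sketch of ABC by subset simulation on a toy problem, not the authors' implementation. Everything in it is an assumption made for illustration: the model (a single observation x ~ N(θ, 1) with a N(0, 1) prior on θ), the distance |x − y|, the function name `abc_subsim`, and the tuning values `p0`, `levels` and `step`. Each level keeps the closest fraction `p0` of the population, sets the next tolerance to the largest retained distance, and replenishes the population with short ABC-MCMC chains restricted to the new ball, so that the probability of the j-th ball is estimated by `p0**j`.

```python
import math
import random


def simulate(theta, rng):
    """Toy model (an assumption): one pseudo-observation x ~ N(theta, 1)."""
    return rng.gauss(theta, 1.0)


def log_prior(theta):
    """Standard normal prior on theta, up to an additive constant."""
    return -0.5 * theta * theta


def abc_subsim(y_obs, n=5000, p0=0.2, levels=3, step=1.0, seed=0):
    """Return (tolerances, ball probability estimates, final theta sample)."""
    rng = random.Random(seed)
    n_seeds = int(n * p0)        # number of particles kept at each level
    chain_len = n // n_seeds     # states generated from each kept particle

    # Level 0: plain simulation from the prior and the model.
    pop = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        pop.append((abs(simulate(theta, rng) - y_obs), theta))

    tolerances, ball_probs = [], []
    for level in range(1, levels + 1):
        # Keep the closest fraction p0; the largest kept distance is the
        # next tolerance, so the conditional level probability is p0.
        pop.sort(key=lambda pair: pair[0])
        seeds = pop[:n_seeds]
        eps = seeds[-1][0]
        tolerances.append(eps)
        ball_probs.append(p0 ** level)

        # Replenish with ABC-MCMC chains that never leave the new ball.
        pop = []
        for dist, theta in seeds:
            pop.append((dist, theta))
            for _ in range(chain_len - 1):
                prop = theta + step * rng.gauss(0.0, 1.0)
                log_ratio = log_prior(prop) - log_prior(theta)
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    d_prop = abs(simulate(prop, rng) - y_obs)
                    if d_prop <= eps:
                        theta, dist = prop, d_prop
                pop.append((dist, theta))

    return tolerances, ball_probs, [theta for _, theta in pop]


tolerances, ball_probs, thetas = abc_subsim(y_obs=1.0)
print("tolerances:", tolerances)
print("estimated ball probabilities:", ball_probs)
print("ABC posterior mean of theta:", sum(thetas) / len(thetas))
```

In this conjugate toy case the exact posterior is N(y/2, 1/2), which gives a rough check on the final sample. The acceptance rate of the chains drops with the tolerance, which is why each level needs its own round of simulation.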
I find the subsequent part on getting rid of uncertain prediction error model parameters of lesser [personal] interest as it essentially replaces the marginal posterior on the parameters of interest by a BIC approximation, with the unsurprising conclusion that “the prior distribution of the nuisance parameter cancels out”.
I got contacted by an author, Thomai Dion, toward writing a review of her children’s books, The Animal Cell, The Neuron, and a Science Lab’ Notebook. I thus asked for the books to get a look. Which I got prior to my long flight from San Francisco to Sydney, most conveniently. [This is the second time this happens: I was contacted once before by the author of a most absurd book, a while ago.]
I started with the cell, which is a 17-page book with a few dozen sentences, and one or more pictures per page. Pictures drawn in a sort of naïve fashion that should appeal to young children. Being decades away from being a kid and more than a decade away from raising a kid (happy 20th birthday, Rachel!), I have trouble assessing the ideal age of the readership or the relevance of introducing to them [all] 13 components of an animal cell, from the membrane to the cytoplasm. Mentioning RNA and DNA without explaining what they are. Each of these components gets added to the cell picture as it comes, with a one-line description of its purpose. I wonder how much a kid can remember of this list, while (s)he may wonder where those invisible cells stand. And what they are for. (When checking on Google, I found this sequence of pages more convincing, if much more advanced. Again, I am not the best suited for assessing how kids would take it!)
The 21-page book about the neurons is more explanatory than descriptive and I thus found it more convincing (again with not much of an idea of how a kid would perceive it!). It starts from the brain sending signals to parts of the body and requiring a medium to do so, which happens to be made of neurons. Once again, though, I feel the book spends too much time on the description rather than on the function of the neurons, e.g., with no explanation of how the signal moves from the brain to the neuron sequence or from the last neuron to the muscle involved.
The (young) scientist notebook is the best book in the series in my opinion: it reproduces a lab book and helps a young kid to formalise what (s)he thinks is a scientific experiment. As a kid, I did play at conducting “scientific” “experiments” with whatever object I happened to find, or later playing with ready-made chemistry and biology sets, but having such a lab book would have been terrific! Setting the question of interest and the hypothesis or hypotheses behind it prior to running the experiment is a major lesson in scientific thinking that should be offered to every kid! However, since it contains no pictures but mostly blank spaces to be filled by the young reader, one could suggest that parents print such lab report sheets themselves.
Following a discussion I had with Victor Elvirà about Spanish books, I ordered a book by Arturo Pérez-Reverte called a Day of Wrath (un día de cólera), but apparently not translated into English. The day of wrath is the second of May, 1808, when the city of Madrid went to arms against the French occupation by Napoléon’s troops. An uprising that got crushed by Murat’s repression the very same day, but which led to all of Spain taking arms against the occupation. The book is written out of historical accounts of the many participants in the uprising, from both Madrilene and French sides. Because of so many viewpoints being reported, some for a single paragraph before the victims die, the literary style is not particularly pleasant, but this is nonetheless a gripping book that I read within a single day while going (or trying to get) to San Francisco. And it is historically revealing of how unprepared the French troops were for an uprising by people mostly armed with navajas and a few hunting rifles. Who still managed to hold parts of the town for most of a day, with the help of a single artillery battalion while the rest of the troops stayed in their barracks. The author actually insists very much on that aspect, that the rebellion was mostly due to the action of the people, while leading classes, the Army, and the clergy almost uniformly condemned it. Upper estimates of the number of deaths on that day (and the following days) range around 500 for Madrilenes and 150 for French troops, but the many stories running in the book give the impression of many more casualties.
On Thursday, Christoph Aistleitner [from TU Graz] gave a plenary talk at MCqMC 2016 around Hermann Weyl’s 1916 paper, Über die Gleichverteilung von Zahlen mod. Eins, which demonstrates that the sequence a, 2²a, 3²a, … mod 1 is uniformly distributed on the unit interval when a is irrational. Obviously, the notion was not introduced for simulation purposes, but the construction applies in this setting! At least in a theoretical sense, since for instance the result that the sequence (a,a²,a³,…) mod 1 is uniformly distributed for almost all a’s has not yet produced a single explicit realisation a. But a nice hour of history of mathematics and number theory: it is not that common we hear the Riemann zeta function mentioned in a simulation conference!
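As a small numerical illustration of Weyl’s result (a sketch, not anything shown in the talk), the snippet below generates the fractional parts of k²a for k = 1, 2, … and checks empirically that they spread evenly over the unit interval. The choice a = √2, the sample size, the ten-bin histogram and the function names `weyl_sequence` and `bin_frequencies` are all assumptions made for the example.

```python
import math


def weyl_sequence(alpha, n_points):
    """Fractional parts of k^2 * alpha for k = 1, ..., n_points.

    Double precision limits how large k can usefully get: the product
    k^2 * alpha loses fractional digits as k grows.
    """
    return [(k * k * alpha) % 1.0 for k in range(1, n_points + 1)]


def bin_frequencies(points, n_bins=10):
    """Proportion of points falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for u in points:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return [c / len(points) for c in counts]


points = weyl_sequence(math.sqrt(2.0), 10_000)
freqs = bin_frequencies(points)
mean = sum(points) / len(points)

print("empirical mean (uniform target is 0.5):", mean)
print("bin frequencies (uniform target is 0.1 each):", freqs)
```

Replacing √2 with a rational number makes the sequence periodic and the histogram visibly non-uniform, which is the role irrationality plays in the theorem.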
The following session was a nightmare in that I wanted to attend all four at once! I eventually chose the transport session, in particular because Xiao-Li advertised it at the end of my talk. The connection is that his warp bridge sampling technique provides a folding map between modes of a target, using a mixture representation of the target and folding all components to a single distribution. Interestingly, this transformation does not require a partition and preserves the normalising constants [which has a side appeal for bridge sampling of course]. In a problem with an unknown number of modes, the technique could be completed by [our] folding in order to bring the unobserved modes into the support of the folded target. Looking forward to the forthcoming paper! The last talk of this session was by Matti Vihola, connecting multi-level Monte Carlo and unbiased estimation à la Rhee and Glynn, a paper that I missed when it got arXived last December.
The last session of the day was about probabilistic numerics. I have already discussed this approach to numerical integration extensively, to the point of being invited to the NIPS workshop as a skeptic! But this was an interesting session, both with introductory aspects and with new ones from my viewpoint, especially Chris Oates’ description of a PN method for handling both integrand and integrating measure as being uncertain. Another arXival that went under my decidedly deficient radar.
In connection with the current Olympics in Rio, the New York Times produced a sequence of graphs displaying the dominance of some countries for some sports, like the above for long distance running. I find the representation pretty poor, from using a continuous time perspective for 30 Olympic events, to unexplained colour codes singling out a few countries, to an equally unexplained second axis, with an upward drift above that does not seem to make sense…