A few weeks after the editorial “Algorithms and Blues“, Nature offers another (general public) entry on AIs and their impact on society, entitled “The Black Box of AI“. The call is less on open source AIs and more on accountability, namely the fact that decisions produced by AIS and impacting people one way or another should be accountable. Rather than excused by the way out “the computer said so”. What the article exposes is how (close to) impossible this is when the algorithms are based on black-box structures like neural networks and other deep-learning algorithms. While optimised to predict as accurately as possible one outcome given a vector of inputs, hence learning in that way how the inputs impact this output [in the same range of values], these methods do not learn in a more profound way in that they very rarely explain why the output occurs given the inputs. Hence, given a neural network that predicts go moves or operates a self-driving car, there is a priori no knowledge to be gathered from this network about the general rules of how humans play go or drive cars. This rather obvious feature means that algorithms that determine the severity of a sentence cannot be argued as being rational and hence should not be used per se (or that the judicial system exploiting them should be sued). The article is not particularly deep (learning), but it mentions a few machine-learning players like Pierre Baldi, Zoubin Ghahramani and Stéphane Mallat, who comments on the distance existing between those networks and true (and transparent) explanations. And on the fact that the human brain itself goes mostly unexplained. [I did not know I could include such dynamic images on WordPress!]
In the March 2016 issue of JASA that currently sits on my desk, there is a paper by Liang, Jim, Song and Liu on the adaptive exchange algorithm, which aims at handling posteriors for sampling distributions with intractable normalising constants. The concept behind the algorithm is the exchange principle initiated by Jesper Møller and co-authors in 2006, where an auxiliary pseudo-observation is simulated for the missing constants to vanish in a Metropolis-Hastings ratio. (The name exchangeable was introduced in a subsequent paper by Iain Murray, Zoubin Ghahramani and David MacKay, also in 2006.)
The crux of the method is to run an iteration as [where y denotes the observation]
- Proposing a new value θ’ of the parameter from a proposal q(θ’|θ);
- Generate a pseudo-observation z~ƒ(z|θ’);
- Accept with probability
which has the appeal to cancel all normalising constants. And the repeal of requiring an exact simulation from the very distribution with the missing constant, ƒ(.|θ). Which means that in practice a finite number of MCMC steps will be used and will bias the outcome. The algorithm is unusual in that it replaces the exact proposal q(θ’|θ) with an unbiased random version q(θ’|θ)ƒ(z|θ’), z being just an augmentation of the proposal. (The current JASA paper by Liang et al. seems to confuse augment and argument, see p.378.)
To avoid the difficulty in simulating from ƒ(.|θ), the authors draw pseudo-observations from sampling distributions with a finite number m of parameter values under the [unrealistic] assumption (A⁰) that this collection of values provides an almost complete cover of the posterior support. One of the tricks stands with an auxiliary [time-heterogeneous] chain of pseudo-observations generated by single Metropolis steps from one of these m fixed targets. These pseudo-observations are then used in the main (or target) chain to define the above exchange probability. The auxiliary chain is Markov but time-heterogeneous since the probabilities of accepting a move are evolving with time according to a simulated annealing schedule. Which produces a convergent estimate of the m normalising constants. The main chain is not Markov in that it depends on the whole history of the auxiliary chain [see Step 5, p.380]. Even jointly the collection of both chains is not Markov. The paper prefers to consider the process as an adaptive Markov chain. I did not check the rather intricate in details, so cannot judge of the validity of the overall algorithm; I simply note that one condition (A², p.383) is incredibly strong in that it assumes the Markov transition kernel to be Doeblin uniformly on any compact set of the calibration parameters. However, the major difficulty with this approach seems to be in its delicate calibration. From providing a reference set of m parameter values scanning the posterior support to picking transition kernels on both the parameter and the sample spaces, to properly cooling the annealing schedule [always a fun part!], there seems to be [from my armchair expert’s perspective, of course!] a wide range of opportunities for missing the target or running into zero acceptance problems. Both examples analysed in the paper, the auto-logistic and the auto-normal models, are actually of limited complexity in that they depend on a few parameters, 2 and 4 resp., and enjoy sufficient statistics, of dimensions 2 and 4 as well. Hence simulating (pseudo-)realisations of those sufficient statistics should be less challenging than the original approach replicating an entire vector of thousands of dimensions.
Tomorrow I am off to Venezia for three days, attending the ESOBE 2016 workshop, where ESOBE stands for European Seminar on Bayesian Econometrics. This year it is indeed taking place in Venezia, Università Ca’ Foscari, in this beautiful building on the Gran Canale, and I have been invited to give a talk. Excited to get back to this unique place, hoping the high water will not be too high to prevent getting around (at random as usual).
One approach to random number generation that had always intrigued me is Kinderman and Monahan’s (1977) ratio-of-uniform method. The method is based on the result that the uniform distribution on the set A of (u,v)’s in R⁺xX such that
induces the distribution with density proportional to ƒ on V/U. Hence the name. The proof is straightforward and the result can be seen as a consequence of the fundamental lemma of simulation, namely that simulating from the uniform distribution on the set B of (w,x)’s in R⁺xX such that
induces the marginal distribution with density proportional to ƒ on X. There is no mathematical issue with this result, but I have difficulties with picturing the construction of efficient random number generators based on this principle.
I thus took the opportunity of the second season of [the Warwick reading group on] Non-uniform random variate generation to look anew at this approach. (Note that the book is freely available on Luc Devroye’s website.) The first thing I considered is the shape of the set A. Which has nothing intuitive about it! Luc then mentions (p.195) that the boundary of A is given by
which then leads to bounding both ƒ and x→x²ƒ(x) to create a box around A and an accept-reject strategy, but I have trouble with this result without making further assumptions about ƒ… Using a two component normal mixture as a benchmark, I found bounds on u(.) and v(.) and simulated a large number of points within the box to end up with the above graph that indeed the accepted (u,v)’s were within this boundary. And the same holds with a more ambitious mixture:
Although I bought this second volume in the Fitz and the Fool trilogy quite a while ago, I only came to read it very recently. And enjoyed it unreservedly! While the novel builds upon the universe Hobb created in the liveship traders trilogy (forget the second trilogy!) and the Assassin and Fool trilogies, the story is compelling enough to bring out excitement and longing for further adventures of Fitz and the Fool. Many characters that were introduced in the earlier volume suddenly take on substance and meaning, while the main characters are no longer heroes of past eras, but also acquire further depth and subtlety. Even long-lasting ones like Chade. I cannot tell whether this new dimension of the plights affecting the Six Duchies and its ruler, King Verity, was conceived from the start or came later to the author, but it really fits seamlessly and increases by several orders of magnitude the epic feeling of the creation. Although it is hard to rank this book against the very first ones, like Royal Assassin, I feel this is truly one of the best of Hobb’s books, with the right mixture of action, plotting, missed opportunities and ambiguous angles about the main characters. So many characters truly come to life in this volume that I bemoan the sluggish pace of the first one even more now. While one could see Fool’s Quest as the fourteenth book in the Realm of the Elderlings series, and hence hint at senseless exploitation of the same saga, there are just too many new threads and perspective there to maintain this posture. A wonderful book and a rarity of a middle book being so. I am clearly looking forward the third instalment!