Tuesday, 26 November 2013, 2:00 pm
Council Room, 4th floor (LINCS), 23 avenue d'Italie, 75013 Paris
Talk title: Feature Selection for Neuro-Dynamic Programming
Neuro-Dynamic Programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class that is required in the application of these techniques. This talk reviews two popular approaches to neuro-dynamic programming, TD-learning and Q-learning. The main goal of this work is to demonstrate how insight from idealized models can be used as a guide for feature selection for these algorithms. Several approaches are surveyed, including fluid and diffusion models, and the application of idealized models arising from mean-field game approximations. The theory is illustrated with several examples.
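As a purely illustrative toy example of what the basis choice amounts to (my own construction, not taken from the talk): TD(0) with a linear architecture approximates the value function as a weighted combination of hand-picked features, and the basis φ below is precisely the feature-selection decision.

```python
import random

# Toy sketch (not from the talk): TD(0) with a linear architecture
# V(s) ~ w . phi(s).  The "feature selection" question is the choice of
# the basis phi.  The chain is a 5-state random walk with reward 1 upon
# reaching the right end; the basis here is polynomial in the state.

random.seed(0)
n_states = 5

def phi(s):
    z = s / (n_states - 1)    # rescale state to [0, 1]
    return [1.0, z, z * z]    # polynomial basis: the modelling choice

w = [0.0, 0.0, 0.0]
alpha, gamma = 0.05, 0.9

def V(s):
    return sum(wi * fi for wi, fi in zip(w, phi(s)))

for episode in range(2000):
    s = n_states // 2
    while True:
        s_next = s + random.choice((-1, 1))
        done = s_next in (0, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        target = r if done else r + gamma * V(s_next)
        delta = target - V(s)              # TD error
        for i, fi in enumerate(phi(s)):
            w[i] += alpha * delta * fi     # gradient-style update
        if done:
            break
        s = s_next

print([round(V(s), 2) for s in range(n_states)])
```

The learned values increase toward the rewarding end of the chain; swapping in a different basis (indicator features, Fourier features, features derived from an idealized model) changes the attainable approximation quality, which is the point of the talk's title.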
And I had a fairly interesting day at the conference, from Randal’s talk on hidden Markov models with finite-valued observables, to the two Terrys’ invited session (Terry Lyons vs. Terry Speed), to the machine learning session organised by a certain Michal Jordan (on the program) who turned out to be Michael Jordan (with a talk on the fusion between statistics and computability). A post-session chat with Terry Lyons also opened (to me) new perspectives on data summarisation. (And we at last managed to get a convergence result using a Rao-Blackwellisation argument!) Plus, we ended the day in a nice bistrot called Zeller, with an awfully friendly staff cooking family produce and serving fruity family wines, not yet spoiled by being ranked #1 on tripadvisor (but visibly attracting a lot of tourists like us).
Today my student Xiaolin Cheng presented the mythical 1990 JASA paper of Alan Gelfand and Adrian Smith, Sampling-based approaches to calculating marginal densities, the very one that started the MCMC revolution of the 1990s! Re-reading it through his eyes was quite enlightening, even though he stuck quite closely to the paper. (To the point of not running his own simulation, nor even reporting Gelfand and Smith’s, as shown by the slides below. This would have helped, I think…)
Indeed, those slides focus very much on the idea that such substitution samplers can provide parametric approximations to the marginal densities of the components of the simulated parameters, to the point of resorting to importance sampling as an alternative to the standard Rao-Blackwell estimate, a solution that did not survive long. (We briefly discussed this point during the seminar, as the importance function was itself based on a Rao-Blackwell estimate, with possible tail issues. Gelfand and Smith actually conclude on the higher efficiency of the Gibbs sampler.) Maybe not so surprisingly, the approximation of the “other” marginal, namely the marginal likelihood, goes unmentioned, as it is much more involved (and would lead to the introduction of the infamous harmonic mean estimator a few years later! And to Chib’s (1995) method, which is very close in spirit to the Gibbs sampler). While Xiaolin never mentioned Markov chains in his talk, Gelfand and Smith only report that Gibbs sampling is a Markovian scheme and refer to both Geman and Geman (1984) and Tanner and Wong (1987) for convergence issues, rather than directly invoking Markov arguments as in Tierney (1994) and others. A fact that I find quite interesting, a posteriori, as it highlights the strong impact Meyn and Tweedie would have, three years later.
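To make the Rao-Blackwell marginal estimate concrete, here is a minimal sketch on a toy bivariate normal of my own choosing (not one of Gelfand and Smith's examples): the Gibbs sampler alternates between the two normal full conditionals, and the marginal density of one component is estimated by averaging the conditional density over the draws of the other component.

```python
import math
import random

# Toy illustration of the Rao-Blackwell marginal density estimate.
# Target: bivariate normal with correlation rho, standard margins, so
# both full conditionals are N(rho * other, 1 - rho^2) and the exact
# marginal of X is N(0, 1) for comparison.

random.seed(42)
rho = 0.7
s2 = 1.0 - rho**2      # conditional variance

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Gibbs sampler: alternate draws from the two full conditionals
x, y = 0.0, 0.0
ys = []
for t in range(20000):
    x = random.gauss(rho * y, math.sqrt(s2))
    y = random.gauss(rho * x, math.sqrt(s2))
    if t >= 1000:      # discard burn-in
        ys.append(y)

# Rao-Blackwell estimate of the marginal density of X at x0: the average
# of the conditional densities p(x0 | y_t) over the Gibbs draws y_t
x0 = 0.5
rb = sum(norm_pdf(x0, rho * yt, s2) for yt in ys) / len(ys)
exact = norm_pdf(x0, 0.0, 1.0)
print(round(rb, 3), round(exact, 3))
```

Averaging conditional densities rather than, say, binning the x draws is exactly the Rao-Blackwellisation discussed above: the conditional expectation smooths out the Monte Carlo noise.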
Xiao-Li Meng accepted to review our 2004 book Monte Carlo Statistical Methods as part of a special book review issue of CHANCE honouring the memory of George Casella through his books (thanks to Sam Behseta for suggesting this!), and I am quite grateful he managed to fit this review into an already overflowing deanesque schedule. He sent me the following email about one of our proofs, demonstrating how much effort he had put into this review:
I however have a question about the proof of Lemma 7.3 on page 273. After the expression of E[h(x^(1)) | x_0], the proof stated "and substitute Eh(x) for h(x_1)". I cannot think of any justification for this substitution, given the whole purpose is to show h(x) is a constant.
I put it on hold for a while and only looked at it during the (long) flight to Chicago. Lemma 7.3 in Monte Carlo Statistical Methods is the result that the Metropolis-Hastings algorithm is Harris recurrent (and not only recurrent). The proof is based on the characterisation of Harris recurrence as having only constants for harmonic functions, i.e. those functions h satisfying the identity h(x) = E[h(X_1) | X_0 = x].
The chain being recurrent, the above implies that harmonic functions are almost everywhere constant, and the proof steps from almost everywhere to everywhere. The substitution above (and I also stumbled upon that very subtlety when re-reading the proof in my plane seat!) is valid because it occurs within an integral: despite sounding like using the result to prove the result, the argument is thus valid! Needless to say, we did not invent this (elegant) proof but took it from one of the early works on the theory of Metropolis-Hastings algorithms, presumably Luke Tierney’s foundational 1994 Annals of Statistics paper, which we should have quoted…
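For the record, the step can be sketched as follows, in my own notation and decomposition rather than a transcription of the book's proof, and assuming the target density f is positive so that f-null sets are Lebesgue-null:

```latex
% Metropolis--Hastings kernel: absolutely continuous part plus rejection atom
\[
K(x,\mathrm{d}y) = \alpha(x,y)\,q(y\mid x)\,\mathrm{d}y + r(x)\,\delta_x(\mathrm{d}y),
\qquad r(x) = 1 - \int \alpha(x,y)\,q(y\mid x)\,\mathrm{d}y .
\]
% For a harmonic function h,
\[
h(x) = \mathbb{E}[h(X_1)\mid X_0 = x]
     = \int h(y)\,\alpha(x,y)\,q(y\mid x)\,\mathrm{d}y + r(x)\,h(x).
\]
% Recurrence gives h(y) = E_f[h(X)] for f-almost every y, so replacing
% h(y) by this constant inside the integral changes nothing:
\[
h(x)\,\{1-r(x)\} = \mathbb{E}_f[h(X)]\,\{1-r(x)\}
\;\Longrightarrow\;
h(x) = \mathbb{E}_f[h(X)] \quad \text{whenever } r(x) < 1 .
\]
```

The substitution only touches the absolutely continuous part of the kernel, which is why it is legitimate even though h itself is what we are trying to pin down.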
As pointed out by Xiao-Li, the proof is also confusing in its use of two notations for the expectation (one indexed by f, the other corresponding to the Markov transition) and in the change of meaning of f, now the stationary density, when compared with Theorem 6.80.