

I am trying to use PMC to solve the Bayesian network structure learning problem (which lives in a combinatorial space, not a continuous one).

In PMC, the proposal distributions q_{i,t} can be very flexible, even specific to each iteration and each particle. My problem arises from the combinatorial space. For importance sampling, the requirement on the proposal distribution q is

support(p) ⊂ support(q) (*)

For PMC, what is the support of the proposal distribution at iteration t? Is it

support(p) ⊂ ∪_i support(q_{i,t}) (**)

or does (*) apply to every q_{i,t}? For continuous problems this is not a big issue: a Normal random-walk proposal makes local moves that satisfy (*). But for combinatorial search, a local move only reaches a finite set of states, which does not satisfy (*). For example, from the permutation (1,3,2,4), a random swap reaches only choose(4,2)=6 neighbouring states.
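To make the finite-neighbourhood point concrete, here is a minimal Python sketch (the function name `swap_neighbours` is mine, purely for illustration) enumerating the states reachable from a permutation by a single swap:

```python
from itertools import combinations

def swap_neighbours(perm):
    """All states reachable from `perm` by one transposition (swap of two positions)."""
    neighbours = []
    for i, j in combinations(range(len(perm)), 2):
        nxt = list(perm)
        nxt[i], nxt[j] = nxt[j], nxt[i]
        neighbours.append(tuple(nxt))
    return neighbours

nbrs = swap_neighbours((1, 3, 2, 4))
print(len(nbrs))  # choose(4,2) = 6 neighbours, out of 4! = 24 permutations
```

So a swap proposal at any given state puts mass on only 6 of the 24 permutations, hence cannot satisfy (*) on its own.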

**F**airly interesting question about population Monte Carlo (PMC), a sequential version of importance sampling we worked on with French colleagues in the early 2000’s. (The name population Monte Carlo comes from Iba, 2000.) While MCMC samplers do not have to cover the whole support of p at each iteration, it is much harder for importance samplers, as their core justification is to provide unbiased estimators for all integrals of interest. Thus, when using the PMC estimate,

1/n ∑_{i,t} {p(x_{i,t})/q_{i,t}(x_{i,t})} h(x_{i,t}),   x_{i,t} ~ q_{i,t}(x),

this estimator is only unbiased when the supports of the q_{i,t}’s all contain the support of p. The only other cases I can think of are

- associating the q_{i,t}’s with a partition S_{i,t} of the support of p and using instead

∑_{i,t} {p(x_{i,t})/q_{i,t}(x_{i,t})} h(x_{i,t}),   x_{i,t} ~ q_{i,t}(x)

- resorting to AMIS under the assumption (**) and using instead

1/n ∑_{i,t} {p(x_{i,t})/∑_{j,t} q_{j,t}(x_{i,t})} h(x_{i,t}),   x_{i,t} ~ q_{i,t}(x)

but I am open to further suggestions!
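To see why the support condition matters, here is a toy Python check (the target, proposal, and integrand are my own choices, not from the question above): the target p is uniform on {0,1,2,3}, the proposal q is uniform on {0,1} only, and the importance sampling estimate of E_p[h(X)] with h(x)=x settles far from the true value 1.5:

```python
import random

random.seed(0)

# Target p: uniform on {0,1,2,3}; integrand h(x) = x, so E_p[h] = 1.5.
# Proposal q: uniform on {0,1} only, hence support(p) ⊄ support(q).
def is_estimate(n):
    total = 0.0
    for _ in range(n):
        x = random.choice([0, 1])   # draw from q
        w = 0.25 / 0.5              # p(x)/q(x) on the states q can reach
        total += w * x
    return total / n

est = is_estimate(100_000)
print(est)  # concentrates near 0.25, nowhere near the true value 1.5
```

The weights are perfectly correct on the covered states; the bias comes entirely from the states q never visits.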

Filed under: Statistics, University life Tagged: AMIS, CUNY, importance sampling, Monte Carlo Statistical Methods, PMC, population Monte Carlo, simulation, unbiasedness



**I** am off to Bangalore for a few days, taking part in an Indo-French workshop on statistics and mathematical biology run by the Indo-French Centre for Applied Mathematics (IFCAM).

Filed under: Statistics, Travel, University life Tagged: Bangalore, IFCAM, India, workshop



**F**irst up, Dennis Prangle presented his recent work on “Lazy ABC”, which can speed up ABC by abandoning early those model simulations that do not look promising. Dennis introduces a continuation probability to ensure that the target distribution of the approach is still the ABC target of interest. In effect, the ABC likelihood is estimated as 0 if early stopping is performed; otherwise the usual ABC likelihood is inflated by dividing by the continuation probability, ensuring an unbiased estimator of the ABC likelihood. The drawback is that the ESS (Dennis uses importance sampling) of the lazy approach will likely be less than that of usual ABC for a fixed number of simulations; but this should be offset by the reduction in time required to perform said simulations. Dennis also presented some theoretical work for optimally tuning the method, which I need more time to digest.
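A minimal sketch of the lazy weight, under my own stylised assumptions (the toy early stage, completion, kernel, and helper names are all hypothetical, not Dennis’s code): abandon with probability 1−α after a cheap first stage, and divide surviving weights by α so the estimator of the ABC likelihood stays unbiased:

```python
import random

random.seed(1)

def lazy_abc_weight(theta, alpha_fn, early_sim, full_sim, kernel):
    """One lazy-ABC importance weight, unbiased for E[kernel(full simulation)]."""
    stage1 = early_sim(theta)        # cheap first stage of the simulation
    alpha = alpha_fn(stage1)         # continuation probability in (0, 1]
    if random.random() > alpha:
        return 0.0                   # abandon early: estimated ABC likelihood 0
    return kernel(full_sim(theta, stage1)) / alpha   # inflate to stay unbiased

# Toy check of unbiasedness (hypothetical stand-ins, not Prangle's model):
early = lambda th: random.random()           # stage 1: a uniform draw
alpha = lambda s: 1.0 if s < 0.5 else 0.2    # rarely continue unpromising runs
full = lambda th, s: s                       # stage 2: deterministic completion
kern = lambda y: 1.0 if y < 0.5 else 0.0     # ABC kernel: accept iff y < 0.5

n = 200_000
est = sum(lazy_abc_weight(0.0, alpha, early, full, kern) for _ in range(n)) / n
print(est)  # close to the non-lazy acceptance probability P(y < 0.5) = 0.5
```

Here 80% of the unpromising runs are skipped, yet the average weight still targets the same ABC likelihood, which is exactly the trade the ESS pays for.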

**T**his was followed by my talk on Bayesian indirect inference methods that use a parametric auxiliary model (a slightly older version here). This paper has just been accepted by Statistical Science.

**M**orning tea was followed by my PhD student, Brenda Vo, who presented an interesting application of ABC to cell spreading experiments. Here an estimate of the diameter of the cell population was used as a summary statistic. It was noted after Brenda’s talk that this application might be a good candidate for Dennis’ Lazy ABC idea. This talk was followed by a much more theoretical presentation by Pierre del Moral on how particle filter methodologies can be adapted to the ABC setting and also a general framework for particle methods.

**F**ollowing lunch, Guilherme Rodrigues presented a hierarchical Gaussian Process model for kernel density estimation in the presence of different subgroups. Unfortunately my (lack of) knowledge on non-parametric methods prevents me from making any further comment except that the model looked very interesting and ABC seemed a good candidate for calibrating the model. I look forward to the paper appearing on-line.

**T**he next presentation was by Gael Martin, who spoke about her research on using ABC for estimation of complex state space models. This was probably my favourite talk of the day, and not only because it is very close to my research interests. Here the score of the Euler-discretised approximation of the generative model was used as summary statistics for ABC. From what I could gather, it was demonstrated that the ABC posteriors based on the score and on the MLE of the auxiliary model are the same in the limit as ε → 0 (unless I have mis-interpreted). This is a very useful result in itself; using the score to avoid the optimisation required for the MLE can save a lot of computation. The improved approximations of the proposed approach compared with the results that use the likelihood of the Euler discretisation were quite promising. I am certainly looking forward to this paper coming out.

**M**att Moores drew the short straw and had the final presentation on the Friday afternoon. Matt spoke about this paper (an older version is available here), of which I am now a co-author. Matt’s idea is that doing some pre-simulations across the prior space and determining a mapping between the parameter of interest and the mean and variance of the summary statistic can significantly speed up ABC for the Potts model, and potentially other ABC applications. The results of the pre-computation step are used in the main ABC algorithm, which no longer requires simulation of pseudo-data: instead, a summary statistic is simulated from the auxiliary model fitted in the pre-processing step. Whilst this approach introduces a couple more layers of approximation, the gain in computation time was up to two orders of magnitude. The talks by Matt, Gael and myself gave a real indirect inference flavour to this year’s ABC in…
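A rough sketch of the pre-computation idea as I understand it, with a stand-in Gaussian simulator and a nearest-grid-point lookup (both my simplifications, not Matt’s actual Potts-model code):

```python
import random
import statistics

random.seed(2)

# Stand-in for the expensive simulator: summary statistic of pseudo-data
# given theta (a real application would run e.g. a Potts-model sampler here).
def simulate_summary(theta):
    return theta + random.gauss(0.0, 0.3)

# Pre-computation: map theta -> (mean, sd) of the summary on a prior grid.
grid = [i / 10 for i in range(11)]          # theta in [0, 1]
table = {}
for th in grid:
    draws = [simulate_summary(th) for _ in range(500)]
    table[th] = (statistics.fmean(draws), statistics.stdev(draws))

def surrogate_summary(theta):
    """Cheap replacement: draw the summary from the fitted Gaussian at the
    nearest grid point (real versions interpolate smoothly across the grid)."""
    th = min(grid, key=lambda g: abs(g - theta))
    mu, sd = table[th]
    return random.gauss(mu, sd)

# The main ABC loop then calls surrogate_summary instead of simulate_summary.
print(surrogate_summary(0.42))
```

All the simulation cost is paid once, up front; the extra approximation layers are the grid lookup and the Gaussian fit to the summary distribution.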

Filed under: pictures, Statistics, University life Tagged: abc-in-sydney, Australia, Chris Drovandi, Sydney

**S**witching between a scalable computation session with Alex Beskos, who talked about adaptive Langevin algorithms for differential equations, and a non-local prior session, with David Rossell presenting a smoother way to handle point masses in order to accommodate frequentist coverage. Something we definitely need to discuss the next time I am in Warwick! Although this alas made me miss both the first talk of the non-local session by Shane Jensen and the final talk of the scalable session by Doug Vandewrken, where I happened to be quoted (!) for my warning, in the 1998 JASA paper with Chantal Guihenneuc, about discretising Markov chains into non-Markov processes.

**A**fter a farewell meal of ceviche with friends in the sweltering humidity of a local restaurant, I attended [the newly elected ISBA Fellow!] Marina Vannucci’s talk on her deeply involved modelling of fMRI data. The last talk before the airport shuttle was François Caron’s description of a joint work with Emily Fox on a sparser modelling of networks, along with an auxiliary variable approach that allowed for parallelisation of a Gibbs sampler. François mentioned an earlier alternative found in machine learning where all components of a vector are updated simultaneously, conditional on the previous avatar of the other components, e.g. simulating (x’,y’) from π(x’|y) π(y’|x), which does not produce a Markov chain convergent to the right stationary distribution. However, running a quick [in-flight] check on a 2-d normal target did not show any divergent feature when compared with the regular Gibbs sampler. I thus wondered what can be said about the resulting target, or which conditions are needed for divergence. A few scribbles later, I realised that the 2-d case is the exception, namely that the stationary distribution of the chain is the product of the marginals. However, running a 3-d example with an auto-exponential distribution in the taxi back home, I still could not spot a difference in the outcome.
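The in-flight check is easy to reproduce; here is a sketch of the simultaneous (“wrong Gibbs”) update on a 2-d normal target with correlation ρ = 0.8, my own toy setup. The marginals come out standard normal, exactly as for the target, which explains why a quick look shows nothing divergent, while the stationary joint is indeed the product of the marginals: the correlation collapses to 0 instead of 0.8.

```python
import random
import statistics

random.seed(3)

rho = 0.8
s = (1 - rho**2) ** 0.5   # conditional sd for the bivariate normal target

x, y = 0.0, 0.0
xs, xy = [], []
for t in range(100_000):
    # "wrong Gibbs": both components drawn from the PREVIOUS state's conditionals
    x_new = random.gauss(rho * y, s)   # x' ~ pi(x' | y)
    y_new = random.gauss(rho * x, s)   # y' ~ pi(y' | x), old x, in parallel
    x, y = x_new, y_new
    if t > 1000:                       # discard burn-in
        xs.append(x)
        xy.append(x * y)

print(statistics.fmean(xs), statistics.stdev(xs))  # marginal ~ N(0,1), as in the target
print(statistics.fmean(xy))                        # E[xy] ~ 0, not rho = 0.8
```

Writing the covariance recursion, cov(x’,y’) = ρ² cov(x,y), shows the only fixed point is 0, confirming the product-of-marginals scribble; the 2-d marginals alone cannot distinguish this chain from a correct Gibbs sampler.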

Filed under: pictures, Statistics, Travel, University life Tagged: Cancún, ISBA, Langevin MCMC algorithm, MCMC algorithms, non-local priors, University of Warwick