Archive for Potts model

approximate likelihood perspective on ABC

Posted in Books, Statistics, University life on December 20, 2018 by xi'an

George Karabatsos and Fabrizio Leisen have recently published in Statistics Surveys a fairly complete survey of ABC methods [whose earlier arXival I had missed]. It lists, within an extensive 20-page bibliography, some twenty-plus earlier reviews of ABC (with further ones in applied domains)!

“(…) any ABC method (algorithm) can be categorized as either (1) rejection-, (2) kernel-, and (3) coupled ABC; and (4) synthetic-, (5) empirical- and (6) bootstrap-likelihood methods; and can be combined with classical MC or VI algorithms [and] all 22 reviews of ABC methods have covered rejection and kernel ABC methods, but only three covered synthetic likelihood, one reviewed the empirical likelihood, and none have reviewed coupled ABC and bootstrap likelihood methods.”

The motivation for using approximate likelihood methods is provided by three examples: g-and-k distributions, although the likelihood can be efficiently derived by numerical means, as shown by Pierre Jacob's winference package; mixed effect linear models, although a completion by the mixed effects themselves is available for Gibbs sampling, as in Zeger and Karim (1991); and the hidden Potts model, which we handled by pre-processing in our 2015 paper with Matt Moores, Chris Drovandi, and Kerrie Mengersen. The paper produces a general representation of the approximate likelihood that covers the algorithms listed above, summarised in a table where t(.) denotes the summary statistic.
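As an aside, the g-and-k distribution is a standard ABC testbed precisely because it is defined through its quantile function: simulation is immediate, while the density only obtains by numerically inverting that quantile function (the numerical route mentioned above). A minimal sketch, using the usual parameterisation (location a, scale b, skewness g, kurtosis k, c = 0.8) and octile summaries as an illustrative choice, neither being taken from the survey:

```r
# g-and-k quantile function, usual parameterisation with c = 0.8
qgk <- function(p, a, b, g, k, c = 0.8) {
  z <- qnorm(p)
  # (1 - exp(-g z)) / (1 + exp(-g z)) rewritten as tanh(g z / 2)
  a + b * (1 + c * tanh(g * z / 2)) * (1 + z^2)^k * z
}

# simulation by the inverse-cdf method: quantile function applied to uniforms
rgk <- function(n, a, b, g, k) qgk(runif(n), a, b, g, k)

# a typical low-dimensional summary for ABC: the octiles of the sample
t_octiles <- function(x) quantile(x, probs = (1:7) / 8)

set.seed(1)
x_obs <- rgk(1e4, a = 3, b = 1, g = 2, k = 0.5)
t_octiles(x_obs)
```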

The table looks a wee bit challenging simply because the review includes the synthetic likelihood approach of Wood (2010), which featured prominently in the 2012 Read Paper discussion but opens the door to all kinds of approximations of the likelihood function, including variational Bayes and non-parametric versions (see the synthetic likelihood sketch below). After a description of the above versions (including a rather ignored coupled version) and the special case of ABC model choice, the authors expand on the difficulties with running ABC, from multiple tuning issues, to the genuine curse of dimensionality in the parameter (with unnecessary remarks on low-dimensional sufficient statistics, since they are almost surely nonexistent in most realistic settings), to the mis-specified case (on which we are currently working with David Frazier and Judith Rousseau). To conclude, a worthwhile update on ABC and, on the side, a funny typo from the reference list!

Li, W. and Fearnhead, P. (2018, in press). On the asymptotic efficiency
of approximate Bayesian computation estimators. Biometrika na na-na.
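For concreteness, here is a bare-bones version of Wood's (2010) synthetic likelihood, the simplest of the approximate likelihoods tabulated in the survey: summaries simulated from the model at a given parameter value are fitted with a Gaussian whose log-density at the observed summary serves as a log-likelihood surrogate. The gamma simulator and two-dimensional summary below are toy placeholders of mine, not the survey's examples, and the mvtnorm package is assumed available:

```r
# bare-bones synthetic likelihood (Wood, 2010); simulator and summary are toy
# placeholders, not the survey's examples
library(mvtnorm)

synthetic_loglik <- function(theta, t_obs, simulate, t_stat, M = 200) {
  # simulate M datasets at theta and reduce each to its summary t(.)
  S <- t(replicate(M, t_stat(simulate(theta))))
  mu <- colMeans(S)
  Sigma <- cov(S)
  # Gaussian log-density of the observed summary under the fitted moments
  dmvnorm(t_obs, mean = mu, sigma = Sigma, log = TRUE)
}

# toy example: a gamma model summarised by its mean and log-variance
simulate_toy <- function(theta) rgamma(500, shape = theta[1], rate = theta[2])
t_toy <- function(x) c(mean(x), log(var(x)))

set.seed(1)
y_obs <- simulate_toy(c(2, 0.5))
synthetic_loglik(c(2, 0.5), t_toy(y_obs), simulate_toy, t_toy)
synthetic_loglik(c(5, 0.5), t_toy(y_obs), simulate_toy, t_toy)  # markedly lower
```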

Monte Carlo methods for Potts models

Posted in pictures, Statistics, University life on March 10, 2016 by xi'an

There will be a seminar talk by Mehdi Molkaraie (Pompeu Fabra) next week at Institut Henri Poincaré (IHP), Paris, on his paper with Vincent Gomez.

We consider the problem of estimating the partition function of the ferromagnetic q-state Potts model. We propose an importance sampling algorithm in the dual of the normal factor graph representing the model. The algorithm can efficiently compute an estimate of the partition function when the coupling parameters of the model are strong (corresponding to models at low temperature) or when the model contains a mixture of strong and weak couplings. We show that, in this setting, the proposed algorithm significantly outperforms the state of the art methods.

The talk is at 14:30, March 17. It is part of a trimester program on information and computation theories I was completely unaware of.
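To make the quantity in the abstract concrete, here is a naive baseline (emphatically not the dual normal-factor-graph sampler of the paper): the partition function Z(β) = Σ_x exp(β S(x)) of a q-state Potts model on a tiny grid, with S(x) the number of equal neighbouring pairs, estimated by importance sampling from the uniform distribution over configurations. The grid size and parameter values are arbitrary, and the weights degenerate exactly in the strong-coupling (low-temperature) regime the paper targets:

```r
# number of like-coloured neighbour pairs on a grid (Potts sufficient statistic)
potts_stat <- function(x)
  sum(x[-1, ] == x[-nrow(x), ]) + sum(x[, -1] == x[, -ncol(x)])

# naive uniform importance sampling of log Z(beta) on an L x L grid:
# Z(beta) = q^n * E_unif[ exp(beta * S(X)) ], estimated by a plain average
logZ_naive <- function(L = 4, q = 3, beta = 0.5, M = 1e5) {
  logw <- replicate(M, {
    x <- matrix(sample.int(q, L * L, replace = TRUE), L, L)
    beta * potts_stat(x)
  })
  n <- L * L
  # log of q^n * mean(exp(logw)), computed with the log-sum-exp trick
  n * log(q) + max(logw) + log(mean(exp(logw - max(logw))))
}

set.seed(2)
logZ_naive(L = 4, q = 3, beta = 0.5)   # fine at weak coupling
logZ_naive(L = 4, q = 3, beta = 2)     # weights already degenerate here
```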

scalable Bayesian inference for the inverse temperature of a hidden Potts model

Posted in Books, R, Statistics, University life on April 7, 2015 by xi'an

Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a. thermodynamic integration), and ABC. A massive simulation experiment, with individual simulation times of up to 400 hours, leads to selecting path sampling (what else?!) as the (XL) method of choice, thanks to a pre-computation of the expectation of the sufficient statistic, E[S(X)|β]. I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday at Columbia potential if huge improvements in processing Ising and Potts models by first approximating the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desideratum for all ABC methods: simulating the summary statistics directly, if approximately, would bring huge gains in computing time, hence possibly in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution under a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without resorting to simulating the entire grid X…
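To illustrate what "simulating the entire grid X" entails, here is a minimal single-site Gibbs sampler for an L × L q-state Potts field at inverse temperature β, used to accumulate a crude estimate of E[S(X)|β]; this is only a sketch of mine with arbitrary grid size, burn-in and number of sweeps, not one of the samplers compared in the paper:

```r
# one single-site Gibbs sweep over an L x L q-state Potts field (sketch only)
gibbs_sweep <- function(x, beta, q) {
  L <- nrow(x)
  for (i in 1:L) for (j in 1:L) {
    # states of the (up to four) nearest neighbours of pixel (i, j)
    nb <- c(if (i > 1) x[i - 1, j], if (i < L) x[i + 1, j],
            if (j > 1) x[i, j - 1], if (j < L) x[i, j + 1])
    # full conditional: P(x_ij = k) proportional to exp(beta * #{neighbours = k})
    w <- exp(beta * sapply(1:q, function(k) sum(nb == k)))
    x[i, j] <- sample.int(q, 1, prob = w)
  }
  x
}

# number of like-coloured neighbour pairs, i.e. the sufficient statistic S(x)
potts_stat <- function(x)
  sum(x[-1, ] == x[-nrow(x), ]) + sum(x[, -1] == x[, -ncol(x)])

# crude estimate of E[S(X) | beta] by averaging over sweeps after burn-in
set.seed(3)
L <- 16; q <- 3; beta <- 0.4
x <- matrix(sample.int(q, L * L, replace = TRUE), L, L)
for (s in 1:100) x <- gibbs_sweep(x, beta, q)          # burn-in
S <- numeric(200)
for (s in 1:200) { x <- gibbs_sweep(x, beta, q); S[s] <- potts_stat(x) }
c(mean = mean(S), sd = sd(S))
```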

Pre-processing for approximate Bayesian computation in image analysis

Posted in R, Statistics, University life on March 21, 2014 by xi'an

With Matt Moores and Kerrie Mengersen, from QUT, we wrote this short paper just in time for the MCMSki IV Special Issue of Statistics & Computing, and arXived it as well. The overall idea is to cut down on the cost of running an ABC experiment by removing the simulation of a humongous state-space vector, as in Potts and hidden Potts models, and replacing it with an approximate simulation of the 1-d sufficient (summary) statistic. In this case, we used a partition of the 1-d parameter interval, simulated the distribution of the sufficient statistic at each of those parameter values, and computed its expectation and variance. The conditional distribution of the sufficient statistic is then approximated by a Gaussian with these two parameters, and those Gaussian approximations substitute for the true distributions within an ABC-SMC algorithm à la Del Moral, Doucet and Jasra (2012).
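A hedged sketch of this pre-processing idea (not the implementation used in the paper): an offline pass over a grid of β values records the mean and standard deviation of the sufficient statistic, both moment curves are interpolated, and Gaussian draws then stand in for pixel-level simulations inside ABC, here in plain rejection form rather than the paper's SMC. The simulate_stat argument is a placeholder for a genuine Potts sampler, and toy_stat below is an entirely artificial stand-in used only to make the snippet run:

```r
# offline pre-computation: mean and sd of the sufficient statistic on a grid
# of beta values, then interpolation of both moment curves
build_surrogate <- function(simulate_stat, beta_grid, n_pilot = 50) {
  mu <- sd_ <- numeric(length(beta_grid))
  for (i in seq_along(beta_grid)) {
    s <- replicate(n_pilot, simulate_stat(beta_grid[i]))  # done once, offline
    mu[i] <- mean(s); sd_[i] <- sd(s)
  }
  list(mean = approxfun(beta_grid, mu), sd = approxfun(beta_grid, sd_))
}

# ABC rejection with the Gaussian surrogate: a single rnorm() call replaces
# the costly simulation of the whole image (the paper uses ABC-SMC instead)
abc_surrogate <- function(s_obs, surrogate, prior_sample, eps, N = 1e5) {
  beta  <- prior_sample(N)
  s_sim <- rnorm(N, surrogate$mean(beta), surrogate$sd(beta))
  beta[abs(s_sim - s_obs) < eps]
}

# toy usage with an artificial stand-in for the Potts simulator
set.seed(4)
toy_stat <- function(beta) rnorm(1, mean = 1000 * plogis(5 * (beta - 1)), sd = 20)
surr <- build_surrogate(toy_stat, beta_grid = seq(0, 2, by = 0.05))
post <- abc_surrogate(s_obs = 600, surr,
                      prior_sample = function(n) runif(n, 0, 2), eps = 25)
summary(post)
```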

[figure: residuals]

Across twenty simulated images of 125 × 125 pixels, Matt's algorithm took an average of 21 minutes per image for between 39 and 70 SMC iterations, while resorting to pseudo-data and deriving the genuine sufficient statistic took an average of 46.5 hours for 44 to 85 SMC iterations. On a realistic Landsat image, with a total of 978,380 pixels, the precomputation of the mapping function took 50 minutes, while the total CPU time on 16 parallel threads was 10 hours 38 minutes. By comparison, it took 97 hours for 10,000 MCMC iterations on this image, with a poor effective sample size of 390 values. Regular SMC-ABC algorithms cannot handle this scale: it takes 89 hours to perform a single SMC iteration! (Note that path sampling also operates in this framework, thanks to the same precomputation: in that case it took 2.5 hours for 10⁵ iterations, with an effective sample size of 10⁴…)

Since my student's paper on Seaman et al. (2012) got promptly rejected by TAS for quoting too extensively from my post, we decided to include me as an extra author and submitted the paper to this special issue as well.