Filed under: Books, pictures, Travel, University life, Wines Tagged: France, La Comédie, Montpellier, sunset, theatre

- Bayesian Survival Model Based on Moment Characterization by Arbel, Julyan et al.
- A New Finite Approximation for the NGG Mixture Model: An Application to Density Estimation by Bianchini, Ilaria
- Distributed Estimation of Mixture Model by Dedecius, Kamil et al.
- Jeffreys’ Priors for Mixture Estimation by Grazian, Clara and X
- A Subordinated Stochastic Process Model by Palacios, Ana Paula et al.
- Bayesian Variable Selection for Generalized Linear Models Using the Power-Conditional-Expected-Posterior Prior by Perrakis, Konstantinos et al.
- Application of Interweaving in DLMs to an Exchange and Specialization Experiment by Simpson, Matthew
- On Bayesian Based Adaptive Confidence Sets for Linear Functionals by Szabó, Botond
- Identifying the Infectious Period Distribution for Stochastic Epidemic Models Using the Posterior Predictive Check by Alharthi, Muteb et al.
- A New Strategy for Testing Cosmology with Simulations by Killedar, Madhura et al.
- Formal and Heuristic Model Averaging Methods for Predicting the US Unemployment Rate by Kolly, Jeremy
- Bayesian Estimation of the Aortic Stiffness based on Non-invasive Computed Tomography Images by Lanzarone, Ettore et al.
- Bayesian Filtering for Thermal Conductivity Estimation Given Temperature Observations by Martín-Fernández, Laura et al.
- A Mixture Model for Filtering Firms’ Profit Rates by Scharfenaker, Ellis et al.

Enjoy!

Filed under: Books, Kids, pictures, Statistics, Travel, University life, Wines Tagged: Austria, BAYSM 2014, conference, proceedings, Springer-Verlag, Vienna, Wien, WU Wirtschaftsuniversität Wien, young Bayesians

Filed under: pictures, Travel Tagged: ads, Boston, Clarex, Massachusetts, métro

The question is essentially asking how to simulate from a distribution defined by its failure rate function η, which is connected with the density f (and the cdf F) of the distribution by

$$\eta(x)=\frac{f(x)}{1-F(x)}=\frac{f(x)}{\int_x^\infty f(t)\,\text{d}t}$$

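From a purely probabilistic perspective, defining the distribution through f or through η is equivalent, as shown by the relation (for a distribution supported on $(0,\infty)$)

$$F(x)=1-\exp\left\{-\int_0^x \eta(t)\,\text{d}t\right\}$$
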
but, from a simulation point of view, it may provide a different entry point. Indeed, all that is needed is the ability to solve (in X) the equation

$$\int_0^X \eta(t)\,\text{d}t=-\log(U)$$

when U is a Uniform (0,1) variable, since this amounts to inverting the cdf through $1-F(X)=U$, and $1-U$ is itself Uniform (0,1). This may help in that it does not require deriving f. Obviously, this also raises the question of why a distribution would be defined by its failure rate function.
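A minimal sketch of this inversion in Python, assuming the cumulative failure rate $H(x)=\int_0^x \eta(t)\,\text{d}t$ is available in closed form, continuous, increasing and unbounded. The linear failure rate $\eta(t)=a+bt$ and the function names are hypothetical choices made for illustration. Bisection is used instead of a closed-form inverse so that the same routine applies to any such H.

```python
import math
import random


def cumulative_hazard_linear(x, a=1.0, b=2.0):
    """H(x): integral over [0, x] of the hypothetical linear failure rate a + b*t."""
    return a * x + 0.5 * b * x * x


def sample_from_failure_rate(cum_hazard, u, n_iter=200):
    """Solve H(X) = -log(u) in X by bisection.

    Assumes H is continuous and increasing on [0, inf), with H(0) = 0 and
    H(x) -> inf, so that a unique root exists for any u in (0, 1].
    """
    target = -math.log(u)
    lower, upper = 0.0, 1.0
    # Double the upper end until the root is bracketed.
    while cum_hazard(upper) < target:
        upper *= 2.0
    # A fixed number of halvings always terminates, even once the bracket
    # has shrunk to adjacent floating-point numbers.
    for _ in range(n_iter):
        mid = 0.5 * (lower + upper)
        if cum_hazard(mid) < target:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


if __name__ == "__main__":
    rng = random.Random(1)
    # 1 - random() lies in (0, 1], which avoids log(0).
    draws = [sample_from_failure_rate(cumulative_hazard_linear, 1.0 - rng.random())
             for _ in range(5000)]
    # Sanity check: P(X > 0.5) should be close to exp(-H(0.5)).
    print(sum(d > 0.5 for d in draws) / len(draws),
          math.exp(-cumulative_hazard_linear(0.5)))
```

For the linear failure rate the root is also available analytically, since $H(x)=x+x^2$ when $a=1$ and $b=2$. This makes the bisection easy to check.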

Filed under: Books, Kids, Statistics, University life Tagged: cross validated, failure rate, Monte Carlo Statistical Methods, probability theory, reliability, simulation, StackExchange, stackoverflow, survival analysis

“The reasons cited for this laggardly response [to innovations] will be familiar to any observer of the university system: an inherently conservative and risk-averse culture in most institutions; sclerotic systems and processes designed for a different world, and a lack of capacity, skills and willingness to change among an ageing academic community. All these are reinforced by perceptions that most proposed innovations are over-hyped and that current ways of operating have plenty of life left in them yet.”

Filed under: Books, Kids, pictures, University life Tagged: marketing, privatisation, reform, The Guardian, United Kingdom

**Adaptation and ergodicity.**

We certainly agree that the naive approach of using a non-parametric kernel density estimator on the chain history (as in [Christian’s book, Example 8.8]) as a *proposal* fails spectacularly on simple examples: the probability of proposing in unexplored regions is extremely small, independent of the current position of the MCMC trajectory. This is not what we do though. Instead, we use the gradient of a density estimator, and not the density itself, for our HMC proposal. Just like KAMH, KMC lite in fact falls back to Random Walk Metropolis in previously unexplored regions and therefore inherits geometric ergodicity properties. This in particular includes the ability to explore previously “unseen” regions, even if adaptation has stopped. I implemented a simple illustration and comparison here.
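The sketch below shows why using a surrogate only through its gradient leaves the chain's correctness intact. It is a minimal NumPy version of one HMC step under several assumptions. The leapfrog integrator uses the gradient of a hypothetical Gaussian kernel expansion $s(x)=\sum_i \alpha_i \exp(-\lVert x-z_i\rVert^2/\sigma)$ centred on past samples $z_i$. This expansion is of the kind used in KMC lite, but here the weights $\alpha_i$ are fixed by hand rather than fitted. The accept/reject step uses the true log target. Far from all $z_i$ the surrogate gradient vanishes, so the leapfrog moves reduce to $x+L\epsilon p$ and the step behaves like a Gaussian random walk Metropolis proposal. Function names and parameter values are illustrative only.

```python
import numpy as np


def surrogate_grad(x, z, alpha, sigma):
    """Gradient of s(x) = sum_i alpha_i * exp(-||x - z_i||^2 / sigma).

    x: (d,) point, z: (n, d) past samples, alpha: (n,) hypothetical weights.
    """
    diff = x - z                                   # (n, d)
    k = np.exp(-np.sum(diff**2, axis=1) / sigma)   # (n,) kernel values
    # d/dx exp(-||x - z||^2 / sigma) = -2 (x - z) / sigma * exp(...)
    return -2.0 / sigma * (alpha * k) @ diff


def hmc_step_with_surrogate(x, log_target, z, alpha, sigma, eps, n_leapfrog, rng):
    """One HMC step: surrogate gradient in the dynamics, true target in accept/reject."""
    p0 = rng.standard_normal(x.shape)
    x_new, p = x.copy(), p0.copy()
    # Leapfrog integration driven by the surrogate gradient only.
    p = p + 0.5 * eps * surrogate_grad(x_new, z, alpha, sigma)
    for i in range(n_leapfrog):
        x_new = x_new + eps * p
        if i < n_leapfrog - 1:
            p = p + eps * surrogate_grad(x_new, z, alpha, sigma)
    p = p + 0.5 * eps * surrogate_grad(x_new, z, alpha, sigma)
    # Metropolis correction with the exact log target and Gaussian kinetic energy.
    log_ratio = (log_target(x_new) - 0.5 * p @ p) - (log_target(x) - 0.5 * p0 @ p0)
    if np.log(rng.uniform()) < log_ratio:
        return x_new, True
    return x, False


def log_std_normal(x):
    """Illustrative target: standard Gaussian, up to a constant."""
    return -0.5 * float(x @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((50, 1))    # stand-in for a subsample of the chain history
    alpha = np.full(50, 1.0 / 50)       # hypothetical weights, not fitted
    x, n_acc = np.array([5.0]), 0
    for _ in range(2000):
        x, acc = hmc_step_with_surrogate(x, log_std_normal, z, alpha, 1.0, 0.2, 10, rng)
        n_acc += acc
    print("acceptance rate:", n_acc / 2000)
```

The leapfrog map is volume-preserving and reversible whatever gradient field it follows. The Metropolis correction on the true target therefore keeps the target invariant, and a poor surrogate costs efficiency rather than correctness.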

**ABC example.**

The main point of the ABC example is that our method does not suffer from the additional bias of Gaussian synthetic likelihoods when confronted with skewed models. But there is also a computational efficiency aspect. The scheme by Meeds et al. relies on finite differences and requires $2D$ simulations from the likelihood, where $D$ is the dimension of the parameter, *every time* the gradient is evaluated (i.e. at every leapfrog iteration), and H-ABC subsequently discards this valuable information. In contrast, KMC accumulates gradient information from simulations: it only requires simulating from the likelihood *once*, in the accept/reject step after the leapfrog integration (where gradients are available in closed form). The density estimate is only updated then, and not during the leapfrog integration. Similar work on speeding up HMC via energy surrogates can be applied in the tall data scenario.
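A toy counter makes the cost comparison concrete. A central finite-difference gradient of a simulation-based log-likelihood estimate needs two simulations per coordinate. That gives $2D$ simulations per gradient and $2DL$ per trajectory of $L$ leapfrog iterations, compared with a single simulation per proposal when the gradient comes from a closed-form surrogate. The simulator below is a hypothetical stand-in (a noisy quadratic), not the synthetic likelihood used by Meeds et al.

```python
import numpy as np


class CountingSimulator:
    """Hypothetical noisy log-likelihood estimate that records how often it is called."""

    def __init__(self, rng, noise=0.01):
        self.rng, self.noise, self.calls = rng, noise, 0

    def __call__(self, theta):
        self.calls += 1
        # Toy model: Gaussian log-likelihood plus simulation noise.
        return -0.5 * float(theta @ theta) + self.noise * self.rng.standard_normal()


def fd_gradient(sim, theta, h=1e-2):
    """Central finite differences: two simulator calls per coordinate, 2D in total."""
    grad = np.empty_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        grad[j] = (sim(theta + e) - sim(theta - e)) / (2.0 * h)
    return grad


def simulations_for_trajectory(D, L, seed=0):
    """Simulator calls when each of L leapfrog iterations needs one fresh FD gradient."""
    rng = np.random.default_rng(seed)
    sim = CountingSimulator(rng)
    theta = rng.standard_normal(D)
    for _ in range(L):
        fd_gradient(sim, theta)  # position updates omitted: only the cost is tracked
    return sim.calls


if __name__ == "__main__":
    print(simulations_for_trajectory(D=10, L=50))  # 2 * 10 * 50 = 1000
```

The simulator noise also enters each finite-difference component divided by $2h$. Gradients obtained this way are therefore noisy as well as expensive.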

**Monte Carlo gradients.**

Approximating HMC when gradients aren’t available is in general a difficult problem. One approach (like surrogate models) may work well in some scenarios, while a different approach (e.g. Monte Carlo) may work better in others, and the ABC example showcases such a case. We very much doubt that one size fits all; rather, we claim that it is of interest to find and document these scenarios.

Michael raised the concern that intractable gradients in the pseudo-marginal case can be avoided by running an MCMC chain on the joint space (e.g. $(f,\theta)$ for the GP classifier). To us, however, the situation is not that clear. In many cases, the correlations between variables can cause convergence problems for the MCMC (see e.g. here), which have to be addressed by de-correlation schemes (as here), or e.g. by incorporating geometric information, which also needs fixes, such as Michael’s very own. Which is the method of choice for a particular statistical problem at hand? Which method gives the smallest estimation error (if that is the goal) for a given problem? Or the smallest estimation error per unit of time? A thorough comparison of these different classes of algorithms, in terms of performance relative to problem class, would help here. Most papers (including ours) only show experiments favouring their own method.

**GP estimator quality.**

Finally, to address Michael’s point on the consistency of the GP estimator of the density gradient: this is discussed in the original paper on the infinite dimensional exponential family. As Michael points out, higher dimensional problems are unavoidably harder; however, the specific details are rather involved. First, in terms of theory: in both the well-specified case (when the natural parameter is in the RKHS, Section 4) and the ill-specified case (when the natural parameter is in a “reasonable”, larger class of functions, Section 5), the estimate is consistent. Consistency is obtained in various metrics, including the L² error on gradients. The rates depend on how smooth the natural parameter is (and indeed a poor choice of hyper-parameter will mean slower convergence). The key point, with regard to Michael’s question, is that the smoothness requirement becomes more restrictive as the dimension increases: see Section 4.2, “range space assumption”.

Second, in terms of practice: we have found in experiments that the infinite dimensional exponential family does perform considerably better than a kernel density estimator as the dimension increases (Section 6). In other words, our density estimator can take advantage of smoothness properties of the “true” target density to get good convergence rates. As a practical strategy for hyper-parameter choice, we cross-validate, which works well empirically despite being distasteful to Bayesians. Experiments in the KMC paper also indicate that we can scale these estimators up to dimensions in the hundreds on laptop computers (unlike most other gradient estimation techniques in HMC, e.g. the ones in your HMC & sub-sampling note, or the finite differences in Meeds et al.).
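Cross-validating a hyper-parameter can be sketched generically. The snippet below is a stand-in rather than the procedure used in the paper. It selects the bandwidth of an ordinary Gaussian kernel density estimator by K-fold held-out log-likelihood. The infinite dimensional exponential family is instead fitted by score matching, so its cross-validation criterion differs. Function names, fold count and the candidate grid are hypothetical.

```python
import numpy as np


def kde_log_density(x_eval, data, bandwidth):
    """Log of an isotropic Gaussian KDE built on `data`, evaluated at the rows of x_eval."""
    d = data.shape[1]
    sq = np.sum((x_eval[:, None, :] - data[None, :, :]) ** 2, axis=2)  # (m, n)
    log_k = -0.5 * sq / bandwidth**2 - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # Numerically stable log-mean-exp over the n kernels.
    mx = log_k.max(axis=1, keepdims=True)
    return (mx + np.log(np.mean(np.exp(log_k - mx), axis=1, keepdims=True))).ravel()


def cross_validate_bandwidth(data, candidates, n_folds=5, seed=0):
    """Return the bandwidth with the best mean held-out log-density, plus all scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), n_folds)
    scores = []
    for bw in candidates:
        fold_scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            fold_scores.append(kde_log_density(data[test], data[train], bw).mean())
        scores.append(np.mean(fold_scores))
    return candidates[int(np.argmax(scores))], scores


if __name__ == "__main__":
    data = np.random.default_rng(0).standard_normal((400, 2))
    best, scores = cross_validate_bandwidth(data, [0.05, 0.1, 0.2, 0.4, 0.8])
    print(best, np.round(scores, 3))
```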

Filed under: Books, Statistics, University life Tagged: adaptive MCMC methods, Bayesian quadrature, Gatsby, Hamiltonian Monte Carlo, London, Markov chain, Monte Carlo Statistical Methods, non-parametric kernel estimation, reproducing kernel Hilbert space, RKHS, smoothness

Filed under: Kids, pictures, R, Statistics, University life Tagged: cex, pch, plot, R

*“Unfortunately, the factorization does not make it immediately clear how to aggregate on the level of samples without first having to obtain an estimate of the densities themselves.” (p.2)*

**T**he recently arXived variational consensus Monte Carlo is a paper by Maxim Rabinovich, Elaine Angelino, and Michael Jordan that approaches the consensus Monte Carlo principle from a variational perspective. As in the embarrassingly parallel version, the target is split into a product of K terms, each being interpreted as an unnormalised density and fed to a different parallel processor. The most natural partition is to break the data into K subsamples and to raise the prior to the power 1/K in each term. While this decomposition makes sense from a storage perspective, since each bit corresponds to a different subsample of the data, it raises the question of the statistical pertinence of splitting the prior. My feelings about it are now more lukewarm than when I commented on the embarrassingly parallel version, mainly because it is not reparameterisation invariant (one gets different targets depending on whether the reparameterisation is done before or after the partition) and hence does not treat the prior as the reference measure it should be. I therefore prefer the version where the same original prior is attached to each part of the partitioned likelihood (and even more the random subsampling approaches discussed in the recent paper of Bardenet, Doucet, and Holmes). Another difficulty with the decomposition is that a product of densities is *not* a density in most cases (it may even have infinite mass) and does not offer a natural path to the analysis of samples generated from each term in the product, nor an explanation as to why those samples should be relevant to construct a sample from the original target.
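A short derivation makes the lack of invariance explicit (notation introduced here for illustration). Write the target as $\pi(\theta)\propto p(\theta)\prod_{k=1}^K \ell_k(\theta)$, with $p$ the prior and $\ell_k$ the likelihood of the $k$-th subsample, and reparameterise through a smooth bijection $\theta=g(\xi)$ on the real line. Splitting after the reparameterisation, with the prior density of $\xi$ raised to the power $1/K$, gives back the transformed target:

$$\pi_\xi(\xi)\propto p\{g(\xi)\}\,|g'(\xi)|\prod_{k=1}^K \ell_k\{g(\xi)\}=\prod_{k=1}^K\Big[\big(p\{g(\xi)\}\,|g'(\xi)|\big)^{1/K}\,\ell_k\{g(\xi)\}\Big]$$

Splitting first, and then transforming each sub-density $\pi_k(\theta)\propto p(\theta)^{1/K}\ell_k(\theta)$ as a density in its own right, instead produces one Jacobian per term:

$$\prod_{k=1}^K\Big[p\{g(\xi)\}^{1/K}\,\ell_k\{g(\xi)\}\,|g'(\xi)|\Big]=p\{g(\xi)\}\,|g'(\xi)|^{K}\prod_{k=1}^K \ell_k\{g(\xi)\}$$

The two targets differ by the factor $|g'(\xi)|^{K-1}$. This factor is constant only for affine reparameterisations (or when $K=1$).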

“The performance of our algorithm depends critically on the choice of aggregation function family.” (p.5)

Since variational Bayes is a common answer to complex product models, Rabinovich et al. explore the use of variational Bayes techniques to build the consensus distribution out of the separate samples. As in Scott et al. and Neiswanger et al., the simulation from the consensus distribution is a transform of simulations from each of the terms in the product, e.g., a weighted average. This defines the consensus distribution as a member of an aggregation family, loosely defined by a Dirac mass. When the transform is a sum of individual terms, variational Bayes solutions get much easier to find, and the authors work under this restriction… In the empirical evaluation, this variational Bayes approach improves upon the uniform and Gaussian averaging options of Scott et al., except in a mixture example with a large enough common variance.
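The Gaussian averaging baseline of Scott et al. can be sketched in a few lines. Each of the K workers simulates from its sub-posterior. The s-th consensus draw is then the precision-weighted average of the K workers' s-th draws, with weights set to the inverse sample variances. To make the output checkable, the sketch uses a hypothetical conjugate model: a Gaussian mean with unit noise variance and an $N(0,10)$ prior raised to the power $1/K$ in each shard, as in the decomposition discussed above. In this model the sub-posteriors can be sampled exactly and the full posterior is known. Function names and settings are illustrative.

```python
import numpy as np


def consensus_average(sub_draws):
    """Precision-weighted average of draws across workers (scalar parameter).

    sub_draws: array of shape (K, S), S draws from each of K sub-posteriors.
    Weights are the inverse sample variances of each worker's draws.
    """
    weights = 1.0 / sub_draws.var(axis=1, ddof=1)            # (K,)
    return (weights[:, None] * sub_draws).sum(axis=0) / weights.sum()


def gaussian_subposterior_draws(shards, prior_var, n_draws, rng):
    """Exact draws from each sub-posterior: unit-variance likelihood x prior^(1/K).

    Raising an N(0, prior_var) density to the power 1/K gives, up to a
    constant, an N(0, K * prior_var) density.
    """
    K = len(shards)
    draws = []
    for x_k in shards:
        prec = 1.0 / (K * prior_var) + len(x_k)
        mean = x_k.sum() / prec
        draws.append(mean + rng.standard_normal(n_draws) / np.sqrt(prec))
    return np.array(draws)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(2.0, 1.0, size=1000)
    draws = gaussian_subposterior_draws(np.array_split(x, 4), 10.0, 20000, rng)
    cons = consensus_average(draws)
    full_prec = 1.0 / 10.0 + len(x)
    print(cons.mean(), x.sum() / full_prec)   # consensus vs exact posterior mean
    print(cons.var(), 1.0 / full_prec)        # consensus vs exact posterior variance
```

In this Gaussian case the weighted average recovers the full posterior exactly, up to Monte Carlo error. The mixture example mentioned above is precisely where such averaging is no longer justified.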

*In fine*, despite the relevance of variational Bayes for improving the consensus approximation, I remain unconvinced about the use of the product of (pseudo-)densities and the subsequent mix of simulations from those components. This is for the reason mentioned above, and also because the tail behaviour of those components is not related to the tail behaviour of the target. Still, this is a working solution to a real problem and, as such, a reference for future works.

Filed under: Books, Statistics, University life Tagged: big data, consensus Monte Carlo, embarrassingly parallel, large data problems, subsampling, tall data, variational Bayes methods

Filed under: Books, Kids, pictures, Running, Statistics, Travel, University life Tagged: Amsterdam, Bayes factor, boat, Harold Jeffreys, Holland, Journal of Mathematical Psychology, psychometrics, sunrise, Theory of Probability, XXX

The disparities between the heart of Paris and some suburbs are numerous and massive, all the more so the farther one gets from the lifeline represented by the RER A and RER B train lines, so far be it from me to deny this opposition. But the presentation made during those 10 minutes of Périphéries was rather approximate in statistical terms. For instance, the mortality rate in La Plaine is 30% higher than the mortality rate in Luxembourg, and this was translated into the chances for a given individual from La Plaine to die in the coming year being 30% higher than if he [or she] lived in Luxembourg. Then, a few minutes later, the chances for a given individual from Luxembourg to die were 30% lower than if he [or she] lived in La Plaine… Reading from the above map, it appears that the reference is the mortality rate for Greater Paris. (Those are 2010 figures.)

This opposition, which Vigneron attributes to a different access to health facilities, like the number of general practitioners per inhabitant, does not account for the huge socio-demographic differences between both places, for instance the much younger and maybe larger population in suburbs like La Plaine. Nor does it account for other confounding factors: see, e.g., the equally large difference between the neighbouring stations of Luxembourg and Saint-Michel, where there is no socio-demographic difference and the accessibility of health services is about the same. Or the similar opposition between the southern suburban stops of Bagneux and [my local] Bourg-la-Reine, with the same access to health services… Or yet again the massive decrease in the Yvette valley near Orsay. The analysis is thus statistically poor and somewhat ideologically biased, in that I am unsure the data discussed during this radio show tells us much more than the sad fact that suburbs with less favoured populations show a higher mortality rate.
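The arithmetic behind the objection can be spelled out, writing $r_P$, $r_L$ and $r_G$ for the mortality rates in La Plaine, Luxembourg and Greater Paris (notation introduced here). Suppose, as the map seems to indicate, that La Plaine sits 30% above and Luxembourg 30% below the Greater Paris rate. Then

$$\frac{r_P}{r_L}=\frac{1.3\,r_G}{0.7\,r_G}\approx 1.86,\qquad \frac{r_L}{r_P}=\frac{0.7}{1.3}\approx 0.54,$$

that is, about 86% higher and 46% lower, respectively. If instead $r_P=1.3\,r_L$ held directly, then $r_L/r_P=1/1.3\approx 0.77$, that is, 23% lower rather than 30% lower. Either way, “30% higher” and “30% lower” cannot both describe the same pair of rates.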

Filed under: Statistics, Travel Tagged: Bagneux, boulevard périphérique, Bourg-la-Reine, France, France Inter, inequalities, Luxembourg, national public radio, Orsay, Paris, Paris suburbs, Périphéries, RER B, Saint-Michel, Stade de France, Yvette