Archive for measure theory

le bayésianisme aujourd’hui [book review]

Posted in Books, pictures, Statistics, University life on March 4, 2017 by xi'an

It is quite rare to see a book published in French about Bayesian statistics, and even rarer to find one that connects philosophy of science, foundations of probability, statistics, and applications in neuroscience and artificial intelligence. Le bayésianisme aujourd’hui (Bayesianism Today) was edited by Isabelle Drouet, a Reader in Philosophy at La Sorbonne, and includes a chapter of mine on the basics of Bayesian inference (à la Bayesian Choice), written in French like the rest of the book.

The title of the book is rather surprising (to me), as I had never heard the term Bayesianism mentioned before. The term apparently exists, though (even if I dislike the sound of it!). As described at the beginning of the book, the notion is one of a probabilistic structure of knowledge and learning, à la Poincaré. But I fear that arguments minimising the subjectivity of the Bayesian approach should not be advanced, following my new stance on the relativity of probabilistic statements, if only because they are defensive and open the path all too easily to counterarguments. Similarly, the argument that the “Big Data” era makes the impact of the prior negligible, and thus paradoxically justifies the use of Bayesian methods, is limited to the case of little Big Data, i.e., when the observations are more or less iid with a limited number of parameters; it does not hold when the number of parameters explodes. Another set of arguments that I find both more modern and more compelling [for being modern is not necessarily a plus!] is the ease with which the Bayesian framework allows for integrative and cooperative learning, along with its ultimate modularity, since each component of the learning mechanism can be extracted and replaced with an alternative.
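A toy R illustration of the “little Big Data” caveat, offered as a sketch under stated assumptions rather than anything from the book: a normal mean with unit sampling variance, two hypothetical priors N(0,1) and N(5,1), and a sample mean fixed at 0 so that the comparison is deterministic. With a single parameter, the gap between the two posterior means shrinks like 5/(n+1); when the number of parameters grows with n, each one being observed only k = 2 times, the gap stays at 5/3 however large n gets.

```r
# Conjugate update for a normal mean with unit sampling variance:
# n observations with mean xbar, prior N(m0, tau2); returns the posterior mean.
post_mean <- function(xbar, n, m0, tau2 = 1) {
  (n * xbar + m0 / tau2) / (n + 1 / tau2)
}

# Fixed dimension: one parameter, informed by all n observations.
gap_fixed <- function(n, xbar = 0) {
  abs(post_mean(xbar, n, m0 = 5) - post_mean(xbar, n, m0 = 0))
}

# Growing dimension: p = n / k parameters, each informed by k observations only.
gap_growing <- function(n, k = 2, xbar = 0) {
  p <- n %/% k
  xbars <- rep(xbar, p)
  mean(abs(post_mean(xbars, k, m0 = 5) - post_mean(xbars, k, m0 = 0)))
}

sizes <- c(10, 100, 1000, 10000)
rbind(fixed = sapply(sizes, gap_fixed), growing = sapply(sizes, gap_growing))
```

The first row decreases towards zero while the second row does not move: adding data only washes the prior away when the data per parameter grows.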

comments on reflections

Posted in pictures, Statistics, University life on February 9, 2015 by xi'an

[La Défense and Maisons-Laffitte from my office, Université Paris-Dauphine, Nov. 05, 2011]

I just arXived my comments about A. Ronald Gallant’s “Reflections on the Probability Space Induced by Moment Conditions with Implications for Bayesian Inference”, capitalising on the three posts I wrote around the discussion I gave at the 6th French Econometrics Conference last year. Nothing new there, except that I may get a response from Ron Gallant, as this is submitted as a discussion of his related paper in the Journal of Financial Econometrics. While my conclusion is rather negative, I find the issue of setting the prior and the model from a limited amount of information of much interest, with obvious links to ABC, empirical likelihood and other approximation methods.

full Bayesian significance test

Posted in Books, Statistics on December 18, 2014 by xi'an

Among the many comments (thanks!) I received when posting our Testing via mixture estimation paper came the suggestion to relate this approach to the notion of full Bayesian significance test (FBST) developed by (Julio, not Hal) Stern and Pereira, from São Paulo, Brazil. I thus had a look at this alternative and read the Bayesian Analysis paper they published in 2008, as well as a paper recently published in Logic Journal of IGPL. (I could not find what the IGPL stands for.) The central notion in these papers is the e-value, which provides the posterior probability that the posterior density is larger than the largest posterior density over the null set. This definition bothers me, first because the null set has a measure equal to zero under an absolutely continuous prior (BA, p.82). Hence the posterior density is defined in an arbitrary manner over the null set and the maximum is itself arbitrary. (An issue that invalidates my 1993 version of the Lindley-Jeffreys paradox!) And second because it considers the posterior probability of an event that does not exist a priori, being conditional on the data. This sounds in fact quite similar to Statistical Inference, Murray Aitkin’s (2009) book using a posterior distribution of the likelihood function, with the same drawback of using the data twice, and the other issues discussed in our commentary on the book. (As a side-much-on-the-side remark, the authors incidentally forgot me when citing our 1992 Annals of Statistics paper about decision theory on accuracy estimators..!)
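To make the definition above concrete, here is a minimal R sketch for the simplest case, assuming (hypothetically) a N(0,1) posterior and a point null θ0 = 1.96; it is an illustration, not the FBST authors' own code. The posterior probability of the set where the posterior density exceeds its value at θ0 is computed in closed form and checked by Monte Carlo. The last line illustrates the first objection: halving the density at θ0 alone, a change on a zero-measure set that leaves the posterior distribution itself untouched, changes the answer.

```r
# Posterior probability of the set {theta : post(theta) > level} for a
# N(mu, s^2) posterior; this set is an interval centred at mu.
tangent_prob <- function(level, mu = 0, s = 1) {
  peak <- dnorm(mu, mean = mu, sd = s)
  if (level >= peak) return(0)
  if (level <= 0) return(1)
  # |theta - mu| < s * sqrt(-2 log(level / peak)), hence a standard normal interval
  2 * pnorm(sqrt(-2 * log(level / peak))) - 1
}

mu <- 0; s <- 1; theta0 <- 1.96   # hypothetical posterior and point null
# Over the point null {theta0}, the largest posterior density is its value at theta0.
prob <- tangent_prob(dnorm(theta0, mu, s), mu, s)

# Monte Carlo check from posterior draws.
set.seed(1)
draws <- rnorm(1e5, mean = mu, sd = s)
prob_mc <- mean(dnorm(draws, mu, s) > dnorm(theta0, mu, s))

# Halving the density at theta0 only (a zero-measure modification) moves the result.
prob_altered <- tangent_prob(0.5 * dnorm(theta0, mu, s), mu, s)
c(prob = prob, prob_mc = prob_mc, prob_altered = prob_altered)
```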

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [slides]

Posted in Books, Statistics, University life on December 4, 2014 by xi'an

Here are the slides for my discussion of Ron Gallant’s paper, to be given tomorrow.

another instance of ABC?

Posted in Statistics on December 2, 2014 by xi'an

“These characteristics are (1) likelihood is not available; (2) prior information is available; (3) a portion of the prior information is expressed in terms of functionals of the model that cannot be converted into an analytic prior on model parameters; (4) the model can be simulated. Our approach depends on an assumption that (5) an adequate statistical model for the data are available.”

A 2009 JASA paper by Ron Gallant and Rob McCulloch, entitled “On the Determination of General Scientific Models With Application to Asset Pricing”, may or may not have a connection with ABC, to wit the above quote, but I have trouble checking whether this is the case.

The true (scientific) model, parametrised by θ, is replaced with a (statistical) substitute that is available in closed form and parametrised by g(θ). [If you can get access to the paper, I’d welcome opinions about Assumption 1 therein, which states that the intractable density is equal to a closed-form density.] The latter is over-parametrised when compared with the scientific model, as in, e.g., a N(θ,θ²) scientific model versus a N(μ,σ²) statistical model. In addition, the prior information is only available on θ. However, this does not seem to matter that much since (a) the Bayesian analysis is operated on θ only and (b) the Metropolis approach adopted by the authors involves simulating a massive number of pseudo-observations, given the current value of the parameter θ and the scientific model, so that the transform g(θ) can be estimated by maximum likelihood over the statistical model. The paper suggests using a secondary Markov chain algorithm to find this MLE, which is claimed to be a simulated annealing resolution (p.121), although I do not see the temperature decreasing. The pseudo-model is then used in a primary MCMC step.
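A minimal R sketch of the scheme as read above, on the N(θ,θ²) versus N(μ,σ²) toy example rather than the paper's asset-pricing model. All tuning choices are assumptions: the exponential prior, the 5000 pseudo-observations per evaluation, the proposal scale and the simulated data are hypothetical. Since the MLE of the statistical model is available in closed form here, it replaces the paper's secondary Markov chain search. Because ĝ(θ) is re-estimated by simulation at each proposal, the chain only targets an approximation of the posterior.

```r
set.seed(42)
n <- 200
x_obs <- rnorm(n, mean = 2, sd = 2)  # hypothetical data, generated with theta = 2

# Scientific model N(theta, theta^2), treated as available only by simulation.
sim_scientific <- function(theta, m) rnorm(m, mean = theta, sd = theta)

# Estimate g(theta) = (mu, sigma) by maximum likelihood in the statistical
# model N(mu, sigma^2), fitted to m pseudo-observations from the scientific model.
g_hat <- function(theta, m = 5000) {
  z <- sim_scientific(theta, m)
  c(mean(z), sqrt(mean((z - mean(z))^2)))
}

# Log-likelihood of the observed data under the statistical model at g_hat(theta).
loglik_stat <- function(theta) {
  g <- g_hat(theta)
  sum(dnorm(x_obs, mean = g[1], sd = g[2], log = TRUE))
}

log_prior <- function(theta) dexp(theta, rate = 0.1, log = TRUE)  # hypothetical prior

# Primary random-walk Metropolis chain on theta.
niter <- 3000
theta <- numeric(niter)
theta[1] <- 1
cur_ll <- loglik_stat(theta[1])
for (t in 2:niter) {
  theta[t] <- theta[t - 1]
  prop <- theta[t - 1] + rnorm(1, sd = 0.2)
  if (prop <= 0) next  # outside the support of the prior: reject
  prop_ll <- loglik_stat(prop)
  log_ratio <- prop_ll + log_prior(prop) - cur_ll - log_prior(theta[t - 1])
  if (log(runif(1)) < log_ratio) {
    theta[t] <- prop
    cur_ll <- prop_ll
  }
}
post_mean_theta <- mean(theta[-(1:500)])
```

Each iteration costs a full simulation of m pseudo-observations, which is where the "massive number" of simulations comes from.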

Hence, not truly an ABC algorithm. In the same setting, ABC would use a simulated dataset of the same size as the observed dataset, compute the MLEs for both and compare them. Faster, if less accurate, when Assumption 1 [that the statistical model holds for a restricted parametrisation] does not hold.
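A minimal rejection-ABC sketch of the alternative described above, again on the N(θ,θ²) toy model with hypothetical data. The uniform prior on (0.1, 10), the 20,000 prior draws and the 1% acceptance quantile are assumptions chosen for illustration. Each prior draw costs a single simulated dataset of size n, and the MLEs under the statistical model N(μ,σ²) serve as summary statistics.

```r
set.seed(42)
n <- 200
x_obs <- rnorm(n, mean = 2, sd = 2)  # hypothetical data: theta = 2

# Summary statistic: MLE of (mu, sigma) under the statistical model N(mu, sigma^2).
mle_stat <- function(z) c(mean(z), sqrt(mean((z - mean(z))^2)))
s_obs <- mle_stat(x_obs)

# Rejection ABC: draw theta from the prior, simulate a dataset of size n from
# the scientific model N(theta, theta^2), and compare the two MLEs.
N <- 20000
theta_prior <- runif(N, min = 0.1, max = 10)  # hypothetical prior
dist <- vapply(theta_prior, function(th) {
  sqrt(sum((mle_stat(rnorm(n, mean = th, sd = th)) - s_obs)^2))
}, numeric(1))

# Keep the 1% of draws whose simulated MLE is closest to the observed one.
abc_sample <- theta_prior[dist <= quantile(dist, 0.01)]
mean(abc_sample)
```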

Another interesting aspect of the paper is the creation and use of a prior distribution around the manifold η=g(θ). This clearly relates to my earlier query about simulating on measure zero sets. The paper does not bring a definitive answer, as it never simulates exactly on the manifold, but it constitutes another entry on this challenging problem…

reflections on the probability space induced by moment conditions with implications for Bayesian Inference [discussion]

Posted in Books, Statistics, University life on December 1, 2014 by xi'an

[Following my earlier reflections on Ron Gallant’s paper, here is a more condensed set of questions for my discussion next Friday.]

“If one specifies a set of moment functions collected together into a vector m(x,θ) of dimension M, regards θ as random and asserts that some transformation Z(x,θ) has distribution ψ then what is required to use this information and then possibly a prior to make valid inference?” (p.4)

The central question in the paper is whether, given a set of moment equations

\mathbb{E}[m(X_1,\ldots,X_n,\theta)]=0

(where both the Xi’s and θ are random), one can derive a likelihood function and a prior distribution compatible with those. This sounds to me like a highly complex question, since it implies that the integral equation

\int_{\Theta\times\mathcal{X}^n} m(x_1,\ldots,x_n,\theta)\,\pi(\theta)f(x_1|\theta)\cdots f(x_n|\theta) \text{d}\theta\text{d}x_1\cdots\text{d}x_n=0

must have a solution for all n’s. A related question, which also lingers with fiducial distributions, is how on Earth (or Middle Earth) the concept of a random θ could arise outside Bayesian analysis. Another one is how the equations could make sense without the existence of a (prior, likelihood) pair, a question that may exhibit my ignorance of structural models, but which may also relate to the inconsistency of Zellner’s (1996) Bayesian method of moments as exposed by Geisser and Seidenfeld (1999).

For instance, the paper starts (why?) with the Fisherian example of the t distribution of

Z(x,\theta) = \frac{\bar{x}_n-\theta}{s/\sqrt{n}}

which truly is a t variable when θ is fixed at the true mean value. Now, if we assume that the joint distribution of the Xi’s and θ is such that this projection is a t variable, is there any other case than the Dirac mass on θ? For all (large enough) sample sizes n? I cannot tell and the paper does not bring [me] an answer either.
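The first, uncontroversial half of the statement, namely that Z is a t variable with n−1 degrees of freedom when θ is held fixed at the true mean, can be checked by simulation. A short R sketch, where the sample size n = 10, the mean 1.5 and the standard deviation 2 are hypothetical values: the frequency of |Z| exceeding the 97.5% t quantile should be close to 5%. It says nothing about the joint question raised above.

```r
set.seed(3)
n <- 10
theta <- 1.5  # true mean, held fixed (hypothetical value)
nsim <- 1e5

# Fisher's pivot Z = (xbar - theta) / (s / sqrt(n)), with s the usual (n - 1) standard deviation.
pivot <- function(x, theta) (mean(x) - theta) / (sd(x) / sqrt(length(x)))

z <- replicate(nsim, pivot(rnorm(n, mean = theta, sd = 2), theta))

# Compare the simulated two-sided 5% tail with the t(n - 1) prediction.
tail_freq <- mean(abs(z) > qt(0.975, df = n - 1))
tail_freq
```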

When I look at the analysis made in the abstraction part of the paper, I am puzzled by the starting point (17), where

p(x|\theta) = \psi(Z(x,\theta))

since the lhs and rhs operate on different spaces. In Fisher’s example, x is an n-dimensional vector, while Z is unidimensional. If I blindly apply the formula to this example, the t density does not integrate against the Lebesgue measure on the n-dimensional Euclidean space… If a change of measure allows for this representation, I do not see much appeal in using this new measure, and I anyway wonder in which sense this defines a likelihood function, i.e., the product of n densities of the Xi’s conditional on θ. To me this is the central issue, which remains unsolved by the paper.
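The non-integrability can be seen directly: Z(x,θ) is unchanged when x−θ is rescaled by any c>0 (both x̄−θ and s scale by c), so ψ(Z(x,θ)) is constant along rays from (θ,…,θ) and cannot have a finite integral over ℝⁿ. A small numerical sketch in R, assuming n = 2 and θ = 0 for simplicity: the midpoint-rule integral of ψ(Z(x,0)) over the square [−R,R]² is multiplied by 4 each time R doubles, i.e., it grows like R² instead of converging.

```r
# Fisher's example with n = 2 and theta = 0: s = |x1 - x2| / sqrt(2),
# so Z = xbar / (s / sqrt(2)) = (x1 + x2) / |x1 - x2|.
# On the diagonal x1 = x2, Z is infinite and dt() returns 0.
psi_Z <- function(x1, x2) dt((x1 + x2) / abs(x1 - x2), df = 1)  # t with n - 1 = 1 df

# Midpoint-rule integral of psi(Z(x, 0)) over the square [-R, R]^2.
# K is even so that no grid point sits at the origin, where Z would be 0/0.
box_integral <- function(R, K = 400) {
  h <- 2 * R / K
  mid <- -R + (seq_len(K) - 0.5) * h
  sum(outer(mid, mid, psi_Z)) * h^2
}

integrals <- sapply(c(1, 2, 4, 8), box_integral)
integrals[-1] / integrals[-length(integrals)]  # ratio of 4 at each doubling of R
```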

MCMC on zero measure sets

Posted in R, Statistics on March 24, 2014 by xi'an

Simulating a bivariate normal under the constraint (or conditional on the fact) that x²-y²=1 (a non-linear zero measure curve in the 2-dimensional Euclidean space) is not that easy: if one runs a random walk along that curve (by running a random walk on y, deducing x from x²=y²+1, and accepting with a Metropolis-Hastings ratio based on the bivariate normal density), the outcome differs from the target predicted by a change of variable and the proper derivation of the conditional. The graph resulting from the R code below illustrates the discrepancy!

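# target density of y on the branch x=sqrt(1+y^2) of x^2-y^2=1,
# including the 1/x Jacobian; 1.52 is (approximately) its normalising constant
targ=function(y){
  exp(-y^2)/(1.52*sqrt(1+y^2))}

T=10^5  # number of iterations
Eps=3   # half-width of the uniform random walk proposal on y
ys=xs=rep(runif(1),T)
xs[1]=sqrt(1+ys[1]^2)  # project the starting point onto the curve
for (t in 2:T){
  propy=runif(1,-Eps,Eps)+ys[t-1]
  propx=sqrt(1+propy^2)  # deduce x from x^2=y^2+1
  # Metropolis-Hastings ratio based on the bivariate normal density only
  ace=(runif(1)<(dnorm(propy)*dnorm(propx))/
               (dnorm(ys[t-1])*dnorm(xs[t-1])))
  if (ace){
     ys[t]=propy;xs[t]=propx
     }else{
       ys[t]=ys[t-1];xs[t]=xs[t-1]}}

# compare the simulated y's with the proper conditional density
hist(ys,prob=TRUE,breaks=100,main="",xlab="y")
curve(targ,add=TRUE,lwd=2)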

If instead we add the proper Jacobian as in

  # Metropolis-Hastings ratio including the 1/x Jacobian of the change of variable
  ace=(runif(1)<(dnorm(propy)*dnorm(propx)/propx)/
               (dnorm(ys[t-1])*dnorm(xs[t-1])/xs[t-1]))

the fit is there. My open question is how to make this derivation generic, i.e. without requiring the (dreaded) computation of the (dreadful) Jacobian.
