maximum likelihood on negative binomial

Posted in Books, Kids, Statistics, University life with tags , , , , , , on October 7, 2015 by xi'an

fishermen poles, Carnon harbour, France, June 13, 2012Estimating both parameters of a negative binomial distribution NB(N,p) by maximum likelihood sounds like an obvious exercise. But it is not because some samples lead to degenerate solutions, namely p=0 and N=∞… This occurs when the mean of the sample is larger than its empirical variance, s²>x̄, not an impossible instance: I discovered this when reading a Cross Validated question asking about the action to take in such a case. A first remark of interest is that this only happens when the negative binomial distribution is defined in terms of failures (since else the number of successes is bounded). A major difference I never realised till now, for estimating N is not a straightforward exercise. A second remark is that a negative binomial NB(N,p) is a Poisson compound of an LSD variate with parameter p, the Poisson being with parameter η=-N log(1-p). And the LSD being a power distribution pk/k rather than a psychedelic drug. Since this is not an easy framework, Adamidis (1999) introduces an extra auxiliary variable that is a truncated exponential on (0,1) with parameter -log(1-p). A very neat trick that removes the nasty normalising constant on the LSD variate.

“Convergence was achieved in all cases, even when the starting values were poor and this emphasizes the numerical stability of the EM algorithm.”

Adamis then constructs an EM algorithm on the completed set of auxiliary variables with a closed form update on both parameters. Unfortunately, the algorithm only works when s²>x̄. Otherwise, it gets stuck at the boundary p=0 and N=∞. I was hoping for a replica of the mixture case where local maxima are more interesting than the degenerate global maximum… (Of course, there is always the alternative of using a Bayesian noninformative approach.)

Tractable Fully Bayesian inference via convex optimization and optimal transport theory

Posted in Books, Statistics, University life with tags , , , , , , , , on October 6, 2015 by xi'an

IMG_0294“Recently, El Moselhy et al. proposed a method to construct a map that pushed forward the prior measure to the posterior measure, casting Bayesian inference as an optimal transport problem. Namely, the constructed map transforms a random variable distributed according to the prior into another random variable distributed according to the posterior. This approach is conceptually different from previous methods, including sampling and approximation methods.”

Yesterday, Kim et al. arXived a paper with the above title, linking transport theory with Bayesian inference. Rather strangely, they motivate the transport theory with Galton’s quincunx, when the apparatus is a discrete version of the inverse cdf transform… Of course, in higher dimensions, there is no longer a straightforward transform and the paper shows (or recalls) that there exists a unique solution with positive Jacobian for log-concave posteriors. For instance, log-concave priors and likelihoods. This solution remains however a virtual notion in practice and an approximation is constructed via a (finite) functional polynomial basis. And minimising an empirical version of the Kullback-Leibler distance.

I am somewhat uncertain as to how and why apply such a transform to simulations from the prior (which thus has to be proper). Producing simulations from the posterior certainly is a traditional way to approximate Bayesian inference and this is thus one approach to this simulation. However, the discussion of the advantage of this approach over, say, MCMC, is quite limited. There is no comparison with alternative simulation or non-simulation methods and the computing time for the transport function derivation. And on the impact of the dimension of the parameter space on the computing time. In connection with recent discussions on probabilistic numerics and super-optimal convergence rates, Given that it relies on simulations, I doubt optimal transport can do better than O(√n) rates. One side remark about deriving posterior credible regions from (HPD)  prior credible regions: there is no reason the resulting region is optimal in volume (HPD) given that the transform is non-linear.

Argentan half-marathon [1 24′ 46″ – 22/496 – V2: 2/116 – 17°C]

Posted in pictures, Running with tags , , , , , , on October 5, 2015 by xi'an

ADSC_0135A fairly good race in Argentan, despite (relatively) hot weather that saw the winning time increase by two minutes and a half. (Last year’s winner actually lost 3 and half minutes on the same track). Gaining 18 seconds over my time from last year was thus a significant achievement. This was my 17th—or 18th including one I ran on my own as it had been cancelled—Argentan half-marathon. The first half [of the half] was a bit unpleasant under the relentless sun, and each wee slope made me miss my 3:52 kilometre goal. But very few runners passed me for good after the 5th kilometre and reaching the forest part was a blessing, providing shade and cool. A single runner passed me there, although I had not slowed down, and I did not realise it was another runner in my V2 category as I could have tried to keep up. The last kilometres were indeed much smoother than in previous, thanks to a change in my training where I increased the number of long distance trainings. And presumable thanks to the previous week abroad when I trained twice a day. Anyway, I was still surprised to end up as the second V2, 15 seconds from both the first and third V2 runners. Which got me a nice Timex watch as a reward! And a pretty ugly cup… [Thanks again and again to the photographs of Normandiecourseapied for their free pictures!]


Banff snapshot [#3]

Posted in pictures, Travel with tags , , , , , , on October 4, 2015 by xi'an

moral [dis]order

Posted in Kids, Travel with tags , , , , , , on October 3, 2015 by xi'an

“For example, a religiously affiliated college that receives federal grants could fire a professor simply for being gay and still receive those grants. Or federal workers could refuse to process the tax returns of same-sex couples simply because of bigotry against their marriages. It doesn’t stop there. As critics of the bill quickly pointed out, the measure’s broad language — which also protects those who believe that “sexual relations are properly reserved to” heterosexual marriages alone — would permit discrimination against anyone who has sexual relations outside such a marriage. That would appear to include women who have children outside of marriage, a class generally protected by federal law.” The New York Time

An excerpt from this week New York Time Sunday Review editorial about what it qualifies as “a nasty bit of business congressional Republicans call the First Amendment Defense Act.” A bill which first line states to be intended to “prevent discriminatory treatment of any person on the basis of views held with respect to marriage” and which in essence would allow for discriminatory treatment of homosexual and unmarried couples not to be prosecuted. A fine example of Newspeak if any! (Maybe they could also borrow Orwell‘s notion of a Ministry of Love.) Another excerpt of the bill that similarly competes for Newspeak of the Year:

(5) Laws that protect the free exercise of religious beliefs and moral convictions about marriage will encourage private citizens and institutions to demonstrate tolerance for those beliefs and convictions and therefore contribute to a more respectful, diverse, and peaceful society.

This reminded me of a story I was recently told me about a friend of a friend who is currently employed by a Catholic school in Australia and is afraid of being fired if found being pregnant outside of marriage. Which kind of “freedom” is to be defended in such “tolerant” behaviours?!

Argentan, 30th and 17th and 7th edition(s)

Posted in Running, Travel with tags , , , , , , on October 2, 2015 by xi'an

When I started the ‘Og, in 2008, I was about to run the 23rd edition of the Argentan half-marathon… Seven years later, I am once again getting ready for the race, after a rather good training season, between the mountains of the North Cascade and the track of Malakoff. with the last week in England, Holland, and Canada having seen close to two trainings a day. (Borderline stress injury, maybe!) Weather does not look too bad this year, so we’ll see tomorrow how I fare against myself (and the other V2 runners, incidentally!).

importance sampling with multiple MCMC sequences

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , on October 2, 2015 by xi'an

Vivek Roy, Aixian Tan and James Flegal arXived a new paper, Estimating standard errors for importance sampling estimators with multiple Markov chains, where they obtain a central limit theorem and hence standard error estimates when using several MCMC chains to simulate from a mixture distribution as an importance sampling function. Just before I boarded my plane from Amsterdam to Calgary, which gave me the opportunity to read it completely (along with half a dozen other papers, since it is a long flight!) I first thought it was connecting to our AMIS algorithm (on which convergence Vivek spent a few frustrating weeks when he visited me at the end of his PhD), because of the mixture structure. This is actually altogether different, in that a mixture is made of unnormalised complex enough densities, to act as an importance sampler, and that, due to this complexity, the components can only be simulated via separate MCMC algorithms. Behind this characterisation lurks the challenging problem of estimating multiple normalising constants. The paper adopts the resolution by reverse logistic regression advocated in Charlie Geyer’s famous 1994 unpublished technical report. Beside the technical difficulties in establishing a CLT in this convoluted setup, the notion of mixing importance sampling and different Markov chains is quite appealing, especially in the domain of “tall” data and of splitting the likelihood in several or even many bits, since the mixture contains most of the information provided by the true posterior and can be corrected by an importance sampling step. In this very setting, I also think more adaptive schemes could be found to determine (estimate?!) the optimal weights of the mixture components.


Get every new post delivered to your Inbox.

Join 919 other followers