Archive for cross validated

Bernoulli mixtures

Posted in pictures, Statistics, University life with tags , , , , , , , on October 30, 2019 by xi'an

An interesting query on (or from) X validated: given a Bernoulli mixture where the weights are known and the probabilities are jointly drawn from a Dirichlet, what is the most efficient from running a Gibbs sampler including the latent variables to running a basic Metropolis-Hastings algorithm based on the mixture representation to running a collapsed Gibbs sampler that only samples the indicator variables… I provided a closed form expression for the collapsed target, but believe that the most efficient solution is based on the mixture representation!

stack explode

Posted in Books, Kids, University life with tags , , , , , , , , on October 21, 2019 by xi'an

To say the least, most Stack Exchange communities have been quite active in the past days, not towards solving an unusual flow of questions from new or old users, but in protesting against the exclusion of a moderator who disputed on a moderator forum the relevance of a code of conduct change proposed or imposed by the private company behind Stack Exchange, now called Stack Overflow (like the honomym forum on Stack Exchange). A change about the use of gender pronouns in comments and answers (an announcement that attracted the second largest number of negative votes for the entire site). And an exclusion followed by a sequence of apologies from the company highest officers that did not seem to pacify anyone (first largest number of negative votes!) and that kept the excluded moderator excluded. And leading to close to one hundred moderators resigning or going AWOL. Including one of the most active members of X validated, Glen_b. Who posted a detailed description of the chain of events and a most rational explanation of why he was resigning from being a moderator. And then another major moderator, gung… A flak overflow as put by another report.

“We recognise that Stack Exchange is in no way obliged to take our input. We know that we are guests in the home of a private company. We don’t own the platform, and while we want to help to steer the ship, we don’t have the right to determine how it is governed. What built this network is a sense of community and common purpose, and a big part of that has always been the close relationship and communication between Stack Exchange and stakeholders, such as moderators and users. It’s a shame that we’ve lost something so fundamental.” dearstackexchange.com

What can be learned from this fiasco is that it is not a very good idea to let a technical Q&A forum such as Stack Exchange to be run by a private company. Even though many contributors may have never realised till now this is the case. And even when the company is using A/B tests, Bayesian GLMs, and Stan to decrease the number of “unfriendly comments” on the site. Companies are primarily there to make profit and report to stakeholders, rather than the millions of people contributing to the site for free, sometimes investing a considerable amount of time and energy towards making the questions answered in a constructive manner that benefits the entire audience. Despite the facade coolness as in the nerdy, geeky chatter on the company blog, the company  executives and employees obviously do not share the same goal as the volunteers in the numerous communities of the network. Dealing in public relations rather than sheer exchange and in public image rather than openness and in management rather than empowerment. And in advertising rather than sharing.

Another basic remark is that by growing into so many subjects beyond computer programming, and in particular non-technical topics, the SE platform has hit a stage where some communities goals will inevitably clash with others’. I deem it rather characteristic that the (one?) source of the crisis is the issue of using pronouns as stated by the OP (if any) or else using ungendered pronouns. (Pronouns like they which apparently works in English for both plural and singular—as does you—as early as the 14th century.) As some raised religious arguments against using one or several versions. As well as grammatical ones and further ones of being challenging for some non-native-English speakers. I do not think that a corporate imposition (with threats of exclusionary consequences) one single version of inclusion and tolerance is going to work and especially not within each and all of the communities constituting Stack Exchange, which is why working towards an alternative and decentralised network could be timely.

open reviews

Posted in Statistics with tags , , , , , , on September 13, 2019 by xi'an

When looking at a question on X validated, on the expected Metropolis-Hastings ratio being one (not all the time!), I was somewhat bemused at the OP linking to an anonymised paper under review for ICLR, as I thought this was breaching standard confidentiality rules for reviews. Digging a wee bit deeper, I realised this was a paper from the previous ICLR conference, already published both on arXiv and in the 2018 conference proceedings, and that ICLR was actually resorting to an open review policy where both papers and reviews were available and even better where anyone could comment on the paper while it was under review. And after. Which I think is a great idea, the worst possible situation being a poor paper remaining un-discussed. While I am not a big fan of the brutalist approach of many machine-learning conferences, where the restrictive format of both submissions and reviews is essentially preventing in-depth reviews, this feature should be added to statistics journal webpages (until PCIs become the norm).

my likelihood is dominating my prior [not!]

Posted in Kids, Statistics with tags , , , , , on August 29, 2019 by xi'an

An interesting misconception read on X validated today, with a confusion between the absolute value of the likelihood function and its variability. Which I have trouble explaining except possibly by the extrapolation from the discrete case and a confusion between the probability density of the data [scaled as a probability] and the likelihood function [scale-less]. I also had trouble convincing the originator of the question of the irrelevance of the scale of the likelihood per se, even when demonstrating that |𝚺| could vanish from the posterior with no consequence whatsoever. It is only when I thought of the case when the likelihood is constant in 𝜃 that I managed to make my case.

a problem that did not need ABC in the end

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , , on August 8, 2019 by xi'an

While in Denver, at JSM, I came across [across validated!] this primarily challenging problem of finding the posterior of the 10³ long probability vector of a Multinomial M(10⁶,p) when only observing the range of a realisation of M(10⁶,p). This sounded challenging because the distribution of the pair (min,max) is not available in closed form. (Although this allowed me to find a paper on the topic by the late Shanti Gupta, who was chair at Purdue University when I visited 32 years ago…) This seemed to call for ABC (especially since I was about to give an introductory lecture on the topic!, law of the hammer…), but the simulation of datasets compatible with the extreme values of both minimum and maximum, m=80 and M=12000, proved difficult when using a uniform Dirichlet prior on the probability vector, since these extremes called for both small and large values of the probabilities. However, I later realised that the problem could be brought down to a Multinomial with only three categories and the observation (m,M,n-m-M), leading to an obvious Dirichlet posterior and a predictive for the remaining 10³-2 realisations.

Gibbs sampling with incompatible conditionals

Posted in Books, Kids, R, Statistics with tags , , , , , , on July 23, 2019 by xi'an

An interesting question (with no clear motivation) on X validated wondering why a Gibbs sampler produces NAs… Interesting because multi-layered:

  1. The attached R code indeed produces NAs because it calls the Negative Binomial Neg(x¹,p) random generator with a zero success parameter, x¹=0, which automatically returns NAs. This can be escaped by returning a one (1) instead.
  2. The Gibbs sampler is based on a Bin(x²,p) conditional for X¹ and a Neg(x¹,p) conditional for X². When using the most standard version of the Negative Binomial random variate as the number of failures, hence supported on 0,1,2…. these two conditionals are incompatible, i.e., there cannot be a joint distribution behind that returns these as conditionals, which makes the limiting behaviour of the Markov chain harder to study. It however seems to converge to a distribution close to zero, which is not contradictory with the incompatibility property: the stationary joint distribution simply does not enjoy the conditionals used by the Gibbs sampler as its conditionals.
  3. When using the less standard version of the Negative Binomial random variate understood as a number of attempts for the conditional on X², the two conditionals are compatible and correspond to a joint measure proportional to x_1^{-1} {x_1 \choose x_2} p^{x_2} (1-p)^{x_1-x_2}, however this pmf does not sum up to a finite quantity (as in the original Gibbs for Kids example!), hence the resulting Markov chain is at best null recurrent, which seems to be the case for p different from ½. This is unclear to me for p=½.

truncated Normal moments

Posted in Books, Kids, Statistics with tags , , , , , on May 24, 2019 by xi'an

An interesting if presumably hopeless question spotted on X validated: a lower-truncated Normal distribution is parameterised by its location, scale, and truncation values, μ, σ, and α. There exist formulas to derive the mean and variance of the resulting distribution,  that is, when α=0,

\Bbb{E}_{\mu,\sigma}[X]= \mu + \frac{\varphi(\mu/\sigma)}{1-\Phi(-\mu/\sigma)}\sigma

and

\text{var}_{\mu,\sigma}(X)=\sigma^2\left[1-\frac{\mu\varphi(\mu/\sigma)/\sigma}{1-\Phi(-\mu/\sigma)}  -\left(\frac{\varphi(\mu/\sigma)}{1-\Phi(-\mu/\sigma)}\right)^2\right]

but there is no easy way to choose (μ, σ) from these two quantities. Beyond numerical resolution of both equations. One of the issues is that ( μ, σ) is not a location-scale parameter for the truncated Normal distribution when α is fixed.