Archive for the Books Category

Le Monde puzzle [#1013]

Posted in Books, Kids on June 23, 2017 by xi'an

A purely arithmetic Le Monde mathematical puzzle:

An operation þ applies to all pairs of natural numbers, with the properties

0 þ (a+1) = (0 þ a)+1, (a+1) þ (b+1) = (a þ b)+1, 271 þ 287 = 77777, 2018 þ 39 = 2018×39

Find the smallest integer d>287 such that there exists c<d leading to c þ d = c×d, and the smallest integer f>2017 such that 2017 þ f = 2017×40. Is there any known integer f such that f þ 2017 = 40×2017?

The major appeal of this puzzle (where no R programming seems to help!) is that the “data” does not completely define the operation þ! Indeed, when a<b, it is straightforward to deduce that a þ b = (0 þ 0)+b, hence solving the first two questions by deriving (0 þ 0)=270×287: since c þ d = (0 þ 0)+d = c×d forces c = (0 þ 0)/d+1, d must divide 270×287, the smallest such divisor above 287 being d=315 [with c=247], while f=2017×40−270×287=3190. But the opposite quantity b þ a is not defined, apart from (2018−39) þ 0. This however brings a resolution since

(2018−39) þ 0 = 2018×39−39 = 2017×39 and (2018−39+2017) þ 2017 = 2017×39+2017 = 2017×40

leading to f=2018-39+2017=3996.
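For the record, a minimal R check of the first two answers (my own sketch, exploiting the derived identity a þ b = 270×287+b for a<b):

  base <- 270 * 287                 # value of (0 þ 0), from 271 þ 287 = 77777
  thorn <- function(a, b) base + b  # a þ b = (0 þ 0)+b, only valid for a <= b

  # question 1: c þ d = c*d with c<d means c = base/d + 1, an integer
  # if and only if d divides base, so scan for divisors of base above 287
  d <- 288
  while (base %% d > 0) d <- d + 1
  cc <- base / d + 1
  c(d = d, cc = cc, check = thorn(cc, d) == cc * d)  # d = 315, cc = 247

  # question 2: 2017 þ f = base + f = 2017*40
  f <- 2017 * 40 - base
  c(f = f, check = thorn(2017, f) == 2017 * 40)      # f = 3190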

slice sampling for Dirichlet mixture process

Posted in Books, Statistics, University life on June 21, 2017 by xi'an

While working with my PhD student Changye at Dauphine this morning, I realised that slice sampling also applies to distributions with discrete support and could even be of use in such settings. That it works is (now) straightforward, in that the missing-variable representation behind the slice sampler also applies to densities defined with respect to a discrete measure. That it is useful transpires from the short paper of Stephen Walker (2007) where we saw this, as Stephen relies on the slice sampler to sample from the Dirichlet mixture model by eliminating the tail problem associated with this distribution. (The paper appeared in Communications in Statistics and it is through Pati & Dunson (2014), who take advantage of this trick, that Changye found out about its very existence. I may have known about it in an earlier life, but I had clearly forgotten everything!)

While the prior distribution (of the weights) of the Dirichlet mixture process is easy to generate via the stick-breaking representation, the posterior distribution is trickier, as the weights are multiplied by the values of the sampling distribution (likelihood) at the corresponding parameter values and thus cannot be normalised. Introducing a uniform variable to replace all weights in the mixture with an indicator that the uniform is less than those weights corresponds to a (latent variable) completion [or a demarginalisation, as we called this trick in Monte Carlo Statistical Methods]. As elaborated in the paper, the Gibbs steps corresponding to this completion are easy to implement, involving only a finite number of components, meaning the allocation to a component of the mixture can be operated rather efficiently. Or not, when considering that the weights in the Dirichlet mixture are not monotone, hence that a large number of them may need to be computed before picking the next index in the mixture when the uniform draw happens to be quite small.
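To fix ideas, here is a minimal R sketch of the slice-based allocation step (a simplified rendering of the scheme in Walker, 2007, with a N(0,1) base measure and unit-variance Gaussian kernels, both choices mine for illustration):

  set.seed(42)
  alpha <- 1                                       # DP concentration parameter

  # lazy stick-breaking: extend the weights until the leftover mass falls
  # below u, since no later component can then have weight above u
  extend_sticks <- function(w, leftover, u, alpha) {
    while (leftover > u) {
      v <- rbeta(1, 1, alpha)
      w <- c(w, v * leftover)
      leftover <- leftover * (1 - v)
    }
    list(w = w, leftover = leftover)
  }

  # one slice-allocation move for a single observation x, given a current
  # component k: draw the slice u | k ~ U(0, w_k), then reallocate within
  # the finite set {j : w_j > u}
  x  <- 1.7
  s  <- extend_sticks(numeric(0), 1, 1e-3, alpha)  # initial batch of sticks
  th <- rnorm(length(s$w))                         # atoms from the N(0,1) base
  k  <- sample.int(length(s$w), 1, prob = s$w)     # current allocation

  u  <- runif(1, 0, s$w[k])                        # slice level, below w_k
  s  <- extend_sticks(s$w, s$leftover, u, alpha)   # instantiate all w_j > u
  th <- c(th, rnorm(length(s$w) - length(th)))     # matching new atoms
  active <- which(s$w > u)                         # never empty: contains k
  p  <- dnorm(x, th[active])                       # N(th_j, 1) kernel at x
  active[sample.int(length(active), 1, prob = p)]  # new allocation

The last line illustrates the point made above: only the finitely many components with weight above the slice level u need to be instantiated, but a small u may first require generating a long sequence of sticks.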

Infomocracy [book review]

Posted in Books, Travel on June 17, 2017 by xi'an

Infomocracy is a novel by Malka Older set in a near future where most of the Earth operates under a common elective system, in which each geographical unit of 100,000 people elects a local representative who runs the unit according to the party's program and contributes to electing a worldwide government, except for some non-democratic islets like Saudi Arabia. The whole novel revolves around the upcoming election, with different parties trying to influence the outcome in their favour, some to the point of installing a dictatorship. Which does not sound that different from present times!, with the slight difference that the whole process is controlled by Information, a sort of World Wide Web that seems to operate neutrally above states and parties, although the book does not elaborate on how this could be possible. The story is told through four main (and somewhat caricatural) characters, working for or against the elections and crossing paths along the novel. Certainly worth reading if not outstanding. (And definitely not “one of the greatest literary debuts in recent history”!)

The book is more interesting as a dystopia on electoral systems and the way the information revolution can produce a step back in democracy, with the systematisation of fake news and voter manipulation, where the marketing research group YouGov has become a party, than as a science-fiction (or politics-fiction) book. Indeed, it tries too hard to replicate the cyberpunk reference, William Gibson's Neuromancer, with the same construct of interlacing threads, the same fascination for Japan, airports, and luxury hotels, if not for brands, and a similar ninja-geek pair of characters. And with very little invention about the technology of the 21st Century. (And a missed opportunity to exploit artificial-intelligence themes and the prediction of outcomes when Information builds a fake vote database but does not seem to mind about Benford's Law.) The acknowledgement section somewhat explains this imbalance, in that the author worked many years in humanitarian organisations and is currently completing a thesis at Sciences Po (Paris).

Bayesian decision riddle

Posted in Books, Kids, Statistics on June 15, 2017 by xi'an

The current puzzle on The Riddler is a version of the secretary problem with an interesting (?) Bayesian solution.

Given four positive numbers x¹, x², x³, x⁴, observed sequentially, the associated utility is the value of x at the stopping time. What is the optimal stopping rule?

While nothing is mentioned about the distribution of the x's, I made the assumption that they were iid and uniformly distributed over (0,M), with M unknown, and tried a Bayesian resolution with the non-informative prior π(M)=1/M. And failed. The reason for this failure is that the expected utility is infinite at the first step: while the posterior expected utility is finite with three and two observations, meaning I can compare stopping and continuing at the second and third steps, the predicted expected reward for continuing after observing x¹ does not exist, because the expected value of max(x¹,x²) given x¹ does not exist: the predictive density of x² given x¹ is proportional to max(x¹,x²)⁻², so the integral of max(x¹,x²) against it diverges… Several alternatives are possible to bypass this impossible resolution, from changing the utility function to picking another reference prior.

For instance, using a prior like π(M)=1/M² (and the same monetary return utility) leads to a proper optimal solution, namely

  1. always wait for the second observation x²
  2. stop at x² if x²>11x¹/12, else wait for x³
  3. stop at x³ if x³>23 max(x¹,x²)/24, else observe x⁴

obtained analytically on a bar table in Rouen (and checked numerically later).
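A quick numerical sanity check of this rule in R (my own sketch, simulating from a fixed M, which suffices since the rule's thresholds are scale-free):

  set.seed(1)
  M <- 1; N <- 1e5
  x <- matrix(runif(4 * N, 0, M), N, 4)

  riddle_rule <- function(x) {            # the three-step rule above
    if (x[2] > 11 * x[1] / 12) return(x[2])
    if (x[3] > 23 * max(x[1], x[2]) / 24) return(x[3])
    x[4]
  }
  mean(apply(x, 1, riddle_rule))          # above the M/2 of "always take x4"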

Another approach is to try to optimise the probability of picking the largest of the four x's, but this does not lead to an interesting solution, since it corresponds to picking the first maximum after x¹, while picking the largest among the remaining ones leads to a somewhat convoluted solution I have no patience to produce here! Plus this is not a really pertinent loss function, as it does not discriminate enough against waiting…

Le Monde puzzle [#1012]

Posted in Books, Kids on June 14, 2017 by xi'an

A basic geometric Le Monde mathematical puzzle:

Take a triangle ABC such that the side AB has length c=42, each side has an integer length, and the area is 756. Given an inner point D, draw three lines parallel to the three sides of ABC through D, in order to construct three triangles with common apex D and bases supported by these three sides.

  1. How far is D from the base AB when each of the three triangles has a perimeter equal to the length of the side that supports its base?
  2. How far is D from the previous solution when the sum of the areas of the three triangles is minimal?

Since the puzzle is purely geometric, I was quite tempted to bypass it and watch instead the British elections and the Comey hearing! However, the sides a and b are easily found by an exhaustive search, a=39 and b=45 (or the reverse), as shown in the sketch at the end of this post. From there, the problem resolution proceeds by a similar-triangles argument, since all triangles constructed by the game rule have the same angles, hence proportional sides. For the first question, this leads to a straightforward determination of the base of each triangle by the perimeter equation, meaning that D is then 12 units away from AB. The second question is not harder, in that the area of a triangle with base a and adjacent angles β and γ can be written as

a² sin(β) sin(γ) / [2 sin(β+γ)]

meaning it suffices to minimise a²+a′²+a″² under the constraint that the three segments parallel to BC sum to the complete length of BC, that is, a+a′+a″=39. By symmetry, the solution is that all three triangles are identical, leading to an apex D′ at a distance 12 from AB, again!, but in the middle of the segment, hence at distance one from the earlier D.
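For completeness, the exhaustive search for the two missing integer sides (a minimal R sketch, based on Heron's formula with c=42 and area 756):

  for (a in 1:200) for (b in a:200) {     # integer sides, a <= b w.l.o.g.
    s <- (a + b + 42) / 2                 # semi-perimeter
    if (s > a && s > b && s > 42 &&       # triangle inequality
        isTRUE(all.equal(s * (s - a) * (s - b) * (s - 42), 756^2)))
      print(c(a = a, b = b))              # returns a = 39, b = 45
  }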

thinning a Markov chain, statistically

Posted in Books, pictures, R, Statistics on June 13, 2017 by xi'an

Art Owen has arXived a new version of his thinning MCMC paper, in which he studies how thinning or subsampling can improve computing time in MCMC chains. I remember quite well the message sent by Mark Berliner and Steve MacEachern, in an early 1990's paper, that subsampling always increases the variance of the resulting estimators. We actually have this result in our Monte Carlo Statistical Methods book. Now, there are other perspectives on this, as for instance cases when thinning can be hard-wired by simulating directly a k-step move, delaying rejection or acceptance, prefetching, or simulating directly the accepted values as in our vanilla Rao-Blackwellisation approach. Here, Art considers the case when there is a cost θ of computing a transform of the simulation [while each transition costs one unit] and when those transforms are positively correlated with correlation ρ. Somewhat unsurprisingly, when θ is large enough, thinning becomes worth implementing. But this requires extra computations in evaluating the correlation ρ and the cost θ, which is rarely comparable with the cost of computing the likelihood itself, a requirement for the Metropolis-Hastings or Hamiltonian Monte Carlo step(s). Subsampling while keeping the right target (which is a hard constraint!) should thus have a much more effective impact on computing budgets.
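As a back-of-the-envelope illustration (my own, assuming an AR(1)-type autocorrelation ρᵏ at lag k for the transformed chain, which is not Owen's exact setting): thinning by k costs k transitions plus θ for the transform, while the thinned chain has integrated autocorrelation time (1+ρᵏ)/(1−ρᵏ), so the quantity to minimise in k is their product:

  thin_cost <- function(k, theta, rho)    # cost x variance, up to a constant
    (k + theta) * (1 + rho^k) / (1 - rho^k)

  theta <- 20; rho <- .9; k <- 1:50
  k[which.min(thin_cost(k, theta, rho))]  # optimal thinning lag, here circa 20
  plot(k, thin_cost(k, theta, rho), type = "b", ylab = "cost x variance")

With θ=20 and ρ=.9, keeping every 20th value is (in this stylised model) about eight times as efficient as keeping everything, in line with the message that a large enough θ makes thinning worthwhile.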

ACDC versus ABC

Posted in Books, Kids, pictures, Statistics, Travel on June 12, 2017 by xi'an

At the Bayes, Fiducial and Frequentist workshop last month, I discussed with the authors of this newly arXived paper, Approximate confidence distribution computing, Suzanne Thornton and Min-ge Xie. Which they abbreviate as ACC and not as ACDC. While I have discussed the notion of confidence distribution in some earlier posts, this paper aims at producing proper frequentist coverage within a likelihood-free setting. Given the proximity with our recent paper on the asymptotics of ABC, as well as with Li and Fearnhead's (2016) parallel endeavour, it is difficult (for me) to spot the actual distinction between ACC and ABC, given that we also achieve (asymptotically) proper coverage when the limiting ABC distribution is Gaussian, which is the case for a tolerance decreasing quickly enough to zero (with the sample size).

“Inference from the ABC posterior will always be difficult to justify within a Bayesian framework.”

Indeed the ACC setting is eerily similar to ABC, apart from the potential of the generating distribution to be data-dependent. (Which is fine when considering that confidence distributions have no Bayesian motivation but are a tool to ensure proper frequentist coverage.) That it is “able to offer theoretical support for ABC” (p.5) is unclear to me, given both this data dependence and the constraints it imposes on the [sampling and algorithmic] setting. Similarly, I do not understand how the authors “are not committing the error of doubly using the data” (p.5), nor why they should be concerned about it, standing as they do outside the Bayesian framework. If the prior involves the data, as in the Cauchy location example, it literally uses the data [once], followed by an ABC comparison between simulated and actual data, which uses the data [a second time].

“Rather than engaging in a pursuit to define a moving target such as [a range of posterior distributions], ACC maintains a consistently clear frequentist interpretation (…) and thereby offers a consistently cohesive interpretation of likelihood-free methods.”

The frequentist coverage guarantee comes from a bootstrap-like assumption that [with tolerance equal to zero] the distribution of the ABC/ACC/ACDC random parameter around an estimate of the parameter given the summary statistic is identical to the [frequentist] distribution of this estimate around the true parameter [given the true parameter, although this conditioning makes no sense outside a Bayesian framework]. (There must be a typo in the paper when the authors define [p.10] the estimator as minimising the derivative of the density of the summary statistic, while still calling it an MLE.) That this bootstrap-like assumption holds is established (in Theorem 1) under a CLT on this MLE and assumptions on the data-dependent proposal that connect it to the density of the summary statistic, a connection that seems to imply a data dependence as well as a certain knowledge about this density. What I find most surprising in this derivation is the total absence of conditions on, or even discussion of, the tolerance level, which, as we have shown, is paramount to the validation or invalidation of ABC inference. It sounds like the authors of Approximate confidence distribution computing are setting ε equal to zero for those theoretical derivations, while in practice they apply rules [for choosing ε] that they do not spell out, but which result in very different acceptance rates for the ACC version they oppose to an ABC version. (In all illustrations, it seems that ε=0.1, which does not make much sense.) All in all, I am thus rather skeptical about the practical implications of the paper, in that it seems to achieve confidence guarantees by first assuming proper, if implicit, choices of summary statistics and parameter-generating distribution.
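As a toy illustration of why the tolerance cannot be brushed aside (my own sketch, not from the paper): ABC rejection for a normal mean, with the sample average as summary statistic, where the frequentist coverage of the 95% ABC credible interval drifts away from nominal as ε grows.

  set.seed(7)
  n <- 50; mu0 <- 0
  abc_cover <- function(eps, N = 1e4, R = 200) {
    mean(replicate(R, {
      obs <- mean(rnorm(n, mu0))               # observed summary statistic
      mus <- rnorm(N, 0, 10)                   # draws from a N(0,100) prior
      keep <- abs(rnorm(N, mus, 1 / sqrt(n)) - obs) < eps
      q <- quantile(mus[keep], c(.025, .975))  # 95% ABC credible interval
      q[1] < mu0 && mu0 < q[2]
    }))
  }
  abc_cover(0.05)   # close to the nominal .95 for a small tolerance
  abc_cover(5)      # essentially 1 once eps swamps the posterior spread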