Archive for flight

gone South [jatp]

Posted in Mountains, pictures, Statistics, Travel, University life, Wines on March 27, 2021 by xi'an

flow contrastive estimation

Posted in Books, Statistics on March 15, 2021 by xi'an

On the flight back from Montpellier last week, I read a 2019 paper by Gao et al. revisiting maximum likelihood estimation of the parameter of a parametric family when the normalising constant Z=Z(θ) is unknown, via noise-contrastive estimation à la Gutmann & Hyvärinen (or à la Charlie Geyer). The approach treats the normalising constant Z as an extra parameter (as in Kong et al.) and the classification probability as an objective function, calling it a likelihood, which it is not in my opinion, as (i) the allocation to the groups is not random and (ii) the original density of the actual observations does not appear in the so-called likelihood.
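To make the objective concrete, here is a minimal R sketch of plain noise-contrastive estimation on a toy problem. It is not the code of the paper. The unnormalised Gaussian model, the N(0,4) noise density, the equal sample sizes and all names (`log_odds`, `neg_obj`, the parameter c=−log Z) are assumptions chosen for illustration.

```r
set.seed(42)
n <- 5000
x <- rnorm(n, mean = 1)            # observations, true theta = 1
y <- rnorm(n, mean = 0, sd = 2)    # noise sample from a known density q (assumed)
log_q <- function(u) dnorm(u, mean = 0, sd = 2, log = TRUE)

# log-odds that u comes from the model rather than from the noise:
# unnormalised log-density -(u - theta)^2 / 2, plus c = -log Z, minus log q(u)
log_odds <- function(u, par) {
  -(u - par[1])^2 / 2 + par[2] - log_q(u)
}

# minus the classification log-probability, with (theta, c) as free parameters
neg_obj <- function(par) {
  -sum(plogis(log_odds(x, par), log.p = TRUE)) -
    sum(plogis(-log_odds(y, par), log.p = TRUE))
}

fit <- optim(c(0, 0), neg_obj, method = "BFGS")
fit$par  # compare with theta = 1 and c = -log(sqrt(2 * pi))
```

The observations only enter through the classification term, which is the point made in (ii) above: the density of the data never appears as such in the criterion.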

“When q appears on the right of KL-divergence [against p], it is forced to cover most of the modes of p. When q appears on the left of KL-divergence, it tends to chase the major modes of p while ignoring the minor modes.”

The flow in the title indicates that the contrastive distribution q is estimated by a flow-based estimator, namely the transform of a basic noise distribution via easily invertible and differentiable transforms, for instance with lower triangular Jacobians. This flow is also estimated directly from the data, but the authors complain this estimation is not good enough for noise-contrastive estimation and suggest instead resorting to a GAN version where the classification log-probability is maximised in the model parameters and minimised in the flow parameters α. Except that I feel it misses the true likelihood part. In other words, why on Hyperion would estimating all of θ, Z=Z(θ), and α at once improve the estimation of Z?
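As a reminder of why triangular Jacobians are convenient, here is a small R sketch of the simplest possible flow, an affine map x = μ + Lz of standard Normal noise with L lower triangular. This is an illustrative assumption, far simpler than the flows used in the paper, and the names `L`, `mu` and `flow_logdens` are hypothetical. The log-density follows from the change of variables, with a log-determinant that reduces to the sum of the log-diagonal terms.

```r
# toy two-dimensional flow: x = mu + L z with z ~ N(0, I), L lower triangular
L <- matrix(c(1.5, 0.7, 0, 0.5), 2, 2)  # filled by column: rows (1.5, 0) and (0.7, 0.5)
mu <- c(1, -1)

flow_logdens <- function(x) {
  z <- forwardsolve(L, x - mu)          # invert the transform by forward substitution
  # density of the noise at z, minus log |det L| = sum of log |diagonal terms|
  sum(dnorm(z, log = TRUE)) - sum(log(abs(diag(L))))
}

# sanity check against the closed-form N(mu, L L') log-density
x0 <- c(0.3, 0.2)
S <- L %*% t(L)
ref <- -log(2 * pi) - 0.5 * log(det(S)) -
  0.5 * drop(t(x0 - mu) %*% solve(S) %*% (x0 - mu))
c(flow = flow_logdens(x0), closed_form = ref)
```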

The other aspect that puzzles me is that (12) uses integrated classification probabilities (with the unknown Z as an extra parameter), rather than conditioning on the data, Bayes-like, especially when the first expectation is replaced with its empirical version. (The difference between (12) and a GAN is that here the discriminator function is constrained.)

a year ago, a world away

Posted in Statistics on February 24, 2021 by xi'an

Le Monde puzzle [#1083]

Posted in Books, Kids, R, Travel on February 7, 2019 by xi'an

A Le Monde mathematical puzzle that seems hard to solve without the backup of a computer (and just simple enough to code on a flight to Montpellier):

Given the number N=2,019, find a decomposition of N as a sum of non-trivial powers of integers such that (a) the number of integers in the sum is maximal or (b) all powers are equal to 4.  Is it possible to write N as a sum of two powers?

It is straightforward to identify all possible terms in these sums by listing all powers of integers no larger than N:

pool=(1:trunc(sqrt(2019)))^2
for (pow in 3:10) #stop at 10: 2^11=2048>2019 and 2:1 counts backwards in R
  pool=unique(c(pool,(2:trunc(2019^(1/pow)))^pow))

which leads to 56 distinct powers. Sampling at random from this collection produces a sum of 21 perfect powers:

 1+4+8+9+16+25+27+32+36+49+64+81+100+121+125+128+144+169+196+243+441

But summing the 22 smallest numbers in the pool of powers gives exactly 2019, which is a sure answer since any 23 distinct powers from the pool add up to at least 2275. Restricting the terms to fourth powers leads to the decomposition

1⁴+2⁴+3⁴+5⁴+6⁴ = 2019
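Both claims can be checked in a few lines of R. This is a sketch assuming that the terms of a decomposition must be distinct, with the pool rebuilt here so that the snippet is self-contained. The names `partial`, `masks` and `sol` are mine.

```r
N <- 2019
# all perfect powers no larger than N (1 included, exponents 2 to 10)
pool <- (1:trunc(sqrt(N)))^2
for (pow in 3:10) pool <- unique(c(pool, (2:trunc(N^(1/pow)))^pow))
pool <- sort(pool)

# (a) partial sums of the sorted pool: which one hits N exactly?
partial <- cumsum(pool)
k <- which(partial == N)   # number of terms in the decomposition
partial[k + 1]             # one more term overshoots N

# (b) fourth powers only: enumerate all subsets through 0/1 masks
fourth <- (1:trunc(N^(1/4)))^4
masks <- as.matrix(expand.grid(rep(list(0:1), length(fourth))))
sums <- drop(masks %*% fourth)
sol <- fourth[masks[which(sums == N), ] == 1]
sol
```

With only six fourth powers below 2019, brute-force enumeration of the 64 subsets is instantaneous, so nothing cleverer is needed for part (b).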

And checking, for every power p in the pool, whether 2019−p also belongs to the pool shows that a decomposition of 2019 as the sum of two powers is impossible.
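The two-term check amounts to a single membership test, sketched below in R. The pool is rebuilt in a slightly more compact way and the names `pool` and `pairs` are illustrative.

```r
N <- 2019
# perfect powers no larger than N, for exponents 2 up to floor(log2(N)) = 10
pool <- 1
for (pow in 2:floor(log2(N))) pool <- union(pool, (2:floor(N^(1/pow)))^pow)

# powers p such that N - p is also a perfect power
pairs <- pool[(N - pool) %in% pool]
length(pairs)  # zero means no decomposition as a sum of two powers
```

For two squares the impossibility is immediate, since 2019 is congruent to 3 modulo 4 while a sum of two squares never is. The code covers the remaining mixed cases.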

end of the Canadian Rockies [jatp]

Posted in Mountains, pictures, Travel on September 16, 2018 by xi'an