Archive for simulated annealing

simulated annealing and logistic regression to the max

Posted in Kids, R, Statistics with tags , , , , , , on April 26, 2023 by xi'an

A Riddler puzzle on the three binary and sequential questions one should ask three players hiding their respective U(0,1) realisation, U, V, and W, to best guess which player holds the largest number, max{U,V,W}. Assuming questions of the type Is U<b¹, &tc., the challenge boils down to selecting seven bounds (one for U, two for V, and four for W) in order to optimise the probability of picking the right player. Which means for a given collection of such bounds to learn this probability from the three binary answers. These can be turned into eight (2³) binary variables and I used them as entries in a logistic regression to predict that W was larger than max(U,V), itself predicted by the first two answers. The optimisation of the bounds can then be achieved by simulated annealing (or otherwise) and the approach returns (random) outputs like the following bounds (one on U, two on V, and four on W)

0.616 0.434 0.830 0.350 0.736 0.913 0.796 0.827

for an estimated probability of 0.827. This is a somewhat coherent sequence of bounds when considering the simpler case of two players. Indeed, with three bounds, the probability of winning can be readily derived as

b^2(b^1-b^2)+1/2+b^3(1-b^2+b^1)-b^1b^1

logically optimised by (b¹,b²,b³)=(1/2,1/4,3/4) for a success probability of 0.875. And it coïncides with the solution posted by The Riddler, although there is no intuition behind the figures, contrary to the two player situation. In fact, I am surprised that the bound on W does not equate the expectation of max{U,V} under the current conditions:

> x=runif(1e6,0,.616);y=runif(1e6,0,.434)
> mean(y+(x-y>0)*(x-y))
[1] 0.3591648

BayesComp²³ [aka MCMski⁶]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on March 20, 2023 by xi'an

The main BayesComp meeting started right after the ABC workshop and went on at a grueling pace, and offered a constant conundrum as to which of the four sessions to attend, the more when trying to enjoy some outdoor activity during the lunch breaks. My overall feeling is that it went on too fast, too quickly! Here are some quick and haphazard notes from some of the talks I attended, as for instance the practical parallelisation of an SMC algorithm by Adrien Corenflos, the advances made by Giacommo Zanella on using Bayesian asymptotics to assess robustness of Gibbs samplers to the dimension of the data (although with no assessment of the ensuing time requirements), a nice session on simulated annealing, from black holes to Alps (if the wrong mountain chain for Levi), and the central role of contrastive learning à la Geyer (1994) in the GAN talks of Veronika Rockova and Éric Moulines. Victor  Elvira delivered an enthusiastic talk on our massively recycled importance on-going project that we need to complete asap!

While their earlier arXived paper was on my reading list, I was quite excited by Nicolas Chopin’s (along with Mathieu Gerber) work on some quadrature stabilisation that is not QMC (but not too far either), with stratification over the unit cube (after a possible reparameterisation) requiring more evaluations, plus a sort of pulled-by-its-own-bootstrap control variate, but beating regular Monte Carlo in terms of convergence rate and practical precision (if accepting a large simulation budget from the start). A difficulty common to all (?) stratification proposals is that it does not readily applies to highly concentrated functions.

I chaired the lightning talks session, which were 3mn one-slide snapshots about some incoming posters selected by the scientific committee. While I appreciated the entry into the poster session, the more because it was quite crowded and busy, if full of interesting results, and enjoyed the slide solely made of “0.234”, I regret that not all poster presenters were not given the same opportunity (although I am unclear about which format would have permitted this) and that it did not attract more attendees as it took place in parallel with other sessions.

In a not-solely-ABC session, I appreciated Sirio Legramanti speaking on comparing different distance measures via Rademacher complexity, highlighting that some distances are not robust, incl. for instance some (all?) Wasserstein distances that are not defined for heavy tailed distributions like the Cauchy distribution. And using the mean as a summary statistic in such heavy tail settings comes as an issue, since the distance between simulated and observed means does not decrease in variance with the sample size, with the practical difficulty that the problem is hard to detect on real (misspecified) data since the true distribution behing (if any) is unknown. Would that imply that only intrinsic distances like maximum mean discrepancy or Kolmogorov-Smirnov are the only reasonable choices in misspecified settings?! While, in the ABC session, Jeremiah went back to this role of distances for generalised Bayesian inference, replacing likelihood by scoring rule, and requirement for Monte Carlo approximation (but is approximating an approximation that a terrible thing?!). I also discussed briefly with Alejandra Avalos on her use of pseudo-likelihoods in Ising models, which, while not the original model, is nonetheless a model and therefore to taken as such rather than as approximation.

I also enjoyed Gregor Kastner’s work on Bayesian prediction for a city (Milano) planning agent-based model relying on cell phone activities, which reminded me at a superficial level of a similar exploitation of cell usage in an attraction park in Singapore Steve Fienberg told me about during his last sabbatical in Paris.

In conclusion, an exciting meeting that should have stretched a whole week (or taken place in a less congenial environment!). The call for organising BayesComp 2025 is still open, by the way.

 

diffusions, sampling, and transport

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on November 21, 2022 by xi'an

The third and final day of the workshop was shortened for me as I had to catch an early flight back to Paris (and as I got overly conservative in my estimation for returning to JFK, catching a train with no delay at Penn Station and thus finding myself with two hours free before boarding, hence reviewing remaining Biometrika submission at the airport while waiting). As a result I missed the afternoon talks.

The morning was mostly about using scores for simulation (a topic of which I was mostly unaware), with Yang Song giving the introductory lecture on creating better [cf pix left] generative models via the score function, with a massive production of his on the topic (but too many image simulations of dogs, cats, and celebrities!). Estimating directly the score is feasible via Fisher divergence or score matching à la Hyvärinen (with a return of Stein’s unbiased estimator of the risk!). And relying on estimated scores to simulate / generate by Langevin dynamics or other MCMC methods that do not require density evaluations. Due to poor performances in low density / learning regions a fix is randomization / tempering but the resolution (as exposed) sounded clumsy. (And made me wonder at using some more advanced form of deconvolution since the randomization pattern is controlled.) The talk showed some impressive text to image simulations used by an animation studio!


And then my friend Arnaud Doucet continued on the same theme, motivating by estimating normalising constant through annealed importance sampling [Yuling’s meta-perspective comes back to mind in that the geometric mixture is not the only choice, but with which objective]. In AIS, as in a series of Arnaud’s works, like the 2006 SMC Read Paper with Pierre Del Moral and Ajay Jasra, the importance (!) of some auxiliary backward kernels goes beyond theoretical arguments, with the ideally sequence being provided by a Langevin diffusion. Hence involving a score, learned as in the previous talk. Arnaud reformulated this issue as creating a transportation map and its reverse, which is leading to their recent Schrödinger bridge generative model. Which [imho] both brings a unification perspective to his work and an efficient way to bridge prior to posterior in AIS. A most profitable morn for me!

Overall, this was an exhilarating workshop, full of discoveries for me and providing me with the opportunity to meet and exchange with mostly people I had not met before. Thanks to Bob Carpenter and Michael Albergo for organising and running the workshop!

optimal Gaussian zorbing

Posted in Books, Kids, R, Statistics with tags , , , , , , on August 30, 2022 by xi'an

A zorbing puzzle from the Riddler: cover the plane with four non-intersecting disks of radius one towards getting the highest probability (under the standard bivariate Normal distribution).

As I could not see a simple connection between the disks and the standard Normal, beyond the probability of a disk being given by a non-central chi-square cdf (with two degrees of freedom), I (once again) tried a random search by simulated annealing, which ended up with a configuration like the above, never above 0.777 using a pedestrian R code like

for(t in 1:1e6){# move the disk centres
 Ap=A+vemp*rnorm(2)
 Bp=B+vemp*rnorm(2)
 while(dist(rbind(Ap,Bp))<2)Bp=B+vemp*rnorm(2)
 Cp=C+vemp*rnorm(2)
 while(min(dist(rbind(Ap,Bp,Cp)))<2)Cp=C+vemp*rnorm(2)
 Dp=D+vemp*rnorm(2)
 while(min(dist(rbind(Ap,Bp,Cp,Dp)))<2)Dp=D+vemp*rnorm(2)
 #coverage probability
 pp=pchisq(1,df=2,ncp=Ap%*%Ap)+pchisq(1,df=2,ncp=Bp%*%Bp)+
    pchisq(1,df=2,ncp=Cp%*%Cp)+pchisq(1,df=2,ncp=Dp%*%Dp)
 #simulated annealing step
 if(log(runif(1))<(pp-p)/sqrt(temp)){
   A=Bp;B=Cp;C=Dp;D=Ap;p=pp
   if (sol$val<p) sol=list(val=pp,pos=rbind(A,B,C,D))}
 temp=temp*.9999}

I also tried a simpler configuration where all disk centres were equidistant from a reference centre, but this led to a lower “optimal” probability. I was looking forward the discussion of the puzzle, to discover if anything less brute-force was possible! But there was no deeper argument there beyond the elimination of other “natural” configurations (and missing the non-central χ² connection!). Among these options, having two disks tangent at (0,0) were optimal. But the illustration was much nicer:

not a Bernoulli factory

Posted in Books, Kids, pictures, R with tags , , , , , , , on May 20, 2020 by xi'an

A Riddler riddle I possibly misunderstood:

Four isolated persons are given four fair coins, which can be either flipped once or returned without being flipped. If all flipped coins come up heads, the team wins! Else, if any comes up tails, or if no flip at all is done, it looses. Each person is further given an independent U(0,1) realisation. What is the best strategy?

Since the players are separated, I would presume the same procedure is used by all. Meaning that a coin is tossed with probability p, ie if the uniform is less than p, and untouched otherwise. The probability of winning is then

4(1-p)³p½+6(1-p)³p½²+4(1-p)p³½³+p⁴½⁴

which is maximum for p=0.3420391, with a winning probability of 0.2848424.

And an extra puzzle for free:

solve xxxx=2020

Where the integral part is the integer immediately below x. Puzzle that I first fail solving by brute force, because I did not look at negative x’s… Since the fourth root of 2020 is between 6 and 7, the solution is either x=6+ε or x=-7+ε, with ε in (0,1). The puzzle then becomes either

(6+ε)⌊(6+ε)⌊(6+ε)⌊6+ε⌋⌋ = (6+ε)⌊(6+ε)⌊36+6ε⌋⌋ = (6+ε)⌊(6+ε)(36+⌊6ε⌋)⌋ = 2020

where there are 6 possible integer values for ⌊6ε⌋, with only ⌊6ε⌋=5 being possible, turning the equation into

(6+ε)⌊41(6+ε) = (6+ε)(246+⌊41ε) = 2020

where again only ⌊42ε=40 being possible, ending up with

1716+286ε = 2020

which has no solution in (0,1). In the second case

(-7+ε)(-7+ε)(-7+ε)-7+ε⌋⌋ = (-7+ε)(-7+ε)(49+-7ε⌋)= 2020

shows that only -7ε=-3 is possible, leading to

(-7+ε)⌊46(-7+ε)) = (-7+ε) (-322+46ε⌋)=2020

with only 46ε=17 possible, hence

2135-305ε=2020

and

ε=115/305.

A brute force simulated annealing resolution returns x=-6.622706 after 10⁸ iterations. A more interesting question is to figure out the discontinuity points of the function

ℵ(x) = xxxx

as they seem to be numerous:

For instance, only 854 of the first 2020 integers enjoy a solution to ℵ(x)=n.