Archive for optimisation

simulation as optimization [by kernel gradient descent]

Posted in Books, pictures, Statistics, University life on April 13, 2024 by xi'an

Yesterday, which proved an unseasonably bright, warm day, I biked (with a new wheel!) to the east of Paris, in the Gare de Lyon district where I lived for three years in the 1980’s, to attend a Mokaplan seminar at INRIA Paris, where Anna Korba (CREST, with which I am also affiliated) talked about sampling through optimization of discrepancies.
This proved a most formative hour as I had not seen this perspective earlier (or possibly had forgotten about it), except through some of the talks at the Flatiron Institute on Transport, Diffusions, and Sampling last year, including Marilou Gabrié’s and Arnaud Doucet’s.
The concept remains attractive to me, at least in principle, since it consists in approximating the target distribution, known up to a constant (a setting I have always felt standard simulation techniques were not exploiting to the maximum) or through a sample (a setting less convincing since the sample from the target is already there), via a sequence of (particle approximated) distributions, using the discrepancy between the current distribution and the target, or the gradient thereof, to move the particles. (With no randomness in the Kernel Stein Discrepancy Descent algorithm.)
Anna Korba spoke about practically running the algorithm, as well as about convexity properties and some convergence results (with mixed performances for the Stein kernel, as opposed to SVGD). I remain definitely curious about several aspects of the method, like the (ergodic) distribution of the endpoints, the actual gain against an MCMC sample when accounting for computing time, the improvement above the empirical distribution when using a sample from π and its ecdf as the substitute for π, and the meaning of an error estimation in this context.
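To make the particle-moving idea concrete, here is a minimal sketch of a plain SVGD update, written in Python rather than R. It is not the Kernel Stein Discrepancy Descent algorithm from the talk, only the better-known SVGD it was compared with. Everything specific is an assumption of mine: the standard Normal target (score −x), the Gaussian kernel with a fixed bandwidth h=1, the step size eps, the number of particles, and the function names svgd_step and run_svgd. Note that, once the starting particles are chosen, the iterations involve no randomness.

```python
import math

def svgd_step(x, score, h=1.0, eps=0.1):
    """One SVGD update of 1D particles x, with Gaussian kernel of bandwidth h."""
    n = len(x)
    s = [score(xj) for xj in x]  # score = gradient of log target, no constant needed
    new_x = []
    for xi in x:
        total = 0.0
        for xj, sj in zip(x, s):
            k = math.exp(-(xj - xi) ** 2 / (2 * h ** 2))
            # attraction: k(x_j, x_i) * score(x_j)
            # repulsion:  derivative of k(x_j, x_i) with respect to x_j
            total += k * sj + k * (xi - xj) / h ** 2
        new_x.append(xi + eps * total / n)
    return new_x

def run_svgd(x0, score, n_iter=500, h=1.0, eps=0.1):
    """Iterate the deterministic update from the starting particles x0."""
    x = list(x0)
    for _ in range(n_iter):
        x = svgd_step(x, score, h, eps)
    return x

# Assumed toy setting: 30 particles started far from a standard Normal target.
n = 30
start = [2.0 + 2.0 * i / (n - 1) for i in range(n)]
particles = run_svgd(start, score=lambda x: -x)
mean = sum(particles) / n
var = sum((p - mean) ** 2 for p in particles) / n
print(round(mean, 3), round(var, 3))
```

The printed mean and variance of the particles can be compared with the target values 0 and 1. The sketch says nothing about the gain against an MCMC sample of comparable computing cost.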

“exponential convergence (of the KL) for the SVGD gradient flow does not hold whenever π has exponential tails and the derivatives of ∇ log π and k grow at most at a polynomial rate”

Galton and Watson voluntarily skipping some generations

Posted in Books, Kids, R on June 2, 2023 by xi'an

A riddle on a form of a Galton-Watson process, starting from a single unit, where no one dies but rather, at each of 100 generations, Dog either opts for a Uniform number υ of additional units or increments a counter γ by this number υ, its goal being to optimise γ. The solution proposed by the Riddler does not establish that it is the optimal strategy and considers average gains anyway. This solution consists in always producing more units until the antepenultimate hour (ie incrementing only at the 99th and 100th generations). I tried instead various logical (?) rules and compared outputs by bRute foRce, resulting in higher maxima (over numerous repeated calls) for the alternative principle

   G=0;K=1 for(t in 1:9){ 

go forth and X [or the reverse]

Posted in Books, Kids on February 8, 2023 by xi'an

The New Year Riddle is about optimisation: starting with a single machine, and choosing between delivering one unit per machine-hour and delivering one new machine per machine every six days, what is the maximal number of units produced over 100 days?

Comparing the amounts produced by k machines, after the 6 log₂(k) days used to multiply the machines, showed that 2¹⁵-1 additional machines were first produced, to generate 7864320 items over the remaining 10 days. Which did not really require an R implementation (although I checked that intermediate solutions where only some of the machines were producing new machines were sub-optimal).
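The comparison above can be replayed in a few lines (in Python here, not the R check mentioned). This sketch only covers strategies where all machines double d times in a row and then all produce until day 100, with 24 production hours per day. The function name units_after_doublings and the constants are mine, and the mixed strategies are not explored.

```python
HOURS_PER_DAY = 24
TOTAL_DAYS = 100
DAYS_PER_DOUBLING = 6

def units_after_doublings(d):
    """Units delivered when all machines double d times, then all produce."""
    machines = 2 ** d                                  # k = 2^d machines after 6d days
    remaining_days = TOTAL_DAYS - DAYS_PER_DOUBLING * d
    return machines * HOURS_PER_DAY * remaining_days

max_doublings = TOTAL_DAYS // DAYS_PER_DOUBLING        # at most 16 doublings fit
best = max(
    ((d, units_after_doublings(d)) for d in range(max_doublings + 1)),
    key=lambda pair: pair[1],
)
print(best)  # (15, 7864320)
```

Stopping one doubling earlier or later (d=14 or d=16) gives 6291456 units in both cases, hence the choice of 15 doublings over the first 90 days.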

master project?

Posted in Books, Kids, Statistics, University life on July 25, 2022 by xi'an

A potential master project for my students next year inspired by an X validated question: given a Gaussian mixture density

f(x)\propto\sum_{i=1}^m \omega_i \sigma^{-1}\,\exp\{-(x-\mu_i)^2/2\sigma^2\}

with m known, the weights summing up to one, and the (prior) information that all means are within (-C,C), derive the parameters of this mixture from a sufficiently large number of evaluations of f. Pay attention to the numerical issues associated with the resolution.  In a second stage, envision this problem from an exponential spline fitting perspective and optimise the approach if feasible.

dice and sticks

Posted in Books, Kids, R on November 19, 2021 by xi'an

A quick weekend riddle from the Riddler about the probability of getting a sequence of increasing numbers from dice with an increasing number of faces, eg 4-, 6-, and 8-face dice. Which happens to be 1/4. By sheer calculation (à la Gauss) or simple enumeration (à la R):

> for(i in 1:4)for(j in (i+1):6)F=F+(8-j)
> F/4/6/8
[1] 0.25

The less-express riddle is an optimisation problem related to stick breaking: given a stick of length one, propose a fraction a and win (1-a) if a Uniform x is less than a. Since the expected gain is a(1-a), the maximal average gain is associated with a=½. Now, if the remaining stick (1-a) can be divided when x>a, what is the sequence of fractions one should use when the gain is the length of the remaining stick? With two attempts only, the optimal gain is still ¼. And a simulation experiment with three attempts again returns ¼.