A PhD student position in mathematical statistics on simulation-based inference methods for models with an “intractable” likelihood is available at the Dept. of Mathematical Sciences, Chalmers University, Gothenburg (Sweden).

You will be part of an international collaboration to create new methodology bridging simulation-based inference (such as approximate Bayesian computation and other likelihood-free methods) and deep neural networks. The goal is to ease inference for stochastic modelling.

Details on the project and the essential requirements are at https://www.chalmers.se/en/departments/math/research/research-groups/AIMS/Pages/ai-project-5.aspx

The PhD student position is fully funded for up to 5 years, in the dynamic and international city of Gothenburg, the second largest city in Sweden, https://www.goteborg.com/en/. As a PhD student in Mathematical Sciences you will have opportunities for many inspiring conversations, a lot of autonomous work and some travel.

The position will be supervised by Assoc. Prof. Umberto Picchini.

Apply by **01 June 2020** following the instructions at

https://www.chalmers.se/en/about-chalmers/Working-at-Chalmers/Vacancies/Pages/default.aspx?rmpage=job&rmjob=8556

For informal enquiries, please get in touch with Umberto Picchini


“We thus conclude that two birth-death models are congruent if and only if they have the same r_{p} and the same λ_{p} at some time point in the present or past.” [S.1.1, p.4]

Or, stated otherwise, that a tree-structured dataset made of branch lengths is not enough to identify the two functions that parameterise the model. The likelihood looks like

$

where *E(.)* is the probability to survive to the present and *ψ(s,t)* the probability to survive and be sampled between times s and t. Sort of. Both functions depend on the functions *λ(.)* and *μ(.)*. (When the stem age is unknown, the likelihood changes a wee bit, but with no changes in the qualitative conclusions.) Another way to write this likelihood is in terms of the speciation rate *λ_{p}*

where *Λ_{p}* is the integrated rate, but which shares the same characteristic of being unable to identify the functions *λ(.)* and *μ(.)*.

“…we explain why model selection methods based on parsimony or “Occam’s razor”, such as the Akaike Information Criterion and the Bayesian Information Criterion that penalize excessive parameters, generally cannot resolve the identifiability issue…” [S.2, p15]

As illustrated by the above quote, the supplementary material also includes a section about statistical model selection techniques failing to capture the issue, a section that seems superfluous or even absurd once the fact that the likelihood is constant across a congruence class has been stated.

which means cdf inversion could be implemented in principle. But in practice, assuming the integral is intractable, what would an exact solution look like? Including MCMC versions exploiting one fixed point representation or the other… Since

using an unbiased estimator of the exponential term in a pseudo-marginal algorithm would work. And getting an unbiased estimator of the exponential term can be done by Glynn & Rhee debiasing. But this is rather costly… Having Devroye’s book under my nose [at my home desk] should however have driven me earlier to the obvious solution to… simply open it!!! A whole section (VI.2) is indeed dedicated to simulations when the distribution is given by the hazard rate. (Which made me realise this problem is related to PDMPs in that thinning and composition tricks are common to both.) Besides the inversion method, i.e. X=H⁻¹(U), Devroye suggests thinning a Poisson process when h(·) is bounded by a manageable g(·). Or a generic dynamic thinning approach that converges when h(·) is non-increasing.
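
As a minimal sketch of the thinning idea, assume the simplest possible dominating function, a constant bound M with h(t) ≤ M for all t. The function name `rhazard_thin`, the bound `M` and the test hazard h(t)=t/(1+t) are illustrative choices of mine, not taken from Devroye's book. That hazard corresponds to a Gamma(2,1) variate, which gives an easy sanity check on the mean.

```r
# Thinning sketch: draw n variates with hazard rate h, assuming h(t) <= M for all t >= 0.
# rhazard_thin and the constant bound M are hypothetical names used for illustration.
rhazard_thin <- function(n, h, M) {
  replicate(n, {
    t <- 0
    repeat {
      t <- t + rexp(1, rate = M)        # next point of a homogeneous Poisson process of rate M
      if (M * runif(1) <= h(t)) break   # keep this point with probability h(t) / M
    }
    t                                   # the first accepted point has hazard rate h
  })
}

set.seed(1)
h <- function(t) t / (1 + t)            # hazard of a Gamma(2, 1) variate, bounded by 1
x <- rhazard_thin(1e4, h, M = 1)
mean(x)                                 # should be close to the Gamma(2, 1) mean, 2
```

The cost per draw grows with the slack between M and h(·), which is why a tighter, non-constant g(·) is preferable when one is available.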


“On two occasions I have been asked, “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”

following earlier quotes from him on this ‘Og.


There are zero to three goats, with a probability ¼ each, and they are allocated to different doors uniformly among the three doors of the show. After the player chooses a door, Monty opens another door hiding a goat or signals this is impossible. Given that he did open a door, what is the probability that the player’s door does not hide a goat?

Indeed, a straightforward conditional probability computation considering all eight possible cases with the four cases corresponding to Monty opening a door leads to a probability of 3/8 for the player’s door. As confirmed by the following R code:

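    # goats sit behind 1, 2 or 3 doors (with zero goats Monty never opens); the player holds door 1
    s=sample
    m=c(0,0)
    # m[1]: no goat behind door 1, m[2]: a goat behind door 2 or 3, so Monty can open
    for(t in 1:1e6)m=m+(range(s(1:3,s(1:3,1)))>1)
    m[1]/m[2]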
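
The same value can also be obtained without simulation. The sketch below enumerates the eight goat placements over the three doors, which is my own way of organising the cases and not necessarily the partition meant above. It takes door 1 as the player's door, an arbitrary labelling.

```r
# Exact check by enumeration; door 1 is the player's door (arbitrary labelling).
cfg <- expand.grid(d1 = 0:1, d2 = 0:1, d3 = 0:1)  # 1 = goat behind that door
k <- rowSums(cfg)                                  # number of goats in each placement
w <- 0.25 / choose(3, k)                           # P(k goats) = 1/4, placements equally likely
opens <- (cfg$d2 + cfg$d3) > 0                     # Monty can open a door hiding a goat
safe <- cfg$d1 == 0                                # no goat behind the player's door
p_exact <- sum(w[opens & safe]) / sum(w[opens])
p_exact                                            # 0.375
```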

Four isolated persons are given four fair coins, which can be either flipped once or returned without being flipped. If all flipped coins come up heads, the team wins! Else, if any comes up tails, or if no flip at all is done, it loses. Each person is further given an independent U(0,1) realisation. What is the best strategy?

Since the players are separated, I would presume the same procedure is used by all. Meaning that a coin is tossed with probability p, ie if the uniform is less than p, and untouched otherwise. The probability of winning is then

4(1-p)³p½+6(1-p)²p²½²+4(1-p)p³½³+p⁴½⁴

which is maximum for p=0.3420391, with a winning probability of 0.2848424.
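
A quick numerical check, as a sketch: summing over the number k of flipped coins, with each flipped coin landing heads with probability ½, the binomial theorem collapses the winning probability to (1-p/2)⁴-(1-p)⁴. The function names `win_sum` and `win` are mine.

```r
# Winning probability when each player flips with probability p
win_sum <- function(p) sum(dbinom(1:4, 4, p) * 0.5^(1:4))  # sum over k = 1..4 flipped coins
win <- function(p) (1 - p / 2)^4 - (1 - p)^4               # same quantity, closed form

opt <- optimize(win, interval = c(0, 1), maximum = TRUE)
opt$maximum                                        # numerical maximiser
opt$objective                                      # winning probability at the maximiser

p_star <- (2^(1/3) - 1) / (2^(1/3) - 1/2)          # root of the derivative of win()
c(p_star, win(p_star))
```

Both routes should agree with the values quoted above to within the default tolerance of `optimize`.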

And an extra puzzle for free:

*solve x⌊x⌊x⌊x⌋⌋⌋=2020*

Where the integral part ⌊x⌋ is the largest integer not exceeding x. A puzzle that I first failed to solve by brute force, because I did not look at negative x’s… Since the fourth root of 2020 is between 6 and 7, the solution is either x=6+ε *or* x=-7+ε, with ε in (0,1). The puzzle then becomes either

(6+ε)⌊(6+ε)⌊(6+ε)⌊6+ε⌋⌋⌋ = (6+ε)⌊(6+ε)⌊36+6ε⌋⌋ = (6+ε)⌊(6+ε)(36+⌊6ε⌋)⌋ = 2020

where there are 6 possible integer values for ⌊6ε⌋, with only ⌊6ε⌋=5 being possible, turning the equation into

(6+ε)⌊41(6+ε)⌋ = (6+ε)(246+⌊41ε⌋) = 2020

where again only ⌊41ε⌋=40 is possible, ending up with

1716+286ε = 2020

which has no solution in (0,1). In the second case

(-7+ε)⌊(-7+ε)⌊(-7+ε)⌊-7+ε⌋⌋⌋ = (-7+ε)⌊(-7+ε)(49+⌊-7ε⌋)⌋ = 2020

shows that only ⌊-7ε⌋=-3 is possible, leading to

(-7+ε)⌊46(-7+ε)⌋ = (-7+ε)(-322+⌊46ε⌋) = 2020

with only ⌊46ε⌋=17 possible, hence

2135-305ε=2020

and

ε=115/305.
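
This can be checked directly in R. The name `aleph` for the nested-floor function and the grid over [6,7) are illustrative choices.

```r
# Nested-floor function of the puzzle
aleph <- function(x) x * floor(x * floor(x * floor(x)))

x <- -7 + 115 / 305
x                                         # -6.622951
aleph(x)                                  # 2020, up to floating-point rounding

# On the positive branch the function stays below 2020 over a fine grid of [6, 7)
max(aleph(seq(6, 6.9999, by = 1e-4)))
```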

A brute force simulated annealing resolution returns x=-6.622706 after 10⁸ iterations. A more interesting question is to figure out the discontinuity points of the function

ℵ(x) = x⌊x⌊x⌊x⌋⌋⌋

as they seem to be numerous:

For instance, only 854 of the first 2020 integers enjoy a solution to ℵ(x)=n.
