Le Monde puzzle [#825]

Posted in Books, Kids, pictures, R with tags , , , , , on June 19, 2013 by xi'an

Yet another puzzle which first part does not require R programming, even though it is a programming question in essence:

Given five real numbers x1,…,x5, what is the minimal number of pairwise comparisons needed to rank them? Given 33 real numbers, what is the minimal number of pairwise comparisons required to find the three largest ones?

I do not see a way out of considering the first question as the quickest possible sorting of a real sample. Using either quicksort or heapsort, I achieve sorting the 5 numbers in exactly 6 comparisons for any order of the initial sample. (Now, there may be an even faster way based on comparing partial sums first… I just do not see how!)

For the second part, let us start from the remark that 32 comparisons are needed to find the largest number, then at most 31 for the second largest, and at most 30 for the third largest (since we can take advantage of the partial ordering resulting from the determination of the largest number). This is poor. If I instead use a heap algorithm, I need O(n log{n}) comparisons to build this binary tree whose parents are always larger than their siblings, as in the above example. (I can produce a sort of heap structure, although non-binary, in an average 16×2.5=40 steps. And a maximum 16×3=48 steps.) The resulting tree provides the largest number (100 in the above example) and at least the second largest number (36 in the above). To get the third largest number, I first need a comparison between the one-before-last terms of the heap (19 vs. 36 in the above), and one or two extra comparisons (25 vs. 19 and maybe 25 vs. 1 in the above). (This would induce an average 1.5 extra comparison and a maximum 2 extra comparisons, resulting in a total of 41.5 average and 49.5 maximum comparisons with my sub-optimal heap construction.)  Once again, using comparisons of sums may help in speeding up the process, for instance comparing numbers by groups of 3, but I did not pursue this solution…

quick3If instead I try to adapt quicksort to this problem, I can have a dynamic pivot that always keep at most two terms above it, providing the three numbers as a finale result. Here is an R code to check its performances:

quick3=function(x){

comp=0
i=1
lower=upper=NULL
pivot=x[1]

for (i in 2:length(x)){

if (x[i]<pivot){ lower=c(lower,x[i])
}else{

upper=c(upper,x[i])
if (length(upper)>1) comp=comp+1}

comp=comp+1

if (length(upper)==3){

pivot=min(upper)
upper=sort(upper)[-1]
}}

if (length(upper)<3) upper=c(pivot,upper)

list(top=upper,cost=comp)
}

When running this R code on 10⁴ random sequences of 33 terms, I obtained the following statistics, I obtained the following statistics on the computing costs

</p>
<p style="text-align: justify;">> summary(costs)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
32.00   36.00   38.00   37.89   40.00   49.00

and the associated histogram represented above. Interestingly, the minimum is the number of comparisons needed to produce the maximum!

top model choice week (#2)

Posted in Statistics, University life with tags , , , , , , , , , , , , on June 18, 2013 by xi'an

La Défense and Maison-Lafitte from my office, Université Paris-Dauphine, Nov. 05, 2011Following Ed George (Wharton) and Feng Liang (University of Illinois at Urbana-Champaign) talks today in Dauphine, Natalia Bochkina (University of Edinburgh) will  give a talk on Thursday, June 20, at 2pm in Room 18 at ENSAE (Malakoff) [not Dauphine!]. Here is her abstract:

2 am: Simultaneous local and global adaptivity of Bayesian wavelet estimators in nonparametric regression by Natalia Bochkina

We consider wavelet estimators in the context of nonparametric regression, with the aim of finding estimators that simultaneously achieve the local and global adaptive minimax rate of convergence. It is known that one estimator – James-Stein block thresholding estimator of T.Cai (2008) – achieves simultaneously both optimal rates of convergence but over a limited set of Besov spaces; in particular, over the sets of spatially inhomogeneous functions (with 1≤ p<2) the upper bound on the global rate of this estimator is slower than the optimal minimax rate.

Another possible candidate to achieve both rates of convergence simultaneously is the Empirical Bayes estimator of Johnstone and Silverman (2005) which is an adaptive estimator that achieves the global minimax rate over a wide rage of Besov spaces and Besov balls. The maximum marginal likelihood approach is used to estimate the hyperparameters, and it can be interpreted as a Bayesian estimator with a uniform prior. We show that it also achieves the adaptive local minimax rate over all Besov spaces, and hence it does indeed achieve both local and global rates of convergence simultaneously over Besov spaces. We also give an example of how it works in practice.

Bayesian computational tools

Posted in R, Statistics, University life with tags , , , , on June 18, 2013 by xi'an

fig4I just updated my short review on Bayesian computational tools I first wrote in April for the Annual Review of Statistics and Its Applications. The coverage is quite restricted, as I took advantage of two phantom papers I had started a while ago, one with Jean-Michel Marin, on hierarchical Bayes methods and on ABC. (As stressed in the first version, the paper handles missing data, not as a topic, but as a fact!) The running example is the Laplace vs. Gauss model choice problem, first considered in our ABC model choice paper. The referee of the paper was asking for a broader perspective, which makes perfect sense (except that I did not have the time to get that broad). And mentioned a potential missing acknowledgement of priority as Olli’s thesis was using a simple (instead of double) exponential vs. Gauss as its running example. Once again, a plain 25 pages introduction to the topic, not aiming at anything new. The exercise made me ponder whether or not I wanted to engage into it in a near future, with a pessimistic outcome!

Caen snapshot

Posted in pictures, Running, Travel with tags , , on June 17, 2013 by xi'an

IMG_0078

a partial review of BISP8 [guest post]

Posted in Statistics, Travel, University life with tags , , , , , , , on June 17, 2013 by xi'an

Chris Drovandi (QUT) sent me his impression on BISP8 that just took place in Milano, Italia (BISP stands for Bayesian inference in stochastic processes):

Here is a review of some of the talks at BISP8. For the other talks I do not have sufficient background to give the talks the justice that they deserve. It was a very enjoyable small workshop with many talks in my areas of interest.

In the first session Vanja Dukic presented bayesian inference of SEIR epidemic DE models and state space models of google flu trends data. In the case of the state space models a particle learning algorithm was developed. The author considered both fixed and random effects for the data in each US state. In the second session, Murali Haran presented a likelihood-free approach for inferring the parameters of a spatio-temporal epidemic model. The speaker used a Gaussian process emulator of the model based on model simulations from a regulator grid of parameter values. The emulator approach is suggested to be less intensive in terms of the number of model simulations compared with abc but is only suitable for low dimensional inference problems (even less so than abc).

In the first session of day 2 Ana Palacios combined the gompertz model with Markov processes to create flexible and realistic stochastic growth models. The resulting model has a difficult likelihood and inference was performed by completing the likelihood creating simple Gibbs moves and by ABC.

There were 3 talks in a row on inference for SDEs. The first, by Simon Särkkä, avoids evaluating an intractable transition density by proposing from another diffusion model and computing importance weights using the girsanov theorem. Next, Samuel Kou used a population MCMC type approach where each chain had a different Euler discretisation. This helps improve mixing for the chain with the finest grid. Moves between chains are complicated by the different dimension for each chain. The author used a filling approach to overcome this. A very interesting aspect of the talk was using information from all chains to extrapolate various posterior quantiles to delta_t is 0 (no discretisation implying the correct posterior). I assume the extrapolation may not work as well for the extreme quantiles. The third talk, by Andrew Golightly, proposed an auxiliary approach to improve PMCMC for these models. This talk was the most technical (for me) so need more time to digest. Following my talk (based on some work here.  And some current work.) was an applied talk using smc2 methodology.

On the final day Alexandros Beskos investigated the use of SMC for Bayesian inference for a high dimensional (static) parameter. SMC is advocated here due to the ease of adaptation relative to MCMC when there is no structure in the model. The base of the approach I believe was that of Chopin (2002).

new camera (again!)

Posted in pictures with tags , , , on June 16, 2013 by xi'an

IMG_0008Following the loss of my camera in the park, two weeks ago, I bought a new Canon Ixus 240 HS camera, which seems to sell in the US as a Canon Powershot ELPH 320 HS. The characteristics are very similar to the 155 model, which cost me about the same last year, except for an handy touchscreen and a slightly better optical zoom… The selection of options has gotten better, again thanks to the touchscreen, but I fear I will keep overusing the “super-vivid” option to keep catching colours in poor light… Here is the first attempt. (And sorry if this sounds like cheap advertising, I just picked the same brand as I was used to it and the vivid option could produce interesting if artificial effects!)

new camera (stills)

Posted in pictures, Running with tags , , , , on June 15, 2013 by xi'an

IMG_0025IMG_0034

Follow

Get every new post delivered to your Inbox.

Join 357 other followers