## Maximum likelihood vs. likelihood-free quantum system identification in the atom maser

Posted in Books, Statistics, University life on December 2, 2013 by xi'an

This paper (arXived a few days ago) compares maximum likelihood with different ABC approximations in a quantum physics setting, for an atom maser model that essentially boils down to a hidden Markov model. (I mostly blanked out of the physics explanations so cannot say I understand the model at all.) While the authors (from the University of Nottingham, hence Robin’s statue above…) do not consider the recent corpus of work by Ajay Jasra and coauthors (some of which was discussed on the ‘Og), they get interesting findings for an equally interesting model. First, when comparing the Fisher informations on the sole parameter of the model, the “Rabi angle” φ, for two different sets of statistics, one goes to zero at a certain value of the parameter while the (fully informative) other is at its maximum there (Figure 6). This is quite intriguing, especially given the shape of the information in the former case, which reminds me of (my) inverse normal distributions. Second, the authors compare different collections of summary statistics in terms of ABC distributions against the likelihood function. While most bring much more uncertainty into the analysis, the whole collection recovers the range and shape of the likelihood function, which is nice. Third, they also use a Kolmogorov-Smirnov distance to run their ABC, which is enticing, except that I cannot fathom from the paper when one would have enough of a sample (conditional on a parameter value) to rely on what is essentially an estimate of the sampling distribution. This seems to contradict the fact that they only use seven summary statistics. Or it may be that the “statistic” of waiting times happens to be a vector, in which case a Kolmogorov-Smirnov distance can indeed be adopted for the distance… The fact that the grouped seven-dimensional summary statistic provides the best ABC fit is somewhat of a surprise, considering the problem enjoys a single parameter.
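
For what it is worth, here is a minimal sketch of what ABC rejection based on a Kolmogorov-Smirnov distance between observed and simulated waiting-time samples could look like; the simulator and prior below are toy placeholders of my own, not the atom-maser model of the paper:

```
simulate_waiting_times=function(phi,n) rexp(n,rate=1+sin(phi)^2) #TOY stand-in, not the maser dynamics

abc_ks=function(obs,N=10^4,eps=0.1){
  keep=numeric(0)
  for (i in 1:N){
    phi=runif(1,0,pi/2)                         #draw from a (toy) uniform prior on the Rabi angle
    sim=simulate_waiting_times(phi,length(obs)) #pseudo-sample of waiting times given phi
    d=ks.test(obs,sim)$statistic                #Kolmogorov-Smirnov distance between the two samples
    if (d<eps) keep=c(keep,phi)                 #keep phi when the samples are close enough
  }
  keep                                          #ABC sample for the Rabi angle
}
#e.g., obs=simulate_waiting_times(.8,500); hist(abc_ks(obs))
```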

“However, in practice, it is often difficult to find an s(.) which is sufficient.”

A point that irks me in most ABC papers is finding quotes like the above, since in most models it is easy to show that there cannot be a non-trivial sufficient statistic (i.e., one of fixed dimension)! As soon as one leaves the exponential family cocoon, one is doomed in this respect!!!

## Le Monde puzzle [#842]

Posted in Books, Kids, R on November 30, 2013 by xi'an

An easily phrased (and solved?) Le Monde mathematical puzzle that does not [really] require an R code:

The five triplets A,B,C,D,E are such that

$A_1=B_2+C_2+D_2+E_2\,,\ B_1=A_2+C_2+D_2+E_2\,,...$

and

$A_2=B_3+C_3+D_3+E_3\,,\ B_2=A_3+C_3+D_3+E_3\,,...$

Given that

$A_1=193\,,\ B_1=175\,, C_1=185\,, D_1=187\,, E_1<175$

find the five triplets.

Adding up both sets of equations shows everything solely depends upon E1: summing the first set gives S1=4S2 and summing the second gives S2=4S3, where S1, S2, S3 denote the sums of the first, second, and third components of the five triplets. So running an R code that checks all possible values of E1 is a brute-force solution. However, one must first find what to check. Given that the sums of the triplets are thus of the form (16s,4s,s), the possible choices for E1 are necessarily restricted to

```
> S0=193+187+185+175
> ceiling(S0/16)
[1] 47
> floor((S0+175)/16)
[1] 57
> (47:57)*16-S0 #E1=S1-S0
[1]  12  28  44  60  76  92 108 124 140 156 172
```

The first two values correspond to a second sum S2 equal to 188 and 192, respectively, which is incompatible with A1 being 193. For the nine remaining values, the corresponding E2 and E3 are then given by

```
> S2=(49:57)*4   #remaining values of the second sum
> E1=(49:57)*16-S0
> E2=S2-E1
> S3=S2/4
> S3-E2          #corresponding values of E3, which must be non-negative
[1] -103  -90  -77  -64  -51  -38  -25  -12    1
```

which excludes all values but E1=172. No brute-force in the end…
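
For completeness, the full triplets can then be recovered from E1=172 by walking back through the relations A1=S2-A2 and A2=S3-A3 (obtained by summing the two sets of equations):

```
S0=193+175+185+187
E1=172
S2=(S0+E1)/4                        #second sum, since S1=4*S2
S3=S2/4                             #third sum, since S2=4*S3
first=c(A=193,B=175,C=185,D=187,E=E1)
second=S2-first                     #A2=S2-A1, etc.
third=S3-second                     #A3=S3-A2, etc.
rbind(first,second,third)
#second components: 35 53 43 41 56 ; third components: 22 4 14 16 1
```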

## Unusual timing shows how random mass murder can be (or even less)

Posted in Books, R, Statistics, Travel on November 29, 2013 by xi'an

This post follows the original one on the headline of the USA Today I read during my flight to Toronto last month. I remind you that the unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”. Which means that the difference between the four dates is at most 3, not 4!

I asked my friend Anirban Das Gupta from Purdue University about the exact value of this probability and the first thing he pointed out was that I used a different meaning of “within 4”. He then went into an elaborate calculation to find an upper bound on this probability, an upper bound that was way above my Monte Carlo approximation and my rough calculation of last post. I rechecked my R code and found it was not achieving the right approximation, since the event requires one date to be within 3 days of at least three other dates… I thus rewrote the following R code

```
T=10^6
four=rep(0,T)
for (t in 1:T){
  day=sort(sample(1:365,30,rep=TRUE))  #30 random days in the year
  day=c(day,day[day>363]-365)          #wrap dates near the end of the year (toric difference)
  tem=outer(day,day,"-")               #all pairwise differences between dates
  four[t]=(max(apply(((tem>-1)&(tem<4)),1,sum)>3)) #some date with at least 3 others in the preceding 3 days
}
mean(four)
```

[checked it was ok for two dates within 1 day, resulting in the birthday problem probability] and found 0.070214, which is much larger than the earlier value and shows it takes on average about 14 years for the “unlikely” event to happen! And the chances that it happens within seven years are about 40%.
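
As a quick numerical check of those two figures (assuming independence from one year to the next):

```
p=0.070214        #Monte Carlo estimate of the yearly probability
1/p               #expected number of years before the event, about 14
1-(1-p)^7         #chance of at least one occurrence over seven years, about 0.40
```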

Another coincidence relates to this evaluation, namely the fact that two elderly couples in France committed couple suicide within three days of one another, last week. I could not, however, find figures for the number of couple suicides per year. Maybe because it is extremely rare. Or undetected…

Posted in Books, Kids, Statistics, University life on November 29, 2013 by xi'an

This week at the Reading Classics student seminar, Thomas Ounas presented a paper, Statistical inference on massive datasets, written by Li, Lin, and Li, a paper out of The List. (This paper was recently published in Applied Stochastic Models in Business and Industry, 29, 399-409.) I accepted this unorthodox proposal as (a) it was unusual, i.e., this was the very first time a student made such a request, and (b) the topic of large datasets and their statistical processing was definitely interesting, even though the authors of the paper were unknown to me. The presentation by Thomas was very power-pointish (or power[-point]ful!), with plenty of dazzling transition effects… even including (a) a Python implementation replicating the method and (b) a nice little video on internet data transfer protocols. And on a Linux machine! Hence the experiment was worth the try, even though the paper is a rather unlikely candidate for the list of classics… (And the rendering in static PowerPoint is not so impressive, hence a video version available as well…)

The solution adopted by the authors of the paper is to break the massive dataset into blocks, so that each fits into the computer(s)’ memory, and to compute a separate estimate for each block. Those estimates are then averaged (and standard-deviationed) without a clear assessment of the impact of this multi-tiered handling of the data. Thomas then built a piece of software to illustrate this approach, with means, variances, quantiles, and densities as quantities of interest. Definitely original! The proposal itself sounds rather basic from a statistical viewpoint: for instance, evaluating the loss in information due to using this blocking procedure requires repeated sampling, which is unrealistic. Or relying solely on the between-block variance estimates, which misses the within-block variability and is hence overly optimistic. Further, strictly speaking, the method does not asymptotically apply to biased estimators, hence neither to Bayes estimators (nor to density estimators). Convergence results are thus somewhat formal, in that the asymptotics cannot apply to a finite-memory computer. In practice, the difficulty with the splitting technique lies rather in breaking the data into blocks, since Big Data is rarely made of iid observations; think of Amazon data, for instance (a question actually asked by the class). The method of Li et al. should also include some bootstrapping connection, e.g., to Michael’s bag of little bootstraps.
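
To make the blocking idea concrete, here is a toy R sketch of my own (with an arbitrary simulated dataset and the mean as estimator, not the authors' code), computing one estimate per block and combining them:

```
set.seed(1)
x=rnorm(10^6,mean=2)                     #stand-in "massive" dataset
B=100                                    #number of blocks
blocks=split(x,rep(1:B,each=length(x)/B))
blockest=sapply(blocks,mean)             #one estimate per block
mean(blockest)                           #combined estimate, averaging over blocks
sd(blockest)/sqrt(B)                     #between-block (inter) standard error only
```

For the sample mean this averaging is innocuous, but for nonlinear quantities such as quantiles or Bayes estimators it is precisely where the bias issue mentioned above creeps in.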

## revised (lower?) standards for statistical evidence

Posted in Books on November 28, 2013 by xi'an

Valen Johnson published a follow-up paper to his Annals of Statistics paper on uniformly most powerful Bayesian tests. This one aims at reaching a wider audience by (a) publishing in PNAS, (b) linking the lack of reproducibility in scientific research with the improper use of significance levels, and (c) proposing to move from 0.05 to 0.005 or even 0.001. (As noted in a previous post, the clarity of the proposal was bound to attract journalists.) The criticism of the significance level and of the sacrosanct status of 0.05 (not so much in fields like Physics or Astronomy) is not novel; see for instance the extreme The cult of significance, reviewed here two years ago. But most of the PNAS paper is dedicated to the technical derivation of UMPBTs, rather than providing strong arguments in their favour. There is no discussion of the analysis of true small effects, i.e. whether or not significance tests should be used in such cases. (Not!) The overall messages of the paper are thus unsubstantiated, except the obvious one that lowering the significance level will lower the number of false positives.

“Modifications of common standards of evidence are proposed to reduce the rate of nonreproducibility of scientific research by a factor of 5 or greater.”

My first reaction to the proposal is that moving from one reference significance level to another does not change an iota of the existing criticisms. Namely, adopting another standard for blind rejection of the null remains blindly rejecting the null. (The same criticism applies to Jeffreys’ scale, mind you.) Furthermore, as exposed in my earlier criticism, whose points obviously apply here, the resolution proposed by Valen is only Bayesian in the terms it uses, as it relies on least favourable priors in a minimax sense. And it almost removes from the picture the notion of alternative hypotheses, which in my opinion is the strongest appeal of the Bayesian perspective on decisional model selection. What Valen built there is a goodness-of-fit procedure with a Bayesianish flavour, not a Bayesian test. As a marginal note, I fail to understand the point of Figs. 1 and 2: the “strong curvilinear relation between” p-values and UMPBT “Bayes” factors is a consequence of both of them depending on the … data. It would actually have been slightly more pertinent to compare p-values and posterior probabilities (under “equipoise”).
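
For the record, under equipoise (prior probabilities of one half on each hypothesis), the posterior probability of the null is a direct transform of the Bayes factor:

```
postH0=function(BF10) 1/(1+BF10)  #P(H0|data) when P(H0)=P(H1)=1/2
postH0(c(1,3,3.87,5,10))          #BF10 between 3 and 5 gives posterior null probabilities of 0.25 down to 0.17
```

(Presumably the 17% to 25% range quoted further down corresponds to such Bayes factors under this equipoise assumption.)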

Second, Valen’s proposal depends upon the choice of an evidence threshold γ that seems to be calibrated against the standard significance test, as illustrated on page 2 with the z test. The value γ = 3.87 is chosen so that “the rejection region of the resulting test exactly matches the rejection region of a one-sided 5% significance test”. This further means that γ also seems to be calibrated against the uniformly most powerful (or least favourable) Bayes factor constructed for this purpose. Thus, γ is relative (as opposed to absolute) for two reasons: it makes the whole notion of uniformly most powerful tests somehow tautological, rather than induced by the real problem, and it renders meaningless the comparison of another Bayes factor (i.e., for another alternative) to the same threshold.
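
A small numerical check of that calibration, assuming (as I read the z-test example) that the UMPBT rejection region takes the form z > √(2 log γ):

```
z05=qnorm(.95)    #one-sided 5% cutoff, 1.645
exp(z05^2/2)      #implied evidence threshold, about 3.87
```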

“Although it is difficult to assess the proportion of all
tested null hypotheses that are actually true, if one assumes that this proportion is approximately one-half, then these results suggest that between 17% and 25% of marginally significant scientific findings are false.”

Third, the discussion that revolves around the above quote, while very attractive for the media and plain enough for the general public, is unrelated to any statistical ground in the paper, were it frequentist or Bayesian, even though it ties in with Jim Berger’s earlier evaluations. First, I dispute the assumption that the proportion is one-half: considering that only borderline hypotheses are run through statistical tests, I would think the proportion is much higher. Second, I do not see what "these results" refer to. The only evidence found therein is the "distribution" of p-values in Fig. 3, which amalgamates reported p-values from 855 t-tests found in the literature, with no correction for sample size, truncation, censoring, and a myriad of other possible departures from the formal property that "the nominal distribution of p-values is uniformly distributed on the range (0.0,0.05)".