shortened iterations [code golf]

Posted in Kids, pictures, Statistics, Travel on October 29, 2019 by xi'an

A lazy-morning code-golf exercise: find the sequence of integers that starts with an arbitrary value n and is updated in blocks of four as

$a_{4k+1} = a_{4k} \cdot(4k+1)\\ a_{4k+2} = a_{4k+1} + (4k+2)\\ a_{4k+3} = a_{4k+2} - (4k+3)\\ a_{4k+4} = a_{4k+3} / (4k+4)$

until the last term is not an integer. While the update can be easily implemented with the appropriate stopping rule, a simple congruence analysis shows that, depending on n, the sequence is 4, 8 or 12 values long when

$n\not\equiv 1(4)\\ n\equiv 1(4)\ \text{and}\ 3(n-1)+4\not\equiv 0(32)\\ 3(n-1)+4\equiv 0(32)$

respectively. But sadly, the more interesting fixed-length solution

"~"=rep #redefine rep as the infix operator ~
b=(scan()-1)*c(32~4,8,40~4,1,9~3)/32+c(1,1,3,0~3,6,-c(8,1,9,-71,17)/8)
b[!b%%1] #keep integer entries only


ends up being longer than the more basic one:

a=scan()
while(!a[T]%%1)a=c(a,d<-a[T]*T,d+T+1,e<-d-1,e/((T<-T+4)-1))
a[-T]


where Robin’s suggestion of using T rather than length is very cool, as T carries a double meaning: first TRUE (hence 1), then the length of a…
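For readers who prefer clarity to brevity, the update rule and stopping condition can be written out in full. Here is a plain Python transcription (the function name iter_seq is mine, not from the post):

```python
def iter_seq(n):
    # build the sequence from a_0 = n, one block of four updates at a time,
    # stopping as soon as the division step stops being exact
    a = [n]
    k = 0
    while True:
        a.append(a[-1] * (4 * k + 1))   # a_{4k+1} = a_{4k} * (4k+1)
        a.append(a[-1] + (4 * k + 2))   # a_{4k+2} = a_{4k+1} + (4k+2)
        a.append(a[-1] - (4 * k + 3))   # a_{4k+3} = a_{4k+2} - (4k+3)
        q, r = divmod(a[-1], 4 * k + 4)
        if r:                           # a_{4k+4} would not be an integer: stop here
            return a
        a.append(q)                     # a_{4k+4} = a_{4k+3} / (4k+4)
        k += 1
```

For instance, iter_seq(2) returns the 4-value sequence [2, 2, 4, 1] (since 2 ≢ 1 mod 4), while n = 5 and n = 21 give 8 and 12 values respectively, matching the congruence analysis above.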

un lagarto en las Cataratas del Iguazú [a lizard at the Iguazú Falls; guest jatp]

Posted in Kids, pictures, Travel on January 10, 2017 by xi'an

capture-recapture with continuous covariates

Posted in Books, pictures, Statistics, University life on September 14, 2015 by xi'an

This morning, I read a paper by Roland Langrock and Ruth King in a 2013 issue of Annals of Applied Statistics that had gone too far under my desk to be noticed… This problem of using continuous covariates in capture-recapture models is a frustrating one, as it is not clear what one should do at the times when the subject, and therefore its covariates, are not observed. This is why I was quite excited by the [trinomial] paper of Catchpole, Morgan, and Tavecchia when they submitted it to JRSS Series B and I was the editor handling it. In the current paper, Langrock and King build a hidden Markov model on the capture history (as in Jérôme Dupuis’s main thesis paper, 1995), along with a discretised Markov chain model on the covariates and a logit link between those covariates and the probability of capture. (At first, I thought the Markov model was a sheer unconstrained Markov chain on the discretised space and found it curious that increasing the number of states had a positive impact on the estimation but, blame my Métro environment!, I had not read the paper carefully.)
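To make the construction concrete, here is a minimal sketch of how the likelihood of one capture history can be evaluated by the forward algorithm once the continuous covariate is discretised into m bins. This is not the authors' code: the names, the constant survival probability, and the single-covariate logit link are all my simplifying assumptions.

```python
import math

def capture_loglik(history, bins, Gamma, delta, beta0, beta1, phi):
    # history: 0/1 capture indicators over the study occasions
    # bins:    midpoints of the m discretisation bins for the covariate
    # Gamma:   m x m transition matrix (list of rows) of the binned covariate chain
    # delta:   initial distribution over the m bins
    # phi:     survival probability, held constant here for simplicity
    m = len(bins)
    # logit link: probability of capture given the (binned) covariate value
    p = [1.0 / (1.0 + math.exp(-(beta0 + beta1 * x))) for x in bins]
    alpha = list(delta)                          # forward probabilities over bins
    for y in history:
        obs = [p[i] if y else 1.0 - p[i] for i in range(m)]
        filt = [alpha[i] * obs[i] * phi for i in range(m)]
        # propagate the covariate chain one step: alpha' = filt @ Gamma
        alpha = [sum(filt[i] * Gamma[i][j] for i in range(m)) for j in range(m)]
    return math.log(sum(alpha))
```

Refining the grid (increasing m) improves the approximation of the integral over the unobserved covariate path, which is the sense of the quote below.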

“The accuracy of the likelihood approximation increases with increasing m.” (p.1719)

While I acknowledge that something has to be done about the missing covariates, and that this approach may be the best one can expect in such circumstances, I nonetheless disagree with the above notion that increasing the number of discretisation states m will improve the likelihood approximation, simply because the model on the covariates, chosen ex nihilo, has no reason to fit the real phenomenon, especially since the values of the covariates impact the probability of capture: the individuals are not (likely to go) missing at random, i.e., independently from the covariates. For instance, in a lizard study on which Jérôme Dupuis worked in the early 1990’s, weight and survival were unsurprisingly connected, with a higher mortality during the cold months when food was scarce. Using autoregressive-like models on the covariates misses the possibility of sudden changes in the covariates that could impact the capture patterns. I do not know whether or not this has been attempted in this area, but connecting the covariates between individuals at a specific time, so that missing covariates can be inferred from observed ones, possibly with spatial patterns, would also make sense.

In fine, I fear there is a strong and almost damning limitation to the notion of incorporating covariates into capture-recapture models: if a covariate is determinant in deciding capture versus non-capture, the range of the covariate associated with non-capture will never be observed and hence cannot be recovered from the observed values.

Principles of Applied Statistics

Posted in Books, Statistics, University life on February 13, 2012 by xi'an

This book by David Cox and Christl Donnelly, Principles of Applied Statistics, is an extensive coverage of all the necessary steps and precautions one must go through when contemplating applied (i.e. real!) statistics. As the authors write in the very first sentence of the book, “applied statistics is more than data analysis” (p.i); the title could indeed have been “Principled Data Analysis”! Indeed, Principles of Applied Statistics reminded me of how much we (at least I) take “the model” and “the data” for granted when doing statistical analyses, by going through all the pre-data and post-data steps that lead to the “idealized” (p.188) data analysis. The contents of the book are intentionally simple, with hardly any mathematical aspect, but with a clinical attention to exhaustiveness and clarity. For instance, even though I would have enjoyed more stress on probabilistic models as the basis for statistical inference, they only appear in the fourth chapter (out of ten), with errors-in-variables models. The painstakingly careful coverage of the myriad of tiny but essential steps involved in a statistical analysis, and the highlighting of the numerous corresponding pitfalls, was certainly illuminating to me. Just as the book refrains from mathematical digressions (“our emphasis is on the subject-matter, not on the statistical techniques as such”, p.12), it falls short of engaging with detailed and complex data stories. Instead, it uses little grey boxes to convey the pertinent aspects of a given data analysis, referring to a paper for the full story. (I acknowledge this may be frustrating at times, as one would like to read more…) The book reads very nicely and smoothly, and I must acknowledge I read most of it in trains, métros, and planes over the past week. (This remark is not intended as a criticism about a lack of depth or interest, by all means [and medians]!)

“A general principle, sounding superficial but difficult to implement, is that analyses should be as simple as possible, but not simpler.” (p.9)

To get into more details, Principles of Applied Statistics covers most purposes of statistical analyses (Chap. 1), design, with some special emphasis (Chap. 2-3), which is not surprising given the record of the authors (and “not a moribund art form”, p.51), measurement (Chap. 4), including the special case of latent variables and their role in model formulation, preliminary analysis (Chap. 5), by which the authors mean data screening and graphical pre-analysis, [at last!] models (Chap. 6-7), separated into model formulation [debating the nature of probability] and model choice, the latter being somehow separated from the standard meaning of the term (done in §8.4.5 and §8.4.6), formal [mathematical] inference (Chap. 8), covering in particular testing and multiple testing, interpretation (Chap. 9), i.e. post-processing, and a final epilogue (Chap. 10). The readership of the book is rather broad, from practitioners to students (although both categories do require a good dose of maturity), to teachers, to scientists designing experiments with a statistical mind. It may be deemed too philosophical by some, too allusive by others, but I think it constitutes a magnificent testimony to the depth and to the spectrum of our field.

“Of course, all choices are to some extent provisional.” (p.130)

As a personal aside, I appreciated the illustration through capture-recapture models (p.36), with a remark on the impact of toe-clipping on frogs, as it reminded me of a similar way of marking lizards when my (then) student Jérôme Dupuis was working on a corresponding capture-recapture dataset in the 90’s. Conversely, while John Snow’s story [of using maps to explain the cause of cholera] is alluring, and his map makes for a great cover, I am less convinced it is particularly relevant within this book.

“The word Bayesian, however, became more widely used, sometimes representing a regression to the older usage of flat prior distributions supposedly representing initial ignorance, sometimes meaning models in which the parameters of interest are regarded as random variables, and occasionally meaning little more than that the laws of probability are somewhere invoked.” (p.144)

My main quibble with the book goes, most unsurprisingly!, to the processing of Bayesian analysis found in Principles of Applied Statistics (pp.143-144). Indeed, on the one hand, the method is mostly criticised over those two pages. On the other hand, it is the only method presented in this level of detail, including historical background, which seems a bit superfluous for a treatise on applied statistics. The drawbacks mentioned are (p.144):

• the weight of prior information or modelling as “evidence”;
• the impact of “indifference or ignorance or reference priors”;
• whether or not empirical Bayes modelling has been used to construct the prior;
• whether or not the Bayesian approach is anything more than a “computationally convenient way of obtaining confidence intervals”.

The empirical Bayes perspective is the original one found in Robbins (1956) and seems to find grace in the authors’ eyes (“the most satisfactory formulation”, p.156). This is contrary to MCMC methods, deemed “a black box in that typically it is unclear which features of the data are driving the conclusions” (p.149)…

“If an issue can be addressed nonparametrically then it will often be better to tackle it parametrically; however, if it cannot be resolved nonparametrically then it is usually dangerous to resolve it parametrically.” (p.96)

Apart from a more philosophical paragraph on the distinction between machine learning and statistical analysis in the final chapter, mentioning the drawback of using neural nets and the like as black-box methods (p.185), there is relatively little coverage of nonparametric models, “parametric formulations” (p.96) being openly favoured. I can somehow understand this perspective for simpler settings, namely that nonparametric models offer little explanation of the production of the data. However, in more complex models, nonparametric components often are a convenient way to evacuate burdensome nuisance parameters… Again, technical aspects are not the focus of Principles of Applied Statistics, which also explains why it does not dwell intently on nonparametric models.

“A test of meaningfulness of a possible model for a data-generating process is whether it can be used directly to simulate data.” (p.104)

The above remark is quite interesting, especially when accounting for David Cox’s current appreciation of ABC techniques. The impossibility of generating from a posited model, as with some models found in econometrics, precludes using ABC, but this does not necessarily mean the model should be excluded as unrealistic…

“The overriding general principle is that there should be a seamless flow between statistical and subject-matter considerations.” (p.188)

As mentioned earlier, the last chapter brings a philosophical conclusion on what (applied) statistics is. It stresses the need for a careful and principled use of black-box methods, so that they preserve a general framework and lead to explicit interpretations.