## generating from a failure rate function [X’ed]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , on July 4, 2015 by xi'an

While I now try to abstain from participating to the Cross Validated forum, as it proves too much of a time-consuming activity with little added value (in the sense that answers are much too often treated as disposable napkins by users who cannot be bothered to open a textbook and who usually do not exhibit any long-term impact of the provided answer, while clogging the forum with so many questions that the individual entries seem to get so little traffic, when compared say with the stackoverflow forum, to the point of making the analogy with disposable wipes more appropriate!), I came across a truly interesting question the other night. Truly interesting for me in that I had never considered the issue before.

The question is essentially wondering at how to simulate from a distribution defined by its failure rate function, which is connected with the density f of the distribution by

$\eta(t)=\frac{f(t)}{\int_t^\infty f(x)\,\text{d}x}=-\frac{\text{d}}{\text{d}t}\,\log \int_t^\infty f(x)\,\text{d}x$

From a purely probabilistic perspective, defining the distribution through f or through η is equivalent, as shown by the relation

$F(t)=1-\exp\left\{-\int_0^t\eta(x)\,\text{d}x\right\}$

but, from a simulation point of view, it may provide a different entry. Indeed, all that is needed is the ability to solve (in X) the equation

$\int_0^X\eta(x)\,\text{d}x=-\log(U)$

when U is a Uniform (0,1) variable. Which may help in that it does not require a derivation of f. Obviously, this also begs the question as to why would a distribution be defined by its failure rate function.

## probabilistic numerics

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on April 27, 2015 by xi'an

I attended an highly unusual workshop while in Warwick last week. Unusual for me, obviously. It was about probabilistic numerics, i.e., the use of probabilistic or stochastic arguments in the numerical resolution of (possibly) deterministic problems. The notion in this approach is fairly Bayesian in that it makes use to prior information or belief about the quantity of interest, e.g., a function, to construct an usually Gaussian process prior and derive both an estimator that is identical to a numerical method (e.g., Runge-Kutta or trapezoidal integration) and uncertainty or variability around this estimator. While I did not grasp much more than the classy introduction talk by Philipp Hennig, this concept sounds fairly interesting, if only because of the Bayesian connection, and I wonder if we will soon see a probability numerics section at ISBA! More seriously, placing priors on functions or functionals is a highly formal perspective (as in Bayesian non-parametrics) and it makes me wonder how much of the data (evaluation of a function at a given set of points) and how much of the prior is reflected in the output [variability]. (Obviously, one could also ask a similar question for statistical analyses!)  For instance, issues of singularity arise among those stochastic process priors.

Another question that stemmed from this talk is whether or not more efficient numerical methods can derived that way, in addition to recovering the most classical ones. Somewhat, somehow, given the idealised nature of the prior, it feels like priors could be more easily compared or ranked than in classical statistical problems. Since the aim is to figure out the value of an integral or the solution to an ODE. (Or maybe not, since again almost the same could be said about estimating a normal mean.)

## stopping rule impact

Posted in Books, R, Statistics, University life with tags , , , on May 9, 2014 by xi'an

Here is a question from my friend Shravan Vasishth about the consequences of using a stopping rule:

Psycholinguists and psychologists often adopt the following type of data-gathering procedure: The experimenter gathers n data points, then checks for significance (p<0.05 or not). If it’s not significant, he gets more data (n more data points). Since time and money are limited, he might decide to stop anyway at sample size, say, some multiple of n.  One can play with different scenarios here. A typical n might be 10 or 15.

This approach would give us a distribution of t-values and p-values under repeated sampling. Theoretically, under the standard assumptions of frequentist methods, we expect a Type I error to be 0.05. This is the case in standard analyses (I also track the t-statistic, in order to compare it with my stopping rule code below).

Here’s a simulation showing what happens. I wanted to ask you whether this simulation makes sense. I assume here that the experimenter gathers 10 data points, then checks for significance (p<0.05 or not). If it’s not significant, he gets more data (10 more data points). Since time and money are limited, he might decide to stop anyway at sample size 60. This gives us p-values under repeated sampling. Theoretically, under the standard assumptions of frequentist methods, we expect a Type I error to be 0.05. This is the case in standard analyses:
##Standard:
pvals<-NULL
tstat_standard<-NULL
n<-10 # sample size
nsim<-1000 # number of simulations
stddev<-1 # standard dev
mn<-0 ## mean

for(i in 1:nsim){
samp<-rnorm(n,mean=mn,sd=stddev)
pvals[i]<-t.test(samp)$p.value tstat_standard[i]<-t.test(samp)$statistic}

## Type I error rate: about 5% as theory says:
table(pvals<0.05)[2]/nsim


But the situation quickly deteriorates as soon as adopt the strategy I outlined above:

pvals<-NULL
tstat<-NULL
## how many subjects can I run?
upper_bound<-n*6

for(i in 1:nsim){
## at the outset we have no significant result:
significant<-FALSE
## null hyp is going to be true,
## so any rejection is a mistake.
## take sample:
x<-rnorm(n,mean=mn,sd=stddev)
while(!significant & length(x)<upper_bound){
## if not significant:
if(t.test(x)$p.value>0.05){ ## get more data: x<-append(x,rnorm(n,mean=mn,sd=stddev)) ## otherwise stop: } else {significant<-TRUE}} ## will be either significant or not: pvals[i]<-t.test(x)$p.value
tstat[i]<-t.test(x)\$statistic}


Now let’s compare the distribution of the t-statistic in the standard case vs with the above stopping rule. We get fatter tails with the above stopping rule, as shown by the histogram below.

Is this a correct way to think about the stopping rule problem?

To which I replied the following:

By adopting a stopping rule on a random iid sequence, you favour values in the sequence that agree with your stopping condition, hence modify the distribution of the outcome. To take an extreme example, if you draw N(0,1) variates until the empirical average is between -2 and 2, the average thus produced cannot remain N(0,1/n) but have a different distribution.

The t-test statistic you build from your experiment is no longer distributed as a uniform variate because of the stopping rule: the sample(x1,…,x10m) (with random size 10m [resulting from increases in the sample size by adding 10 more observations at a time] is distributed from

$\prod_{i=1}^{10m} \phi(x_i) \times \prod_{j=1}^{m-1} \mathbb{I}_{t(x_1,\ldots,x_{10j})>.05} \times \mathbb{I}_{t(x_1,\ldots,x_{10m})<.05}$

if 10m<60 [assuming the maximal acceptable sample size is 60] and from

$\prod_{i=1}^{60} \phi(x_i) \times \prod_{j=1}^{5} \mathbb{I}_{t(x_1,\ldots,x_{10j})>.05}$

otherwise. The histogram at the top of this post is the empirical distribution of the average of those observations, clearly far from a normal distribution.

## Bayes on the radio (regrets)

Posted in Books, Kids, Running, Statistics with tags , , , , , on November 13, 2012 by xi'an

While running this morning I was reconsidering (over and over) my discussion of Bayes’ formula on the radio and thought I should have turned the presentation of Bayes’ theorem differently. I spent much too much time on the math side of Bayes’ formula and not enough on the stat side. The math aspect is not of real importance as it is a mere reformulation of conditional probabilities. The stat side is what matters as introducing a (prior) distribution on the parameter (space) is the #1 specificity of Bayesian statistics…. And the focus point of most criticisms, as expressed later by the physicist working on the Higgs boson, Dirk Zerwas.

I also regret not mentioning that Bayes’ formula was taught in French high schools, as illustrated by the anecdote of Bayes at the bac. And not reacting at the question about Bayes in the courtroom with yet another anecdote of Bayes’ formula been thrown out of the accepted tools by an English court of appeal about a year ago. Oh well, another argument for sticking to the written world.

## Computational Challenges in Probability [ICERM, Sept. 5 – Dec. 7]

Posted in Statistics, Travel, University life with tags , , , , , , , , , on May 18, 2012 by xi'an

I have just received an invitation to take part in the program “Computational Challenges in Probability” organised by ICERM (Institute for Computational and Experimental Research in Mathematics, located in what sounds like a terrific building!) next semester. Here is the purpose statement:

The Fall 2012 Semester on “Computational Challenges in Probability” aims to bring together leading experts and young researchers who are advancing the use of probabilistic and computational methods to study complex models in a variety of fields. The goal is to identify common challenges, exchange existing tools, reveal new application areas and forge new collaborative efforts. The semester includes four workshops – Bayesian Nonparametrics, Uncertainty Quantification, Monte Carlo Methods in the Physical and Biological Sciences and Performance Analysis of Monte Carlo Methods. In addition, synergistic activities will be planned throughout the duration of the semester. In particular, there will be several short courses and plenary invited talks by experts on related topics such as graphical models, randomized algorithms and stochastic networks, regular weekly seminars and relevant film screenings.

There are thus four workshops organised over the period and an impressive collection of long-term participants. I will most likely take part in the last workshop, “Performance Analysis of Monte Carlo Methods”, although I would like to attend all of them! (Interesting side remark: while looking at the ICERM website, I found that May 18th is the Day of Data! Great, except that neither the word statistitics nor the word statistician appear on the page…)

Posted in Books, Statistics, University life with tags , , on June 13, 2011 by xi'an

After delivering my one-day lecture on Jaynes’ Probability Theory, I gave as assignment to the students that they wrote their own analysis of Chapter 15 (Paradoxes of probability theory), given its extensive and exciting coverage of the marginalisation paradoxes and my omission of it in the lecture notes… Up to now, only Jean-Bernard Salomon has returned a (good albeit short) synthesis of the chapter, seemingly siding with Jaynes’ analysis that a “good” noninformative prior should avoid the paradox. (In short, my own view of the problem is to side with Dawid, Stone, and Zidek, in that the paradox is only a paradox when interpreting marginals of infinite measures as if they were probability marginals…) This made me wonder if there could be a squared marginalisation paradox: find a statistical model parameterised by θ with a nuisance parameter η=η(θ) such that when the parameter of interest is ξ=ξ(θ) the prior on η solving the marginalisation paradox is not the same as when the parameter of interest is ζ=ζ(θ) [I have not given the problem more than a few seconds thought so this may prove a logical impossibility!]

## Frequency vs. probability

Posted in Statistics with tags , , , , , , , on May 6, 2011 by xi'an

Probabilities obtained by maximum entropy cannot be relevant to physical predictions because they have nothing to do with frequencies.” E.T. Jaynes, PT, p.366

A frequency is a factual property of the real world that we measure or estimate. The phrase estimating a probability’ is just as much an incongruity as assigning a frequency’. The fundamental, inescapable distinction between probability and frequency lies in this relativity principle: probabilities change when we change our state of knowledge, frequencies do not.” E.T. Jaynes, PT, p.292

A few days ago, I got the following email exchange with Jelle Wybe de Jong from The Netherlands:

Q. I have a question regarding your slides of your presentation of Jaynes’ Probability Theory. You used the [above second] quote: Do you agree with this statement? It seems to me that a lot of  ‘Bayesians’ still refer to ‘estimating’ probabilities. Does it make sense for example for a bank to estimate a probability of default for their loan portfolio? Or does it only make sense to estimate a default frequency and summarize the uncertainty (state of knowledge) through the posterior? Continue reading