Archive for R

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , , on May 19, 2022 by xi'an

A question from X validated had enough appeal for me to procrastinate about it for ½ an hour: what difference does it make [for simulation purposes] that a target density is properly normalised? In the continuous case, I do not see much to exploit about this knowledge, apart from the value potentially leading to a control variate (in a Gelfand and Dey 1996 spirit) and possibly to a stopping rule (by checking that the portion of the space visited so far has mass close to one, but this is more delicate than it sounds).

In a (possibly infinite) countable setting, it seems to me one gain (?) is that approximating expectations by Monte Carlo no longer requires iid simulations in the sense that once visited,  atoms need not be visited again. Self-avoiding random walks and their generalisations thus appear as a natural substitute for MC(MC) methods in this setting, provided finding unexplored atoms proves manageable. For instance, a stopping rule is always available, namely that the cumulated weight of the visited fraction of the space is close enough to one. The above picture shows a toy example on a 500 x 500 grid with 0.1% of the mass remaining at the almost invisible white dots. (In my experiment, neighbours for the random exploration were chosen at random over the grid, as I assumed no global information was available about the repartition over the grid either of mass function or of the function whose expectation was seeked.)

Carmichael number, more or less

Posted in Books, Kids, Statistics with tags , , , , , , , on May 6, 2022 by xi'an

A quick-and-dirty R resolution of a riddle from The Riddler, namely to find a Carmichael number of the form abcabc:

library(numbers)
for(i in 1:9)
for(j in 0:9)
for(k in 0:9){
x=i*100100+j*1010+k*101
if(!isPrime(x)){
p=primeFactors(x)
if((prod(apply(outer(p,p,F="=="),1,sum)%%2))&
(!max((x-1)%%(p-1))))break()}}


resulting into the number 101 101, since its prime factors are

> primeFactors(101101)
[1]   7  11  13 101


and 6, 10, 12, and 100 are divisors of 101100:

> primeFactors(101100)
[1] 2 2 3 5 5 337


truncated mixtures

Posted in Books, pictures, R, Statistics with tags , , , , , on May 4, 2022 by xi'an

A question on X validated about EM steps for a truncated Normal mixture led me to ponder whether or not a more ambitious completion [more ambitious than the standard component allocation] was appropriate. Namely, if the mixture is truncated to the interval (a,b), with an observed sample x of size n, this sample could be augmented into an untrucated sample y by latent samples over the complement of (a,b), with random sizes corresponding to the probabilities of falling within (-∞,a), (a,b), and (b,∞). In other words, y is made of three parts, including x, with sizes N¹, n, N³, respectively, the vector (N¹, n, N³) being a trinomial M(N⁺,p) random variable and N⁺ an extra unknown in the model. Assuming a (pseudo-) conjugate prior, an approximate Gibbs sampler can be run (by ignoring the dependence of p on the mixture parameters!). I did not go as far as implementing the idea for the mixture, but had a quick try for a simple truncated Normal. And did not spot any explosive behaviour in N⁺, which is what I was worried about.  Of course, this is mostly anecdotal since the completion does not bring a significant improvement in coding or convergence (the plots corresponds to 10⁴ simulations, for a sample of size n=400).

null recurrent = zero utility?

Posted in Books, R, Statistics with tags , , , , , , , on April 28, 2022 by xi'an

The stability result that the ratio

$\dfrac{\sum^T_{t=1} f(\theta^{(t)})}{\sum^T_{t=1} g(\theta^{(t)})}\qquad(1)$

converges holds for a Harris π-null-recurrent Markov chain for all functions f,g in L¹(π) [Meyn & Tweedie, 1993, Theorem 17.3.2] is rather fascinating. However, it is unclear it can be useful in simulation environments, as for the integral priors we have been studying over the years with Juan Antonio Cano and Diego Salmeron Martinez. Above, the result of an experiment where I simulated a Markov chain as a Normal random walk in dimension one, hence a Harris π-null-recurrent Markov chain for the Lebesgue measure λ, and monitored the stabilisation of the ratio (1) when using two densities for f and g,  to its expected value (1, shown by a red horizontal line). There is quite a variability in the outcome (repeated 100 times),  but the most intriguing is the quick stabilisation of most cumulated averages to values different from 1. Even longer runs display this feature

which I would blame on the excursions of the random walk far away from the central regions for both f and g, that is on long sequences where zeroes keep being added to numerator and denominators in (1). As far as integral approximation is concerned, this is not very helpful!

riddle of the week

Posted in R with tags , , , , , on April 21, 2022 by xi'an

The Riddler of April 1 offered this simple question:

start with the number 1 and then try to reach a target number through a series of steps. For each step, you can always choose to double the number you currently have. However, if the number happens to be one (1) more than an odd multiple of 3, you can choose to “reduce” — that is, subtract 1 and then divide by 3. What is the smallest positive integer one cannot reach this way?

Which I turned into R steps (while waiting for flight AF19 to Paris)

  while((!(x-1)%%3)&((x-1)%%6)){
oor[2*x]TRUE
oor[x<-(x-1)%/%3]=TRUE}


but running an exhaustive search till 10⁸ did not spot any missing integer… Maybe an April fool joke (as the quick riddle was asking for the simplest representation of (x-a)(x-b)…(x-z)…!)