Archive for R

Le Monde puzzle [#1127]

Posted in Books, Kids, R, Statistics on January 17, 2020 by xi'an

A permutation challenge as this week's Le Monde current mathematical puzzle:

When considering all games between 20 teams, of which 3 games have not yet been played, wins bring 3 points, losses 0 points, and draws 1 point (each). If the sum of all points over all teams and all games is 516, what is the largest possible number of teams with no draw in any of the games they played?

The run of a brute force R simulation of the 187 purely random games (20×19/2=190 pairings minus the 3 unplayed games) did not produce enough acceptable tables in a reasonable time. So I instead considered that a sum of 516 over 187 games means solving 3a+2b=516 and a+b=187, leading to 142 3's and 45 1's to allocate. This means, for instance, the following realisation of an acceptable table of game results:

games=matrix(1,20,20);diag(games)=0
while(sum(games*t(games))!=374){ #ensure the 3 zeroes hit 3 distinct games
  games=matrix(1,20,20);diag(games)=0
  games[sample((1:20^2)[games==1],3)]=0}
games=games*t(games) #0-1 indicator of the 187 played games
games[lower.tri(games)&games]=games[lower.tri(games)&games]*
    sample(c(rep(1,45),rep(3,142)))* #1's and 3's
    (1-2*(runif(20*19/2-3)<.5)) #sign
games[upper.tri(games)]=-t(games)[upper.tri(games)] #mirror with opposite sign
games[games==-3]=0;games=abs(games) #losers get 0, draws stay at 1
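The number of teams with no draw in a given table can then be tallied as follows (a one-liner I added, not part of the original code):

sum(rowSums(games==1)==0) #teams whose row contains no draw (coded as 1)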

Running 10⁶ random realisations of such matrices with no constraint whatsoever produced 915,524 tables where every team had at least one draw, 81,851 tables with one team with no draw, 2,592 tables with two such teams, and 3 tables with three such teams. However, given that 10*9=90, it seems to me that the maximum number should be 10, by allocating all 45 draws (that is, all 90 draw points) to games among the same 10 teams and the 142 3's at random among the remaining games. I thus reran a simulated annealing version (what else?!), reaching a maximum of 6 teams with no draws. Nowhere near 10, though!
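For the record, here is a rough sketch of what such a simulated annealing attempt may look like, reallocating the 45 draws among the 187 played games to maximise the number of draw-free teams (my reconstruction under these assumptions, not the code actually used):

set.seed(42)
pg=t(combn(20,2))[-sample(190,3),] #the 187 played games, 3 dropped at random
draws=sample(187,45) #current positions of the 45 draws
nodraw=function(d)sum(!(1:20 %in% pg[d,])) #teams involved in no draw
cur=nodraw(draws);best=cur
for(iter in 1:1e5){
  prop=draws
  prop[sample(45,1)]=sample(setdiff(1:187,draws),1) #move one draw elsewhere
  new=nodraw(prop)
  if(new>=cur||runif(1)<exp((new-cur)*iter/1e3)){draws=prop;cur=new} #cooling
  best=max(best,cur)}
best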

an elegant sampler

Posted in Books, Kids, R, University life on January 15, 2020 by xi'an

Following an X validated question on how to simulate a multinomial with fixed average, W. Huber produced a highly elegant and efficient resolution with the compact R code

tabulate(sample.int((k-1)*n, s-n) %% n + 1, n) + 1

where k is the number of classes, n the number of draws, and s is equal to n times the fixed average. The R function sample.int is an alternative to sample that seems faster. Breaking the outcome of

sample.int((k-1)*n, s-n)

as the nonzero positions in an n x (k-1) matrix and adding a row of n 1's leads to a simulation of integers between 1 and k by counting the 1's in each of the n columns (this is what the picture in the original post represents, with the colour code added after counting the number of 1's). Since there are s 1's in this matrix, the sum is automatically equal to s. And since the s-n positions are chosen uniformly over the n x (k-1) locations, the outcome is uniformly distributed among the admissible configurations. The rest of the R code is a brutally efficient way to translate the idea into a function. (By comparison, I brute-forced the question by suggesting a basic Metropolis algorithm.)
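As a quick sanity check (my addition, with arbitrary values for k, n, and s), one can verify that the simulated vector has the right sum and range:

k=5;n=100;s=300 #n draws in {1,...,k} with fixed sum s, i.e., average 3
x=tabulate(sample.int((k-1)*n,s-n)%%n+1,n)+1
c(sum(x),range(x)) #sum is exactly s and values stay within 1..k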

Le Monde puzzle [#1120]

Posted in Books, Kids, pictures, R on January 14, 2020 by xi'an

A board game as this week's Le Monde current mathematical puzzle:

11 players stand in a circle, with 365 tokens initially owned by a single player. Players with at least two tokens can either remove one token and give another one to their left neighbour, or pass two tokens to the right and one to the left. How quickly does the game stall, how many tokens are left, and where are they?

The run of an R simulation like

od=function(i)(i-1)%%11+1 #position on the circle of 11 players
muv<-function(bob){
  if (max(bob)>1){
    i=sample(rep((1:11)[bob>1],2),1) #pick a player with at least two tokens
    dud=c(0,-2,1) #remove one token and pass one along
    if((runif(1)<.5)&(bob[i]>2))dud=c(2,-3,1) #or pass two one way, one the other
    bob[c(od(i+10),i,od(i+1))]=bob[c(od(i+10),i,od(i+1))]+dud
  }
  bob}

always provides a solution

> bob
 [1] 1 0 1 1 0 1 1 0 1 0 0

with six ones at these locations. However, the time it takes to reach this frozen configuration varies, depending on the sequence of random choices.
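For completeness, here is a minimal driver (my addition, not part of the original code) that iterates muv until the configuration freezes:

bob=c(365,rep(0,10)) #all tokens start with player 1
while(max(bob)>1)bob=muv(bob) #play random moves until the game stalls
bob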


Metropolis in 95 characters

Posted in pictures, R, Statistics, Travel on January 2, 2020 by xi'an

Here is an R function that produces a Metropolis-Hastings sample for a univariate log-target f, when the latter is defined outside as another function, using a Gaussian random walk with scale one as proposal. (Inspired by an X validated question.)

m<-function(T,y=rnorm(1))ifelse(rep(T>1,T),
  c(y*{f({z<-m(T-1)}[1])-f(y+z[1])<rexp(1)}+z[1],z),y)

The function is definitely not optimal, crashes for values of T larger than 580 (unless one modifies the stack size), and implements the most basic version of a Metropolis-Hastings algorithm. But as a codegolf challenge (on a late plane ride), this was a fun exercise.
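For instance, with a standard normal log-target (my choice of f, not part of the challenge):

f=function(x)dnorm(x,log=TRUE) #log-target, defined outside as required
x=m(500) #Metropolis-Hastings chain of length 500, below the stack limit
hist(x,prob=TRUE)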

Le Monde puzzle [#1124]

Posted in Books, Kids, R on December 29, 2019 by xi'an

A prime number challenge [or rather two!] as this week's Le Monde current mathematical puzzle:

When considering the first two integers, 1 and 2, their sum is 3, a prime number. For the first four integers, 1,2,3,4, it is again possible to sum them pairwise to obtain two prime numbers, e.g., 3 and 7. Up to which limit is this operation feasible? And how many primes below 30,000 can be written as n^n+p^p?

The run of a brute force R simulation like

library(primes);n=14 #is_prime() comes from the primes package
max(apply(apply(b<-replicate(1e6,(1:n)+sample(n)),2,is_prime)[,b[1,]>2],2,prod))

provides a solution to the first question up to n=14, where it stops. A direct explanation is that the number of prime numbers grows too slowly for all sums to remain prime. And the second question gets solved by direct enumeration, using again the is_prime R function from the primes package:

[1] 1 1
[1] 1 2
[1] 1 4
[1] 2 3
[1] 3 4
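Here is a hedged reconstruction of this enumeration (my code, not necessarily the one actually used, and assuming the intended form is n^n+p^p), which reproduces the above pairs:

library(primes)
for(n in 1:6)for(p in n:6) #6^6 alone already exceeds 30,000
  if((s<-n^n+p^p)<3e4&&is_prime(s))print(c(n,p))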

postdoc in Bayesian machine learning in Berlin [reposted]

Posted in R, Statistics, Travel, University life on December 24, 2019 by xi'an

The working group of Statistics at Humboldt University of Berlin invites applications for one Postdoctoral research fellow (full-time employment, 3 years with extension possible) to contribute to research on mathematical and statistical aspects of (Bayesian) learning approaches. The research positions are associated with the Emmy Noether group Regression Models beyond the Mean – A Bayesian Approach to Machine Learning and the working group of Applied Statistics at the School of Business and Economics at Humboldt-Universität Berlin. Opportunities for scientific qualification (PhD)/career development are provided. The positions are to be filled at the earliest possible date and are funded by the German Research Foundation (DFG) within the Emmy Noether programme.

Requirements:
– an outstanding PhD in Statistics, Mathematics, or related field with specialisation in Statistics, Data Science or Mathematics;
– a strong background in at least one of the following fields: mathematical statistics, computational methods, Bayesian statistics, statistical learning, advanced regression modelling;
– a thorough mathematical understanding;
– substantial experience in scientific programming with Matlab, Python, C/C++, R or similar;
– strong interest in developing novel statistical methodology and its applications in various fields such as economics or natural and life sciences;
– very good communication skills and team experience, and proficiency in written and spoken English (German is not obligatory).

Opportunities:
We offer the unique environment of young researchers and leading international experts in the fields. The vibrant international network includes established collaborations in Singapore and Australia. The positions offer the potential to work closely with several applied sciences. Information about the research profile of the research group and further contact details can be found here. The positions are paid according to the Civil Service rates of the German States “TV-L”, E13 (if suitably qualified).

Applications should include:
– a CV with list of publications
– a motivational statement (at most one page) explaining the applicant’s interest in the announced position as well as their relevant skills and experience
– copies of degrees/university transcripts
– names and email addresses of at least two professors who may provide letters of recommendation directly to the hiring committee

Applications should be sent as a single PDF file to: Prof. Dr. Nadja Klein (nadja.klein[at]hu-berlin.de), whom you may also contact for questions concerning this job post. Please indicate "Research Position Emmy Noether".

Application deadline: 31st of January 2020

HU is seeking to increase the proportion of women in research and teaching, and specifically encourages qualified female scholars to apply. Severely disabled applicants with equivalent qualifications will be given preferential consideration. People with an immigration background are specifically encouraged to apply. Since we will not return your documents, please submit copies in the application only.

an arithmetic mean identity

Posted in Books, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , on December 19, 2019 by xi'an

A 2017 paper by Ana Pajor published in Bayesian Analysis addresses my favourite problem [of computing the marginal likelihood], which I already discussed on the 'Og last year, linking with another paper by Lenk published in 2009 in JCGS. Lenk's paper is actually using a technique related to the harmonic mean correction based on HPD regions that Darren Wraith and myself proposed at MaxEnt 2009, and that Jean-Michel and I presented at Frontiers of statistical decision making and Bayesian analysis in 2010. As I had only vague memories about the arithmetic mean version, we discussed the paper together with graduate students in Paris Dauphine.

The arithmetic mean solution, representing the marginal likelihood as the prior average of the likelihood, is a well-known approach that also serves as the basis of nested sampling, with the improvement here consisting in restricting the simulation to a set Ð with sufficiently high posterior probability. I am quite uneasy about P(Ð|y) being estimated by 1, as the shape of the set containing all posterior simulations is completely arbitrary, parameterisation dependent, and very much random since based on the extremes of this posterior sample. Plus, the set Ð converges to the entire parameter space as the number of posterior simulations grows. An alternative that we advocated in our earlier paper is to take Ð as the HPD region or a variational Bayes version of it. But the central issue with HPD regions is how to construct these from an MCMC output and how to compute both P(Ð) and P(Ð|y). It does not seem like a good idea to set P(Ð|y) to the intended α level for the HPD coverage. Using a non-parametric version for estimating Ð could in the end be the only reasonable solution.
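In formulas, and writing A for the set Ð (my paraphrase of the identity from memory, not a verbatim quote of the paper),

$$m(y)\,P(A\mid y)=\int_A f(y\mid\theta)\,\pi(\theta)\,\text{d}\theta\quad\Longrightarrow\quad\hat m(y)=\frac{N^{-1}\sum_{i=1}^N f(y\mid\theta_i)\,\mathbb{I}_A(\theta_i)}{\hat P(A\mid y)},\qquad\theta_i\overset{\text{iid}}{\sim}\pi,$$

with the paper opting for $\hat P(A\mid y)=1$, which is the very approximation questioned above.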

As a test, I reran the example of a conjugate normal model used in the paper, based on (exact) simulations from both the prior and the posterior, and obtained approximations that were all close to the true marginal, with Chib's being exact in that case (of course!) and the arithmetic mean surprisingly close even without an importance correction:

> print(c(hame,chme,came,chib))
[1] -107.6821 -106.5968 -115.5950 -115.3610

Both harmonic versions are of the right order but not trustworthy, the truncation to a set Ð such as the one chosen in this paper having little impact.
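To make this concrete, here is a toy version of the truncated arithmetic mean estimator in a single-observation conjugate normal model (my sketch, far simpler than the paper's example, with the set Ð crudely taken as the range of the posterior sample and P(Ð|y) set to 1 as in the paper):

set.seed(101)
y=2.3;N=1e5 #toy model: y|theta ~ N(theta,1), theta ~ N(0,1)
post=rnorm(N,y/2,sqrt(.5)) #exact posterior draws, theta|y ~ N(y/2,1/2)
D=range(post) #crude set Ð based on the posterior extremes
prio=rnorm(N) #exact prior draws
came=mean(dnorm(y,prio,1)*(prio>D[1])*(prio<D[2])) #truncated arithmetic mean
c(came,dnorm(y,0,sqrt(2))) #estimate versus exact marginal N(0,2)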