Archive for numerics

logic (not logistic!) regression

Posted in Books, Statistics, University life on February 12, 2020 by xi'an

A Bayesian Analysis paper by Aliaksandr Hubin, Geir Storvik, and Florian Frommlet on Bayesian logic regression was open for discussion. Here are some hasty notes I made during our group discussion in Paris Dauphine (and later turned into a discussion submitted to Bayesian Analysis):

“Originally logic regression was introduced together with likelihood based model selection, where simulated annealing served as a strategy to obtain one “best” model.”

Indeed, logic regression is not to be confused with logistic regression! Moving from likelihood-based model selection to a Bayesian perspective leads to Bayesian model choice and… apparently to Bayesian logic regression. The central object of interest is a generalised linear model based on a vector of binary covariates, using some if not all possible logical combinations (trees) of said covariates (leaves). The GLM further uses rather standard indicators to signify whether or not some trees are included in the regression (and hence in the model). The prior modelling on the model indices sounds rather simple (simplistic?!) in that it is only a function of the number of active trees, leading to an automated penalisation of larger trees and not accounting for a possible specificity of some covariates, for instance when dealing with imbalanced covariates (many more 1s than 0s, say).
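
To fix ideas on the structure (a toy illustration of mine of the model class, in no way the authors' code or prior), one can think of a handful of logical trees built from binary covariates, each tree entering or not the linear predictor of a GLM:

# toy logic-regression structure: binary covariates (leaves) combined into
# logical trees, with model choice amounting to selecting which trees enter
set.seed(1)
n=100
x1=rbinom(n,1,.5); x2=rbinom(n,1,.5); x3=rbinom(n,1,.2)
L1=as.integer(x1 & x2)   # tree 1: x1 AND x2
L2=as.integer(x1 | !x3)  # tree 2: x1 OR (NOT x3)
y=rbinom(n,1,plogis(-1+2*L1+L2))
summary(glm(y~L1+L2,family=binomial))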

A first question is thus how much of a novel model this is when compared with, say, an analysis of variance, since all covariates are dummy variables. How the number of trees is culled from the doubly exponential number of possible logical combinations remains obscure but, without such a culling, the model is nothing but variable selection in GLMs, except for "enjoying" a massive number of variables. Note that there could be a connection with variable length Markov chain models, but it is not exploited there.

“…using Jeffrey’s prior for model selection has been widely criticized for not being consistent once the true model coincides with the null model.”

A second point that strongly puzzles me in the paper is its loose handling of improper priors. It is well-known that improper priors are at worst fishy in model choice settings and at best avoided altogether, to wit the Lindley-Jeffreys paradox and friends. Not only does the paper adopt the notion of a same, improper, prior on the GLM scale parameter, which is a position adopted in some of the Bayesian literature, but it also seems to be using an improper prior on each set of parameters (further undifferentiated between models). Because the priors operate on different (sub)sets of parameters, I think this jeopardises the later discourse on the posterior probabilities of the different models since they are not meaningful from a probabilistic viewpoint, with no joint distribution as a reference, nor a marginal density. In some cases, p(y|M) may become infinite. Referring to a "simple Jeffrey's" prior in this setting is therefore anything but simple, as Jeffreys (1939) himself shied away from using improper priors on the parameter of interest. I find it surprising that this fundamental and well-known difficulty with improper priors in hypothesis testing is not even alluded to in the paper. Its core setting thus seems to be flawed. Now, the numerical comparison between Jeffrey's [sic] prior and a regular g-prior exhibits close proximity and I thus wonder at the reason. Could it be that the culling and selection processes end up with the same number of variables and thus eliminate the impact of the prior? Or is it due to the recourse to a Laplace approximation of the marginal likelihood that completely escapes the lack of definition of the said marginal? Computing the normalising constant and repeating this computation while the algorithm is running ignores the central issue.
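
As a reminder of the textbook difficulty (nothing specific to this paper), writing the improper prior under model M1 as c1 h1(θ1) and under model M2 as c2 h2(θ2), with arbitrary positive constants c1 and c2 since the h's do not integrate, leaves the Bayes factor defined only up to the meaningless ratio c1/c2:

B_{12}(y) = \frac{\int f_1(y\mid\theta_1)\,c_1 h_1(\theta_1)\,d\theta_1}{\int f_2(y\mid\theta_2)\,c_2 h_2(\theta_2)\,d\theta_2} = \frac{c_1}{c_2}\,\frac{\int f_1(y\mid\theta_1)\,h_1(\theta_1)\,d\theta_1}{\int f_2(y\mid\theta_2)\,h_2(\theta_2)\,d\theta_2}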

“…hereby, all states, including all possible models of maximum sized, will eventually be visited.”

Further, I found some confusion between principles and numerics. And, as usual, I bemoan the acronym inflation with the appearance of a GMJMCMC! Where G stands for genetic (algorithm), MJ for mode jumping, and MCMC for…, well, no surprise there! I was not aware of the mode jumping algorithm of Hubin and Storvik (2018), so cannot comment on the very starting point of the paper. A fundamental issue with Markov chains on discrete spaces is that the notion of neighbourhood becomes quite fishy and is highly dependent on the nature of the covariates. And the Markovian aspects are unclear because of the self-avoiding aspect of the algorithm. The novel algorithm is intricate and as such seems to require a superlative amount of calibration. Are all modes truly visited, really? (What are memetic algorithms?!)
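
Just to make the neighbourhood issue concrete, the most naive model-space sampler one could write is a single-flip Metropolis-Hastings on the vector of inclusion indicators, a mere caricature of mine that has nothing to do with the genetic and mode jumping moves of GMJMCMC, and where logpost() stands for an unspecified log marginal posterior of a model:

# single-flip Metropolis-Hastings over inclusion indicators gam (caricature,
# not the authors' GMJMCMC); logpost() is a placeholder returning the log
# marginal posterior of the model indexed by the 0-1 vector gam
flipMH=function(logpost,p,niter=1e3){
 gam=rbinom(p,1,.5)
 lp=logpost(gam)
 for (t in 1:niter){
   prop=gam
   j=sample(1:p,1)   # neighbourhood restricted to one-coordinate flips
   prop[j]=1-prop[j]
   lq=logpost(prop)
   if (log(runif(1))<lq-lp){gam=prop;lp=lq}
   }
 return(gam)}

The symmetric proposal makes the acceptance ratio trivial, but whether such one-flip neighbourhoods connect the high-posterior models is precisely where the dependence on the nature of the covariates creeps in.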

Egyptian fractions [Le Monde puzzle #922]

Posted in Books, Kids, R on July 28, 2015 by xi'an

For its summer edition, Le Monde mathematical puzzle switched to a lighter version with an immediate solution. This #922 considers Egyptian fractions, that is, representations of a fraction as a sum of unit fractions (the numerator is always 1) with pairwise distinct denominators, each denominator appearing only once. For instance, 3/4 is represented as ½+¼. As I discovered when looking on line, a lot of people are fascinated with this representation and have devised different algorithms to achieve decompositions with various properties, including Fibonacci, who devised a specific approach called the greedy algorithm in 1202 in the Liber Abaci. In the current Le Monde edition, the questions were somewhat modest and dealt with the smallest decompositions of 2/5, 5/12, and 50/77 under some additional constraint.
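
For the record, a bare-bones rendering of Fibonacci's greedy scheme, which repeatedly removes the largest unit fraction 1/k not exceeding the current value of a/b (my own naming, and with no protection against the integer overflow that plagues the method), could be:

greedy=function(a,b){
# Fibonacci's greedy scheme: take k=ceiling(b/a), i.e. the largest 1/k
# not exceeding a/b, and iterate on the remainder a/b-1/k=(ka-b)/(kb)
 dens=NULL
 while (a>0){
   k=ceiling(b/a)
   dens=c(dens,k)
   a=k*a-b
   b=k*b}
 return(dens)}

For instance, greedy(2,5) should return 3 and 15, while greedy(50,77) should return 2, 7, and 154.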

Since the issue was covered in so many places, I just spent one hour or so constructing a basic solution à la Fibonacci and then tried to improve it against a length criterion. Here are my R codes (using the numbers library):

library(numbers) # for primeFactors() and div()

osiris=function(a,b){
# can the fraction a/b be simplified? divide out common prime factors
diva=primeFactors(a)
divb=primeFactors(b)
divc=c(unique(diva),unique(divb))
while (sum(duplicated(divc))>0){
  n=divc[duplicated(divc)]
  for (i in n){a=div(a,i);b=div(b,i)}
  diva=primeFactors(a)
  divb=primeFactors(b)
  divc=c(unique(diva),unique(divb))
  }
  return(list(a=a,b=b))
}

a presumably superfluous function for simplifying fractions, followed by

horus=function(a,b,teth=NULL){
#simplification
anubis=osiris(a,b)
a=anubis$a;b=anubis$b
#decomposition by removing 1/b
 isis=NULL
 if (!(b %in% teth)){
   a=a-1
   isis=c(isis,b)
   teth=c(teth,b)}
 if (a>0){
#simplification
  anubis=osiris(a,b)
  bet=b;a=anubis$a;b=anubis$b
  if (bet>b){ isis=c(isis,horus(a,b,teth))}else{
  # find the largest unit fraction 1/k not exceeding a/b, skipping used denominators
    k=ceiling(b/a)
    while (k %in% teth) k=k+1
    a=k*a-b
    b=k*b
    isis=c(isis,k,horus(a,b,teth=c(teth,k)))
    }}
 return(isis)}

which produces a Fibonacci solution (with the additional inclusion of the original denominator) and

nut=20 # number of random candidate multipliers drawn at each step
seth=function(a,b,isis=NULL){
#simplification
anubis=osiris(a,b)
a=anubis$a;b=anubis$b
if ((a==1)&(!(b %in% isis))){isis=c(isis,b)}else{
 ra=hapy=ceiling(b/a)
 if (max(a,b)<1e5) hapy=horus(a,b,teth=isis)
 k=unique(c(hapy,ceiling(ra/runif(nut,min=.1,max=1))))
 propa=propb=propc=propd=rep(NaN,length(k))
 bastet=1
 for (i in k[!(k %in% isis)]){
   propa[bastet]=i*a-b
   propb[bastet]=i*b
   propc[bastet]=i
   propd[bastet]=length(horus(i*a-b,i*b,teth=c(isis,i)))
   bastet=bastet+1
   }
 k=propc[order(propd)[1]]
 isis=seth(k*a-b,k*b,isis=c(isis,k))
 }
return(isis)}

which compares solutions against their lengths. When calling those functions for the three fractions above, the solutions are

> seth(2,5)
[1] 15 3
> seth(5,12)
[1] 12  3
> seth(50,77)
[1]   2 154   7

with no pretension whatsoever to return anything optimal (and with occasional crashes when the magnitude of the entries grows, try for instance 5/121). For this last counter-example, the alternative horus works quite superbly:

> horus(5,121)
[1] 121 31 3751 1876 7036876