(I am obliged to write ‘pseudo-Bayesian’ because maximizing the evidence to choose the hyper-parameters is only sorta Bayesian — and in my own practice I prefer to integrate out.)

Analysts who are ‘regularizers’ have no particular desire to conform to a pseudo-Bayesian approach. So regularizers can penalize the maximized log-likelihood in any way that they think is appropriate; for example, they can use 2 * flexibility as their complexity penalty.

(This reminds me of the situation in gambling. The Kelly criterion is optimal for long-term growth of your fund, but it is also quite conservative, and so some gamblers will use 2 * Kelly, or even more. Of course they are doing this because their gambling is about more than just increasing the size of the fund.)
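(To make the gambling aside concrete, here is a minimal sketch, assuming a simple repeated bet that wins with probability p at net odds b; the numbers p = 0.55 and b = 1 and the function names are purely illustrative assumptions, not taken from anywhere. It compares the expected log-growth per bet at half Kelly, Kelly, and 2 * Kelly.)

```python
import math

def kelly_fraction(p, b):
    """Kelly fraction for a bet won with probability p at net odds b."""
    return p - (1.0 - p) / b

def expected_log_growth(f, p, b):
    """Expected log-growth per bet when staking a fraction f of the fund."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)

# Illustrative values only: a 55% chance of winning an even-money bet.
p, b = 0.55, 1.0
f_star = kelly_fraction(p, b)

for multiple in (0.5, 1.0, 2.0):
    f = multiple * f_star
    growth = expected_log_growth(f, p, b)
    print(f"{multiple:.1f} * Kelly: stake {f:.3f}, expected log-growth {growth:+.6f}")
```

(In this toy setting, staking 2 * Kelly gives up long-term growth relative to Kelly, which fits the point that such gamblers care about more than the size of the fund.)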

My extreme subjectivism enables me to accept “it feels right to me” as someone’s defense of their modelling choices. But, as I write in ‘Confidence in Risk Assessments’ (doi:10.1111/rssa.12445), in the ‘bazaar of experts’ clients and their auditors might require a little more than this in order to select their expert. So I hope that complexity = flexibility might catch on a little: it is an attractive choice where there is no compelling reason to select a particular complexity penalty, because it has ‘cross tribe’ appeal.

Also, flexibility is asymptotically BIC in the Linear Model. But, as we say in the paper, it is better to estimate the evidence directly than to approximate it with a BIC penalty, which misses the distinction between the nominal number of parameters and the effective number of parameters.
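For reference, the standard BIC is written below, with the symbols being my notation here rather than the paper’s: \hat{L} is the maximized likelihood, k the nominal number of parameters, and n the number of observations. The penalty depends on the model only through k, which is why it cannot see the effective number of parameters.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```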

Oops, thanks!
