## sufficient statistics for machine learning

Posted in Books, Running, Statistics, Travel on April 26, 2022 by xi'an

By chance, I came across this ICML 2019 paper by Milan Cvitkovic and Günther Koliander, Minimal Achievable Sufficient Statistic Learning, on a form of sufficiency for machine learning. The paper starts with “our” standard notion of sufficiency, albeit in a predictive sense, namely that Z=T(X) is sufficient for predicting Y if the conditional distribution of Y given Z is the same as the conditional distribution of Y given X. It also acknowledges that minimal sufficiency may be out of reach. However, and without pursuing this question into the depths of said paper, I am surprised that any type of sufficiency can be achieved there since the model stands outside exponential families, which the Darmois-Pitman-Koopman lemma singles out as the only setting allowing for sufficient statistics of fixed dimension. Obviously, this is not a sufficiency notion in the statistical sense, since there is no likelihood (albeit there are parameters involved in the deep learning network). And Y is a discrete variate, which means that

$\mathbb P(Y=1|x),\ \mathbb P(Y=2|x),\ldots$

is a sufficient “statistic” for a fixed conditional, but I am at a loss as to how the solution proposed in the paper could be minimal when the dimension and structure of T(x) are chosen from the start. A very different notion, for sure!
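To make the remark on the vector of class probabilities concrete, here is a minimal sketch on a toy discrete model. Everything in it (the support of X, the three classes, the probability table, the function names) is a made-up illustration of mine, not something taken from the paper. It checks by exact enumeration that the conditional distribution of Y given T(X) coincides with the conditional distribution of Y given X when T(x) is the vector of the P(Y=k|x).

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical toy model (an assumption for illustration only):
# X uniform on {0, 1, 2, 3}, Y taking values in {1, 2, 3}.
p_x = {x: F(1, 4) for x in range(4)}
p_y_given_x = {
    0: (F(1, 2), F(1, 4), F(1, 4)),
    1: (F(1, 2), F(1, 4), F(1, 4)),   # same conditional as x = 0
    2: (F(1, 10), F(3, 10), F(3, 5)),
    3: (F(1, 3), F(1, 3), F(1, 3)),
}


def T(x):
    """Vector of class probabilities (P(Y=1|x), P(Y=2|x), P(Y=3|x))."""
    return p_y_given_x[x]


def conditional_given_T():
    """Exact distribution of Y given T(X) = t, obtained by enumeration."""
    joint = defaultdict(lambda: [F(0)] * 3)   # t -> [P(Y=k, T(X)=t)]
    for x, px in p_x.items():
        for k, pyk in enumerate(p_y_given_x[x]):
            joint[T(x)][k] += px * pyk
    # Normalise each row by P(T(X) = t)
    return {t: tuple(v / sum(row) for v in row) for t, row in joint.items()}


p_y_given_t = conditional_given_T()

# Predictive sufficiency: P(Y | T(X) = T(x)) equals P(Y | X = x) for every x
sufficient = all(p_y_given_t[T(x)] == p_y_given_x[x] for x in p_x)
print(sufficient, len(p_y_given_t))
```

In this toy table, x=0 and x=1 share the same conditional and are thus merged by T, which only keeps what matters for predicting Y.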

## on completeness

Posted in Books, Kids, Statistics on November 19, 2020 by xi'an

Another X validated question that proved a bit of a challenge, enough for me to return to its resolution on consecutive days. The question was about the completeness of the natural sufficient statistic associated with a sample from the shifted exponential distribution

$f(x;\theta) = \frac{1}{\theta^2}\exp\{-\theta^{-2}(x-\theta)\}\mathbb{I}_{x>\theta}$

[weirdly called negative exponential in the question] meaning the (minimal) sufficient statistic is made of the first order statistic and of the sample sum (or average), or equivalently

$T=(X_{(1)},\sum_{i=2}^n \{X_{(i)}-X_{(1)}\})$

Finding the joint distribution of T is rather straightforward as the first component is a shifted Exponential again and the second a Gamma variate with shape n-1 and scale θ². (Devroye’s Bible can be invoked since the Gamma distribution follows from his section on Exponential spacings, p.211.) While the derivation of a function with constant expectation is straightforward for the alternate exponential distribution

$f(x;\theta) = \frac{1}{\theta}\exp\{-\theta^{-1}(x-\theta)\}\mathbb{I}_{x>\theta}$

since the ratio of the components of T has a fixed distribution, it proved harder for the current case as I was seeking a parameter-free transform. When attempting to explain the difficulty on my office board, I realised I was seeking the wrong property since an expectation was enough. Removing the dependence on θ was simpler and, writing Y for the second component of T, led to

$\mathbb E_\theta\left[\frac{X_{(1)}}{Y}-\frac{\Gamma(n-2)}{\Gamma(n-3/2)}Y^\frac{-1}{2}\right]=\frac{\Gamma(n-2)}{n\Gamma(n-1)}$

which is but one version of a transform with fixed expectation. This also led me to wonder at the range of possible functions of θ one could use as scale and still retrieve incompleteness of T. Any power of θ should work but what about exp(θ²) or sin²(θ³), i.e. functions for which there exists no unbiased estimator..?
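As a sanity check on the fixed expectation, here is a rough Monte Carlo sketch. It treats the second component Y of T as a Gamma variate with shape n-1 and scale θ², independent of the first order statistic, in which case the right-hand side simplifies to 1/(n(n-2)). The sample size n=10, the values of θ, the number of replications, the seed, and the function names are all arbitrary choices of mine.

```python
import math
import random


def simulate_T(n, theta, rng):
    """Draw a sample of size n from the shifted exponential with location
    theta and scale theta**2, and return (X_(1), Y), where Y is the sum
    of X_(i) - X_(1) over i = 2, ..., n."""
    scale = theta ** 2
    xs = [theta + rng.expovariate(1.0 / scale) for _ in range(n)]
    x_min = min(xs)
    y = sum(x - x_min for x in xs)   # the i = 1 term contributes zero
    return x_min, y


def mc_expectation(n, theta, reps=50_000, seed=1):
    """Monte Carlo estimate of E[X_(1)/Y - c * Y**(-1/2)],
    with c = Gamma(n-2) / Gamma(n-3/2)."""
    rng = random.Random(seed)
    c = math.gamma(n - 2) / math.gamma(n - 1.5)
    total = 0.0
    for _ in range(reps):
        x_min, y = simulate_T(n, theta, rng)
        total += x_min / y - c / math.sqrt(y)
    return total / reps


n = 10   # arbitrary sample size, large enough for a finite variance
target = math.gamma(n - 2) / (n * math.gamma(n - 1))   # = 1 / (n (n - 2))
for theta in (0.5, 1.0, 3.0):
    # the estimate should stay close to the target whatever theta
    print(theta, mc_expectation(n, theta), target)
```

The estimates should agree with the target up to Monte Carlo error for every θ, which is all that incompleteness requires: a non-constant function of T whose expectation does not depend on θ.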

## arbitrary non-constant function [nonsensical]

Posted in Statistics on November 6, 2020 by xi'an