## latest math stats exam

Posted in Books, Kids, R, Statistics, University life on January 28, 2023 by xi'an

As I finished grading our undergrad math stats exam (in Paris Dauphine) over the weekend, which was very straightforward this year, all the more so because most questions had already been asked on weekly quizzes or during practicals, some answers struck me as atypical (but ChatGPT is not to blame!). For instance, in question 1, (c) received a fair share of wrong eliminations on the grounds that g is not necessarily bounded, rather than because it is contradicted by (b) being false. (ChatGPT managed to solve that question, except for the L² convergence!)

Question 2 was much less successful than we expected, most failures being due to a catastrophic change of parameterisation for computing the moment generating function (mgf), a change that could have been ignored given this is a Bernoulli model, right?! Although the students wasted quite a while computing the Fisher information for the Binomial distribution in Question 3… (ChatGPT managed to solve that question!)

Question 4 was intentionally confusing and while most (of those who dealt with the R questions) spotted the opposition between sample and distribution, hence picking (f), a few fell into the trap (d).

Question 7 was also surprisingly incompletely covered by a significant fraction of the students, as they missed the sufficiency in (c). (ChatGPT did not manage to solve that question, starting with the inverted statement that “a minimal sufficient statistic is a sufficient statistic that is not a function of any other sufficient statistic”…)

And Question 8 was rarely complete, even though many recalled Basu’s theorem for (a) [more rarely (d)] and flunked (c). A large chunk of them argued that the ancillarity of the statistics in (a) and (d) made them [distributionally] independent of μ, therefore [probabilistically] of the empirical mean! (Again flunked by ChatGPT, confusing completeness and sufficiency.)

## sufficient statistics for machine learning

Posted in Books, Running, Statistics, Travel on April 26, 2022 by xi'an

By chance, I came across this ICML 2019 paper of Milan Cvitkovic and Günther Koliander, Minimal Achievable Sufficient Statistic Learning, on a form of sufficiency for machine learning. The paper starts with “our” standard notion of sufficiency albeit in a predictive sense, namely that Z=T(X) is sufficient for predicting Y if the conditional distribution of Y given Z is the same as the conditional distribution of Y given X. It also acknowledges that minimal sufficiency may be out of reach. However, and without pursuing this question into the depths of said paper, I am surprised that any type of sufficiency can be achieved there since the model stands outside exponential families, in accordance with the Darmois-Pitman-Koopman lemma. Obviously, this is not a sufficiency notion in the statistical sense, since there is no likelihood (albeit there are parameters involved in the deep learning network). And Y is a discrete variate, which means that

$\mathbb P(Y=1|x),\ \mathbb P(Y=2|x),\ldots$

is a sufficient “statistic” for a fixed conditional, but I am lost as to how the solution proposed in the paper could be minimal when the dimension and structure of T(x) are chosen from the start. A very different notion, for sure!
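To make the predictive notion concrete, here is a minimal sketch in Python on a toy discrete model that is entirely made up for illustration (the tables `p_x` and `p_y_given_x` and the helper `law_of_y_given_t` are hypothetical and not taken from the paper). It takes T(x) to be the vector of conditional probabilities of Y given x and checks, in exact arithmetic, that the distribution of Y given T(X) coincides with the distribution of Y given X.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy model: X takes six values, Y is a label in {1, 2, 3}.
p_x = {
    0: Fraction(1, 12), 1: Fraction(1, 4), 2: Fraction(1, 6),
    3: Fraction(1, 6), 4: Fraction(1, 4), 5: Fraction(1, 12),
}
# Rows are (P(Y=1|x), P(Y=2|x), P(Y=3|x)); several x share the same row.
p_y_given_x = {
    0: (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)),
    1: (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)),
    2: (Fraction(1, 10), Fraction(3, 10), Fraction(3, 5)),
    3: (Fraction(1, 10), Fraction(3, 10), Fraction(3, 5)),
    4: (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)),
    5: (Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)),
}


def T(x):
    """Candidate 'statistic': the vector of conditional probabilities of Y given x."""
    return p_y_given_x[x]


def law_of_y_given_t():
    """Return {z: P(Y = . | T(X) = z)} by marginalising X within each level set of T."""
    mass = defaultdict(Fraction)                    # P(T(X) = z)
    joint = defaultdict(lambda: [Fraction(0)] * 3)  # P(Y = k, T(X) = z)
    for x, px in p_x.items():
        z = T(x)
        mass[z] += px
        for k, pk in enumerate(p_y_given_x[x]):
            joint[z][k] += px * pk
    return {z: tuple(j / mass[z] for j in joint[z]) for z in mass}


cond = law_of_y_given_t()
# Predictive sufficiency: conditioning on T(X) loses nothing relative to X.
print(all(cond[T(x)] == p_y_given_x[x] for x in p_x))
```

The six values of X collapse into three level sets of T, which is the only reduction available in this toy case. Nothing in the sketch addresses a T with a dimension and structure fixed in advance.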

## on completeness

Posted in Books, Kids, Statistics on November 19, 2020 by xi'an

Another X validated question that proved a bit of a challenge, enough for my returning to its resolution on consecutive days. The question was about the completeness of the natural sufficient statistic associated with a sample from the shifted exponential distribution

$f(x;\theta) = \frac{1}{\theta^2}\exp\{-\theta^{-2}(x-\theta)\}\mathbb{I}_{x>\theta}$

[weirdly called negative exponential in the question] meaning the (minimal) sufficient statistic is made of the first order statistic and of the sample sum (or average), or equivalently

$T=(X_{(1)},\sum_{i=2}^n \{X_{(i)}-X_{(1)}\})$

Finding the joint distribution of T is rather straightforward as the first component is a drifted Exponential again and the second a Gamma variate with shape n-1 and scale θ², independent of the first. (Devroye’s Bible can be invoked since the Gamma distribution follows from his section on Exponential spacings, p.211.) While the derivation of a function with constant expectation is straightforward for the alternate exponential distribution

$f(x;\theta) = \frac{1}{\theta}\exp\{-\theta^{-1}(x-\theta)\}\mathbb{I}_{x>\theta}$

since the ratio of the components of T has a fixed distribution, it proved harder for the current case as I was seeking a parameter-free transform. When attempting to explain the difficulty on my office board, I realised I was seeking the wrong property since a constant expectation was enough. Removing the dependence on θ in expectation was simpler and, writing Y for the second component of T, led to

$\mathbb E_\theta\left[\frac{X_{(1)}}{Y}-\frac{\Gamma(n-2)}{\Gamma(n-3/2)}Y^\frac{-1}{2}\right]=\frac{\Gamma(n-2)}{n\Gamma(n-1)}$

but one version of a transform with fixed expectation. This also led me to wonder at the range of possible functions of θ one could use as scale and still retrieve incompleteness of T. Any power of θ should work but what about exp(θ²) or sin²(θ³), i.e. functions for which there exists no unbiased estimator..?
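As a sanity check on the displayed expectation, here is a minimal Monte Carlo sketch in Python, assuming Y stands for the second component of T. The function names, the sample size n=10, the values of θ, and the seed are arbitrary choices of mine. It simulates samples from the shifted exponential with location θ and scale θ², computes both components of T from each sample, and averages the transform, which should return about the same value whatever θ.

```python
import math
import random


def simulate_t(theta, n, rng):
    """Draw one sample of size n and return (X_(1), Y), Y being the sum of X_(i) - X_(1)."""
    # expovariate takes a rate, so scale theta^2 corresponds to rate 1 / theta^2
    xs = [theta + rng.expovariate(1.0 / theta**2) for _ in range(n)]
    x1 = min(xs)
    y = sum(x - x1 for x in xs)
    return x1, y


def mc_mean(theta, n, n_sims=100_000, seed=1):
    """Monte Carlo average of X_(1)/Y - c * Y^(-1/2), with c = Gamma(n-2)/Gamma(n-3/2)."""
    rng = random.Random(seed)
    c = math.gamma(n - 2) / math.gamma(n - 1.5)
    total = 0.0
    for _ in range(n_sims):
        x1, y = simulate_t(theta, n, rng)
        total += x1 / y - c / math.sqrt(y)
    return total / n_sims


def target(n):
    """Right-hand side of the identity, Gamma(n-2) / (n * Gamma(n-1))."""
    return math.gamma(n - 2) / (n * math.gamma(n - 1))
```

Calling `mc_mean(2.0, 10)` and `mc_mean(5.0, 10)` and comparing both with `target(10)` gives a quick empirical confirmation. Taking n larger than 3 keeps the variance of the Monte Carlo average finite.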

## arbitrary non-constant function [nonsensical]

Posted in Statistics on November 6, 2020 by xi'an