Archive for Observatoire de Paris

information maximising neural networks summaries

Posted in pictures, Statistics on February 6, 2019 by xi'an

After missing the blood moon eclipse last night, I had a meeting today at the Paris observatory (IAP), where we discussed an ABC proposal made by Tom Charnock, Guilhem Lavaux, and Benjamin Wandelt from this institute.

“We introduce a simulation-based machine learning technique that trains artificial neural networks to find non-linear functionals of data that maximise Fisher information: information maximising neural networks.” T. Charnock et al., 2018
The paper is centred on the determination of “optimal” summary statistics, with the goal of finding a “transformation which maps the data to compressed summaries whilst conserving Fisher information [of the original data]”. Which sounds like looking for an efficient summary, and hence impossible outside exponential families. As seen from the description in (2.1), the assumed distribution of the summary is Normal, with mean μ(θ) and covariance matrix C(θ) that are implicit transforms of the parameter θ. In that respect, the approach looks similar to the synthetic likelihood proposal of Wood (2010). From which an unusual form of Fisher information can be derived, as ∂μ(θ)ᵀC(θ)⁻¹∂μ(θ)… A neural net is trained to optimise this information criterion at a given (so-called fiducial) value of θ, in terms of a set of summaries of the same dimension as the parameter. Which means the information contained in the whole data (likelihood) is not necessarily recovered, linking with this comment from Edward Ionides (in a set of lectures at Wharton).
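To fix ideas, the Gaussian information criterion ∂μᵀC⁻¹∂μ can be estimated by simulation and finite differences at the fiducial value. A minimal sketch of that criterion (not the authors' neural network: the toy simulator, the hand-picked two-dimensional summary, and all numerical values below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    """Toy simulator: data whose mean and spread both depend on theta."""
    return rng.normal(theta, 1.0 + 0.5 * theta**2, size=n)

def summary(x):
    """A hand-picked 2-d summary (the role played by the trained network)."""
    return np.array([x.mean(), np.log(x.var())])

def fisher_at(theta0, eps=0.05, n_sims=2000):
    """Monte Carlo estimate of F = dmu' C^{-1} dmu at the fiducial theta0,
    with dmu/dtheta from central finite differences and C estimated at theta0."""
    s0 = np.array([summary(simulate(theta0)) for _ in range(n_sims)])
    sp = np.array([summary(simulate(theta0 + eps)) for _ in range(n_sims)])
    sm = np.array([summary(simulate(theta0 - eps)) for _ in range(n_sims)])
    dmu = (sp.mean(0) - sm.mean(0)) / (2 * eps)   # finite-difference derivative
    C = np.cov(s0, rowvar=False)                  # summary covariance at theta0
    return dmu @ np.linalg.solve(C, dmu)          # scalar, as theta is 1-d here

F = fisher_at(theta0=1.0)
print(F)  # larger F means more informative summaries at the fiducial value
```

The network in the paper is trained to make this very quantity as large as possible over the space of summary maps.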
“Even summary statistics derived by careful scientific or statistical reasoning have been found surprisingly uninformative compared to the whole data likelihood in both scientific investigations (Shrestha et al., 2011) and simulation experiments (Fasiolo et al., 2016)” E. Ionides, slides, 2017
The maximal Fisher information obtained in this manner is then used in a subsequent ABC step as the natural metric for the distance between the observed and simulated data. (Begging the question as to why being maximal is necessarily optimal.) Another question is about the choice of the fiducial parameter, a choice that should be tested, for instance by iterating the algorithm a few steps. But only having to run simulations at a single value of the parameter is certainly a great selling point!
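That subsequent ABC step can be sketched as plain rejection sampling with the Fisher matrix as metric. In the sketch below everything is made up for illustration: two fixed summaries stand in for the learned compression, and the matrix F stands in for the trained Fisher information:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=50):
    """Toy simulator standing in for the physical model."""
    return rng.normal(theta, 1.0, size=n)

def summary(x):
    """Two fixed summaries standing in for the learned compression."""
    return np.array([x.mean(), np.median(x)])

# invented Fisher matrix, standing in for the one produced by the training step
F = np.array([[50.0, 40.0], [40.0, 45.0]])

def abc_rejection(s_obs, prior_draws, quantile=0.01):
    """Rejection ABC keeping the prior draws whose simulated summaries are
    closest to s_obs under the Fisher-metric distance d(s, s') = (s-s')' F (s-s')."""
    dists, thetas = [], []
    for theta in prior_draws:
        d = summary(simulate(theta)) - s_obs
        dists.append(d @ F @ d)
        thetas.append(theta)
    dists = np.array(dists)
    keep = dists <= np.quantile(dists, quantile)
    return np.array(thetas)[keep]

theta_true = 2.0
s_obs = summary(simulate(theta_true))
post = abc_rejection(s_obs, prior_draws=rng.uniform(-5, 5, size=5000))
print(post.mean())  # accepted draws should concentrate near theta_true
```

The appeal of the approach is visible even in this toy: once F and the summaries are fixed at the fiducial value, the ABC run itself needs no further tuning of the distance.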

Bayesian astrostats under Laplace’s gaze

Posted in Books, Kids, pictures, Statistics, Travel, University life, Wines on October 11, 2016 by xi'an

This afternoon, I was part of a jury for an astrostatistics thesis, where the astronomy part was about binary objects in the Solar System and the statistics part about detecting patterns in those objects, unsurprisingly. The first part was highly classical, using several non-parametric tests like Kolmogorov-Smirnov to test whether those binary objects were different from single objects. While the p-values were very tiny, I felt these values were over-interpreted in the thesis, because a sample size of N=30 leads to some scepticism about numerical quantities like 0.0008. While I do not want to sound like pushing for Bayesian solutions in every setting, this case is a good illustration of the nefarious power of p-values, which are almost always taken at face value, i.e., where 0.0008 is understood in terms of the null hypothesis and not in terms of the observed realisation of the p-value. Even within a frequentist framework, the distribution of this p-value should be evaluated or estimated one way or another, as there is no reason to believe it is anywhere near a Uniform(0,1) distribution.

The second part of the thesis was about the estimation of some parameters of the laws of the orbits of those dual objects, and the point of interest for me was the purely mechanical construction of a likelihood function as an exponential transform of a sum of residuals, made of squared differences between the observations and their expectations, or of a power of such differences. This was called the “statistical model” in the thesis, and I presume in part of the astrostats literature. It reminded me of my first meeting with my colleagues from Besançon, where they could not use such mechanical versions because of intractable expectations and used instead simulations from their physical model, literally reinventing ABC. This construction had the same feeling, closer to indirect inference than regular inference, although it took me half the defence to realise it.
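The point that a p-value is but one realisation of a random variable is easy to demonstrate by simulation. In this sketch (a simple z-test stands in for the KS tests of the thesis, and the shift 0.3 is an arbitrary alternative), the same N=30 produces p-values that are uniform under the null but follow a quite different distribution under a nearby alternative:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
N, reps = 30, 2000

def pvalue(x):
    """Two-sided z-test p-value for H0: mean = 0, known unit variance
    (a stand-in for the KS tests used in the thesis)."""
    z = x.mean() * sqrt(len(x))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard Normal cdf at |z|
    return 2 * (1 - phi)

# distribution of the p-value when the null actually holds...
p_null = np.array([pvalue(rng.normal(0.0, 1, N)) for _ in range(reps)])
# ...and when the data come from a nearby alternative
p_alt = np.array([pvalue(rng.normal(0.3, 1, N)) for _ in range(reps)])

print(p_null.mean())  # close to 1/2, as the p-value is Uniform(0,1) under H0
print(p_alt.mean())   # much smaller, and spread out: one realised p-value
                      # like 0.0008 says little without this distribution
```

Which is exactly why a tiny realised p-value at N=30 deserves a look at its own sampling distribution before being taken at face value.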

The defence actually took place in the beautiful historical Perrault building of the Observatoire de Paris, in downtown Paris, where Cassini, Arago and Le Verrier once ruled! It was held in the council room, under paintings of major French astronomers, including Laplace himself, looking quite smug in his academician costume. The building is built around the Paris Zero Meridian (which got dethroned in 1911 by the Greenwich Zero Meridian, a meridian I contemplated as a kid, since my childhood church had the Greenwich line drawn on its nave stones). The customary “pot” after the thesis and its validation by the jury took place in the less historical cafeteria of the Observatoire, but it included a jazz big band, which made this thesis defence quite unique in many ways!

À l’Observatoire de Paris

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life on July 5, 2016 by xi'an

This Monday, I made a most pleasant trip to the Observatoire de Paris, whose campus is located in Meudon and no longer in Paris. (There also is an Observatoire de Paris campus in downtown Paris, created in 1667, where no observation can take place.) Most pleasant for many reasons. First, I was to meet with Frédéric Arenou and two visiting astrostatisticians from Kolkata, India, whom I had met in Bangalore two years ago, to work on a neat if not simple issue of inverted mean estimation. Second, because the place is beautiful, with great views of Paris (since the Observatoire is on a ridge) and with a classical-looking building actually made of recycled castle parts after the Franco-Prussian war of 1870, and because Frédéric gave us a grand tour of the place. And third, because I went there by bike, through the Forêt de Meudon, which I did not suspect was that close to home and which I crossed on downhill muddy trails that made me feel far away from Paris! It also gave me the opportunity to test the mettle of a new mountain bike elsewhere than against Parisian SUVs. (This was the first day of a relatively intense biking week, which really helped with the half-marathon training: the San Francisco ½ is in less than a month!!! And I am in wave 2!)

Gaia

Posted in Statistics, University life on September 19, 2012 by xi'an

Today, I attended a meeting at the Paris observatory about the upcoming launch of the Gaia satellite and the associated data (mega-)challenges. To borrow from the webpage, the goal is “to create the largest and most precise three dimensional chart of our Galaxy by providing unprecedented positional and radial velocity measurements for about one billion stars in our Galaxy and throughout the Local Group.” The amount of data that will be produced by this satellite is staggering: Gaia will take roughly one-gigapixel pictures that will be processed both on-board and on Earth, transmitting over five years a petabyte of data that needs to be processed fairly efficiently to be at all useful! The European consortium operating this satellite has planned for specific tasks dedicated to data handling and processing, which is a fabulous opportunity for would-be astrostatisticians! (Unsurprisingly, at least half of the tasks are statistics related, either at the noise-reduction stage or at the estimation stage.) Another amazing feature of the project is that it will result in open data, the outcome of the observations being open to everyone for analysis… I am clearly looking forward to the next meeting, to better understand the structure of the data and the challenges simulation methods could help solve!