Archive for linguistics

neural summaries

Posted in Statistics, University life on September 27, 2019 by xi'an

Nature snapshot [Volume 539 Number 7627]

Posted in Books, Statistics, University life on November 15, 2016 by xi'an

A number of entries of interest [to me] in that Nature issue: from the Capuchin monkeys that break stones in a way that resembles early hominins' biface tools, to the persistent association between some sounds and some meanings across numerous languages, to the use of infected mosquitoes in South America to fight Zika, to the call for more maths in psychiatry by the NIMH director (where, since prediction is mentioned, I presumed stats are included), to the potentially earthshaking green power revolution in Africa, to the reconstruction of the first HIV strains in North America, along with the deconstruction of the “Patient 0” myth, helped by Bayesian phylogenetic analyses, to coverage of the Open Syllabus Project, with Monte Carlo Statistical Methods arriving first [in the Monte Carlo list]…

“Observations should not converge on one model but aim to find anomalies that carry clues about the nature of dark matter, dark energy or initial conditions of the Universe. Further observations should be motivated by testing unconventional interpretations of those anomalies (such as exotic forms of dark matter or modified theories of gravity). Vast data sets may contain evidence for unusual behaviour that was unanticipated when the projects were conceived.” Avi Loeb

One editorial particularly drew my attention, “Good data are not enough”, by the astronomer Avi Loeb. As illustrated by the quote above, Loeb objects to data being interpreted, and even collected, towards the assessment of the standard model. While I agree that this model contains a lot of fudge factors like dark matter and dark energy, which apparently constitute most of the Universe, the discussion is quite curious, in that interpreting data according to alternative theories sounds impossible and certainly beyond the reach of most PhD students [as Loeb criticises the analysis of some data in a recent thesis he evaluated].

“modern cosmology is augmented by unsubstantiated, mathematically sophisticated ideas — of the multiverse, anthropic reasoning and string theory.”

The author argues for always allowing alternative interpretations of the data, which sounds fine at a primary level but again calls for the conception of such alternative models. When discrepancies are found between the standard model and the data, they can be due to errors in the measurement itself, in the measurement model, or in the theoretical model. However, they may be impossible to analyse outside the model, in the neutral way called for and wished by Loeb. Designing neutral experiments sounds even less meaningful. Which is why I am fairly taken aback by the call for “a research frontier [that] should maintain at least two ways of interpreting data so that new experiments will aim to select the correct one”! Why two and not more?! And which ones?! I am not aware of fully developed alternative theories and cannot see how experiments designed under one model could produce indications about a new and incomplete model.

“Such simple, off-the-shelf remedies could help us to avoid the scientific fate of the otherwise admirable Mayan civilization.”

Hence I am bemused by the whole exercise, whose deepest arguments seem to be a paper written by the author last year and an interdisciplinary centre on black holes, also recently launched by the same author.

Nature highlights

Posted in Books, Kids, pictures, Statistics on October 16, 2016 by xi'an

Among several interesting (general public) entries and the fascinating article reconstructing the death of Lucy by a fall from a tree, I spotted in the current Sept. 22 issue of Nature two short summaries involving statistical significance, one in linguistics about repeated (and significant) links between some sounds and some concepts (like ‘n’ and ‘nose’) shared between independent languages, another about the (significant) discovery of a π meson and a K meson. The first anonymous editorial, entitled “Algorithm and blues”, was rather gloomy about the impact of proprietary algorithms on our daily lives and on our democracies (or what is left of them), like the reliance on such algorithms to grant loans or to determine the length of a sentence (based on the estimated probability of re-offending). The article called for more accountability of such tools, from going completely open-source to allowing for some form of strong auditing. This reminded me of the current (regional) debate about the algorithm allocating Greater Paris high school students to local universities and colleges based on their grades, wishes, and available positions. The apparent randomness and arbitrariness of those allocations prompted many (parents) to complain about the algorithm and ask for it to be made public. (Besides the pun in the title, the paper also contained a line about “affirmative algorithmic action”!) There was also a perfectly irrelevant opinion piece from a representative of the Church of England about its desire to give a higher profile to science in the/their church. Whatever. And I also was bemused by a news article on the difficulty of building a genetic map of Aboriginal Australians, due to the cultural reluctance of Aboriginal communities towards the use of body parts from their members in genetic research.
While I understand and agree with the concept of data privacy, as a way to avoid exposing personal information, it is much less clear [to me] why data collected a century ago should come under such protections if it does not create a risk of exposing living individuals. It reminded me of this earlier Nature news article about North American Aboriginal groups claiming rights to an 8,000-year-old skeleton. On a more positive note, this news part also mentioned the first catalogue produced by the European Space Agency's Gaia project, from the publication of more than a billion star positions to the open access nature of the database, in that the Gaia team had hardly any prior access to such a wealth of data. A special section of the issue was dedicated to the impact of social inequalities on the production of (future) scientists, but this sounds rather shallow, at least at the level of the few pages produced on the topic, and it did not offer a comparison with other areas of society, where such inequalities are also most obviously at work!

new kid on the blog

Posted in Kids, Statistics, University life on January 27, 2016 by xi'an

[I first thought this title was highly original but a Google search proved me wrong…] This is a short post to point out the new blog started by Ingmar Schuster on computational statistics and linguistics. Which, so far, keeps strictly to the discussion of recent research papers (rather than ratiocinating about all kinds of tangential topics like a certain ‘Og…). Some of which we may discuss in parallel. And some not. So stay tuned! Ingmar came to Paris-Dauphine for a doctoral visit last Winter and has been back as a postdoc (supported by the Fondation des Sciences Mathématiques de Paris) since last Fall, working with me and Nicolas, among others.


The synoptic problem and statistics [book review]

Posted in Books, R, Statistics, University life, Wines on March 20, 2015 by xi'an

A book that came to me for review in CHANCE, and that came completely unannounced, is Andris Abakuks’ The Synoptic Problem and Statistics. “Unannounced” in that I had never heard of the synoptic problem before. This problem is one of ordering and connecting the gospels in the New Testament, more precisely the “synoptic” gospels attributed to Mark, Matthew and Luke, since the fourth canonical gospel of John is considered by experts to be later than those three. By considering overlaps between those texts, some statistical inference can be conducted, and the book covers (some of?) those statistical analyses for different orderings of ancestry in authorship. My overall reaction after a quick perusal of the book over breakfast (sharing bread and fish, of course!) was to wonder why no mention was made of a more global, if potentially impossible, approach via a phylogenetic tree, considering the three (or more) gospels as current observations and tracing their unknown ancestry back just as in population genetics. Not because ABC could then be brought into the picture. Rather because it sounds to me (and to my complete lack of expertise in this field!) more realistic to postulate that those gospels were not written by a single person. Or at a single period in time. But rather that they evolved like genetic mutations across copies and transmissions until they acquired a sort of official status.

“Given the notorious intractability of the synoptic problem and the number of different models that are still being advocated, none of them without its deficiencies in explaining the relationships between the synoptic gospels, it should not be surprising that we are unable to come up with more definitive conclusions.” (p.181)

The book by Abakuks goes instead through several modelling directions, from logistic regression using variable-length Markov chains [to predict agreement between two of the three texts by regressing on earlier agreement] to hidden Markov models [representing, e.g., Matthew’s use of Mark], to various independence tests on contingency tables, sometimes bringing into the model an extra source denoted by Q. Including some R code for hidden Markov models. Once again, from my outsider viewpoint, this fragmented approach to the problem sounds problematic and inconclusive. And rather verbose, with extensive discussions of descriptive statistics. Not that I was expecting a sudden Monty Python-like ray of light and booming voice to disclose the truth! Or that I crave more p-values (some may be found hiding within the book). But I still wonder about the phylogeny… Especially since phylogenies are used in text authentication, as pointed out to me by Robin Ryder for Chaucer’s Canterbury Tales.