deduplication and population size estimation [discussion opened]

Posted in Books, pictures, Running, Statistics, University life on March 27, 2020 by xi'an

A call (worth disseminating) for discussions on the paper “A Unified Framework for De-Duplication and Population Size Estimation” by [my friends] Andrea Tancredi, Rebecca Steorts, and Brunero Liseo, to appear in the June 2020 issue of Bayesian Analysis. The deadline is 24 April.

Data de-duplication is the process of detecting records in one or more datasets which refer to the same entity. In this paper we tackle the de-duplication process via a latent entity model, where the observed data are perturbed versions of a set of key variables drawn from a finite population of N different entities. The main novelty of our approach is to consider the population size N as an unknown model parameter. As a result, a salient feature of the proposed method is the capability of the model to account for the de-duplication uncertainty in the population size estimation. As by-products of our approach we illustrate the relationships between de-duplication problems and capture-recapture models and we obtain a more adequate prior distribution on the linkage structure. Moreover we propose a novel simulation algorithm for the posterior distribution of the matching configuration based on the marginalization of the key variables at population level. We apply our method to two synthetic data sets comprising German names. In addition we illustrate a real data application, where we match records from two lists which report information about people killed in the recent Syrian conflict.
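To make the link between de-duplication and capture-recapture mentioned in the abstract more concrete, here is a minimal sketch of the classical two-list Lincoln–Petersen estimator, N̂ = n₁n₂/m. It is not the Bayesian latent entity model of the paper: it assumes the linkage between lists is known exactly (records are matched by identical strings), which is precisely the uncertainty the paper accounts for. The function name and the toy name lists are made up for illustration.

```python
def lincoln_petersen(list_a, list_b):
    """Classical two-list capture-recapture estimate of the population size.

    Assumes records referring to the same entity are identical strings,
    i.e. that de-duplication and record linkage are error-free.
    """
    a, b = set(list_a), set(list_b)   # remove duplicates within each list
    m = len(a & b)                    # entities recorded in both lists
    if m == 0:
        raise ValueError("no overlap between the lists: estimate undefined")
    return len(a) * len(b) / m


# Hypothetical toy lists (illustrative names only, not the paper's data)
list_a = ["anna", "bernd", "clara", "dieter", "eva", "fritz"]
list_b = ["clara", "dieter", "eva", "gerda", "hans"]

n_hat = lincoln_petersen(list_a, list_b)
print(n_hat)
```

With noisy records the overlap m is itself uncertain, so any error in the matching propagates directly into the estimate of N, which is the motivation for treating both jointly.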

ABC in Svalbard [news #1]

Posted in Mountains, pictures, Running, Statistics, Travel, University life on March 23, 2020 by xi'an

We [Julien and myself] are quite pleased to announce that

  • the scientific committee for the workshop has been gathered
  • the webpage for the workshop is now on-line (with a wonderful walrus picture whose author we alas cannot identify)
  • the workshop is now endorsed by both IMS and ISBA, which will handle registration (to open soon)
  • the reservation of hotel rooms will be handled by Hurtigruten Svalbard through the above webpage (this is important as we have already paid a deposit for a certain number of rooms)
  • we are definitely seeking both sponsors and organisers of mirror workshops in more populated locations

As an item of trivia, let me recall that Svalbard stands for the archipelago, while Spitsbergen is the name of the main island, where Longyearbyen is located. (In Icelandic, Svalbarði means cold rim or cold coast.)

to bike or not to bike

Posted in Kids, pictures, Running, Travel on March 22, 2020 by xi'an

A recent debate between the candidates for the Paris mayorship, including a former Health minister and physician, led to arguments as to whether or not biking in Paris is healthy. Obviously, it is beneficial for the community, but the question is rather about the personal benefits vs dangers of riding a bike daily to work: extra physical activity on the one hand, exposure to air pollution and accidents on the other hand. The accident rate did increase during the recent strikes, but by less (153%) than the number of cyclists in the streets of Paris (260%). While I do not find the air particularly stinky or unpleasant on my daily 25km, except in the frequent jams between Porte d’Auteuil and Porte de la Muette, and while I haven’t noticed a direct impact on my breathing or general shape, I try to avoid rush hours, especially on the way back home with a good climb near Porte de Versailles (all the more on days when it is jammed solid with delivery trucks for the nearby exhibition centre). As for accidents, trying to maintain constant vigilance and predicting potential fishtails is the rule, as is avoiding most bike paths as I find them much more accident-prone than main streets… (Green lights are also more dangerous than red lights, in my opinion!) Presumably, so far at least, benefits outweigh the costs!
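As a back-of-the-envelope check on the two percentages above, here is a small sketch of the implied change in risk per cyclist. It assumes that both figures refer to raw counts (accidents and cyclists) over the same period, which the post does not spell out, and it covers both readings of the percentages (an increase by 153% and 260%, or a level of 153% and 260% of the baseline). The function name is hypothetical.

```python
def per_cyclist_risk_ratio(accident_factor, cyclist_factor):
    """Ratio of the per-cyclist accident risk after vs before the change.

    Both arguments are multiplicative factors relative to the baseline.
    """
    return accident_factor / cyclist_factor


# Reading 1: counts increased BY 153% and 260% (factors 2.53 and 3.60)
ratio_if_increase = per_cyclist_risk_ratio(1 + 1.53, 1 + 2.60)

# Reading 2: counts rose TO 153% and 260% of baseline (factors 1.53 and 2.60)
ratio_if_level = per_cyclist_risk_ratio(1.53, 2.60)

print(f"{ratio_if_increase:.2f} {ratio_if_level:.2f}")
```

Under either reading the ratio is below one, meaning that the risk per cyclist went down while the total number of accidents went up.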

Posted in Running, University life on March 14, 2020 by xi'an

Glasgow [The papers of Tony Veitch]

Posted in Books, Kids, Mountains, pictures, Running, Travel on March 3, 2020 by xi'an

[I read the second volume of McIlvanney’s Laidlaw Trilogy, The papers of Tony Veitch, with the same glee as the first one. And with some nostalgia for the yearly trips to Glasgow I once made, albeit a decade after the book was published. Some passages were too good to be missed!]

“Standing so high, Laidlaw felt the bleakness of summer on his face and understood a small truth. Even the climate here offered no favours. Standing at a bus stop, you talked out the side of your mouth, in case your lips got chapped. Maybe that was why the West of Scotland was where people put the head on one another—it was too cold to take your hands out your pockets.”

“A small and great city, his mind answered. A city with its face against the wind. That made it grimace. But did it have to be so hard? Sometimes it felt so hard. Well, that was some wind and it had never stopped blowing. Even when this place was the second city of the British Empire, affluence had never softened it because the wealth of the few had become the poverty of the many. The many had survived, however harshly, and made the spirit of the place theirs. Having survived affluence, they could survive anything. Now that the money was tight, they hardly noticed the difference. If you had it, all you did was spend it. The money had always been tight. Tell us something we don’t know. That was Glasgow. It was a place so kind it would batter cruelty into the ground. And what circumstances kept giving it was cruelty. No wonder he loved it. It danced among its own debris. When Glasgow gave up, the world could call it a day.”

“Laidlaw had a happy image of the first man out after the nuclear holocaust being a Glaswegian. He would straighten up and look around. He would dust himself down with that flicking gesture of the hands and, once he had got the strontium off the good suit, he would look up. The palms would be open.
    ‘Hey,’ he would say. ‘Gonny gi’es a wee brek here? What was that about? Ye fell oot wi’ us or what? That was a liberty. Just you behave.’
    Then he would walk off with that Glaswegian walk, in which the shoulders don’t move separately but the whole torso is carried as one, as stiff as a shield. And he would be muttering to himself, ‘Must be a coupla bottles of something still intact.’”

“They were sitting in the Glasgow University Club bar (…) Laidlaw was staring at his lime-juice and soda. Harkness was taking his lager like anaesthetic. Around them the heavy buildings and empty quadrangles seemed to shut out the city, giving them the feeling of being at the entrance to a shaft sunk into the past. Certainly, the only other two people in the room were having less a conversation than a seance, though they only seemed to summon the dead in order to rekill them.
    The talk of the two university men reminded Laidlaw of why he had left university at the end of his first year, having passed his exams. He found that the forty-year-old man agreed with the nineteen-year-old boy. He suspected that a lot of academics lived inside their own heads so much they began to think it was Mount Sinai. He disliked the way they seemed to him to use literature as an insulation against life rather than an intensification of it.
    He liked books but they were to him a kind of psychic food that should convert to energy for living. With academics the nature of their discipline seemed to preclude that. To take it that seriously would have annihilated the limits of aesthetics. Listening to their exchange of attitudes in what amounted to a private code, he didn’t regret the youthful impulse which had pushed him out into the streets and now brought him back here, by a circuitous and painful route, as an alien visitor. He didn’t want to be included in that clique of mutually supportive opinions that so often passes for culture.
    He remembered what had finally crystallised his rejection of university. It had been having to read and listen to the vague nonsense of academics commenting on the vague nonsense of much of what D. H. Lawrence wrote. Coming himself from a background not dissimilar to Lawrence’s, he thought he saw fairly clearly how Lawrence had put out his eyes with visions rather than grapple with reality that was staring him in the face. You needn’t blame him for hiding but you needn’t spend volumes trying to justify it either; unless, of course, it helped to make your own hiding easier to take.
    ‘A lot of what passes for intellectuality’s just polysyllabic prejudice,’ Laidlaw thought aloud.”

my first parkrun [19:56,3/87,78.8%]

Posted in Kids, pictures, Running, Travel on January 19, 2020 by xi'an

This morning, I had my first parkrun race in Gainesville, before heading back to Paris. (Thanks to Florence Forbes who pointed out this initiative to me.) Which reminded me of the race I ran in Helsinki a few years ago. Without the “self-transcendence” topping…! While the route was very urban, it was a fun opportunity to run a race with a few other runners. My time of 19:56 is not my best by far but, excuses, excuses, I was not feeling too well and the temperature was quite high (21°), and I finished among the first three runners, just seconds behind two young fellows who looked like they were still in high school. (I am now holding the record of that race for my age group as well!) Anyway, this is a great way to join races when travelling and not worry about registration, certificates, etc.

Parkrun also provides an age-grade adjusted ranking (78.8%), which is interesting but statistically puzzling as this is the ratio of the fastest time (ever?) in the age x gender category over one’s time. Given that fastest times are extreme, this depends on one individual and hence has a high variability. Especially in higher (meaning older!) veteran categories. A quantile in the empirical distribution would sound better. I came across this somewhat statistical analysis of the grade,
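A minimal sketch of the two grading rules discussed above, assuming the age grade is computed as reference time divided by own time (the exact parkrun formula and its reference table are not given in this post). The function names are hypothetical and the list of category times is made up purely to illustrate the quantile alternative.

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return 60 * int(minutes) + int(seconds)


def age_grade(own_time, reference_time):
    """Ratio-based grade: category reference (fastest) time over own time."""
    return reference_time / own_time


def quantile_grade(own_time, category_times):
    """Quantile-based grade: share of category times no faster than own."""
    return sum(t >= own_time for t in category_times) / len(category_times)


own = to_seconds("19:56")

# Reference time implied by a 78.8% grade under the ratio assumption
implied_reference = 0.788 * own

# Made-up finishing times (seconds) for one age x gender category
category_times = [1100, 1196, 1250, 1300, 1420, 1500, 1610, 1800]

print(round(implied_reference, 1), quantile_grade(own, category_times))
```

The ratio grade moves whenever the single reference time changes, while the quantile grade depends on the whole sample of category times and is therefore less sensitive to one extreme performance.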

Hastings at 50, from a Metropolis

Posted in Kids, pictures, Running, Travel on January 4, 2020 by xi'an

A weekend trip to the quaint seaside city of Le Touquet Paris-Plage, facing the city of Hastings on the other side of the Channel, 50 miles away (and invisible on the pictures!), during and after a storm that made for a fantastic watch from our beach-side rental, if less for running! The town is far from being a metropolis, actually, but it got its added surname “Paris-Plage” from British investors who wanted to attract their countrymen in the late 1800s. The writers H.G. Wells and P.G. Wodehouse lived there for a while. (Another type of tourist, William the Conqueror, left for Hastings in 1066 from a wee farther south, near Saint-Valéry-sur-Somme.)

And the coincidental on-line publication in Biometrika of a 50-year anniversary paper, The Hastings algorithm at fifty by David Dunson and James Johndrow. More of a celebration than a comprehensive review, with focus on scalable MCMC, gradient-based algorithms, Hamiltonian Monte Carlo, nonreversible Markov chains, and interesting forays into approximate Bayes. Which makes for a great read for graduate students and seasoned researchers alike!