## methods for quantifying conflict casualties in Syria

Posted in Books, Statistics, University life on November 3, 2014 by xi'an

On Monday November 17, 11am, Amphi 10, Université Paris-Dauphine, Rebecca Steorts from CMU will give a talk at the GT Statistique et imagerie seminar:

Information about social entities is often spread across multiple large databases, each degraded by noise, and without unique identifiers shared across databases. Entity resolution (reconstructing the actual entities and their attributes) is essential to using big data and is challenging not only for inference but also for computation.

In this talk, I motivate entity resolution by the current conflict in Syria. It has been tremendously well documented; however, we still do not know how many people have been killed by conflict-related violence. We describe a novel approach towards estimating death counts in Syria and challenges that are unique to this database. We first introduce computational speed-ups, based upon locality-sensitive hashing from the computer science literature, to avoid all-to-all record comparisons. We then introduce a novel approach to entity resolution by discovering a bipartite graph, which links manifest records to a common set of latent entities. Our model quantifies the uncertainty in the inference and propagates this uncertainty into subsequent analyses. Finally, we speak to the successes and challenges of solving a problem that is at the forefront of national headlines and news.

This is joint work with Rob Hall (Etsy), Steve Fienberg (CMU), and Anshu Shrivastava (Cornell University).

[Note that Rebecca will visit the maths department in Paris-Dauphine for two weeks and give a short course in our data science Master on data confidentiality, privacy and statistical disclosure (syllabus).]

## a weird beamer feature…

Posted in Books, Kids, Linux, R, Statistics, University life on September 24, 2014 by xi'an

As I was preparing the slides for my third-year undergraduate stat course, I got a weird error that took a search on the Web to unravel:

! Extra }, or forgotten \endgroup.
\endframe ->\egroup
\begingroup \def \@currenvir {frame}
l.23 \end{frame}
\begin{slide}
?


which was related to a fragile environment

\begin{frame}[fragile]
\frametitle{simulation in practice}
\begin{itemize}
\item For a given distribution $F$, call the corresponding
pseudo-random generator in an arbitrary computer language
\begin{verbatim}
> x=rnorm(10)
> x
[1] -0.021573 -1.134735  1.359812 -0.887579
[7] -0.749418  0.506298  0.835791  0.472144
\end{verbatim}
\item use the sample as a statistician would
\begin{verbatim}
> mean(x)
[1] 0.004892123
> var(x)
[1] 0.8034657
\end{verbatim}
to approximate quantities related with $F$
\end{itemize}
\end{frame}\begin{frame}


but not directly to the verbatim part: the reason for the bug was that the \end{frame} command was not on a line by itself! This is one of the rare occasions where a line break has an impact in LaTeX, as far as I know… (The same bug appears when the line starts with an indentation. Weird!) [Another annoying feature is WordPress turning > into &gt; in the sourcecode environment…]
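For comparison, here is a minimal sketch of a layout that should compile, assuming a bare beamer document. The second frame's title and text are placeholders, not taken from the original slides, and the R output is shortened:

```latex
\documentclass{beamer}
\begin{document}

% fragile frame: the closing \end{frame} below sits alone on its line,
% with no indentation and nothing after it
\begin{frame}[fragile]
\frametitle{simulation in practice}
\begin{verbatim}
> x=rnorm(10)
> mean(x)
\end{verbatim}
\end{frame}

% the next frame starts on a new line (placeholder content)
\begin{frame}
\frametitle{next slide}
Some text.
\end{frame}

\end{document}
```

As far as the beamer manual describes it, the fragile option makes beamer look for a line consisting solely of \end{frame}, which would explain why anything else on that line (a following \begin{frame}, or leading spaces) triggers the error.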

## Đôi nét về GS. Xi’an [a few words about Prof. Xi’an]

Posted in Books, Travel, University life on May 28, 2013 by xi'an

Here is a short bio of me, written in Vietnamese (and given here in English translation), in conjunction with the course I will give at CMS (Centre for Mathematical Sciences), Ho Chi Minh City, next week:

Christian P. Robert has been a professor in the Department of Applied Mathematics at Université Paris-Dauphine since 2000. Prof. Robert has taught at Purdue and Cornell Universities (USA) and at the University of Canterbury (New Zealand). He was editor of the Journal of the Royal Statistical Society Series B from 2006 to 2009 and an associate editor of the Annals of Statistics. In 2008, he was President of the International Society for Bayesian Analysis (ISBA). Prof. Robert's research areas include Bayesian statistics, with a main focus on decision theory and model selection, Markov chain theory in simulation, and computational statistics.

## R midterms

Posted in Kids, Linux, R, Statistics, University life on November 9, 2012 by xi'an

Here are my R midterm exams, version A and version B in English (as students are sitting next to one another in the computer rooms), on simulation methods for my undergrad exploratory statistics course. Nothing particularly exciting or innovative! Dedicated ‘Og‘s readers may spot a few Le Monde puzzles in the lot…

Two rather entertaining if mundane occurrences related to this R exam: one hour prior to the exam, a student came to my office to beg to be allowed to take the solution manual with her (as those midterm exercises are actually picked from an exercise booklet, some students cooperated towards producing a complete solution manual, and this within a week!), kind of missing the main point of having an exam. (I have not yet seen this manual but I’d be quite interested in checking the code they produced on that occasion…) During the exam, another student asked me what the R command was to turn any density into a random generator: he had written a density function called mydens and could not fathom why rmydens(n) was not working. The same student later called me as his computer was “stuck”: he was not aware that a “+” prompt on the command line meant R was waiting for him to complete the command… A less comical event that ended well is that a student failed to save her R code (periodically and) at the end of the exam, and, as only the .RData file was available at first, we had to dig very deep into the machine to salvage her R commands from the safeguard copies rkward keeps in /tmp. I am glad we found them before turning the machine off, otherwise they would have been lost.

## Introducing Monte Carlo in PaRis [more slides]

Posted in R, Statistics, University life on November 18, 2010 by xi'an

The class started yesterday with a small but focussed and responsive audience! Given the background of the students, and in particular their clear proficiency in R, I switched between the original slides of Introducing Monte Carlo Methods with R and those of my Monte Carlo Statistical Methods course, updated by Olivier Cappé, who is teaching the course in Paris-Dauphine this year.

## MCMC & ABC

Posted in Statistics, Travel, University life on October 24, 2010 by xi'an

Here are my (preliminary) slides for the Wharton short course, in an evolutionary (!) version that will keep changing over the week as I incorporate the material from a survey on ABC we are currently writing with Jean-Michel Marin and Robin Ryder.

## R tee-shirt

Posted in Books, R, University life on September 21, 2010 by xi'an

I gave my introduction to the R course in a crammed amphitheatre of about 200 students today. Had to wear my collectoR teeshirt from Revolution Analytics, even though it only made the kids pay attention for about 30 seconds… The other few “lines” that worked were using the Proctor & Gamble “car 54” poster and calling bootstrap “Statistics for dummies”, but I have trouble every year in getting the students interested in the topic (simulation) until… I introduce a (dummy) finance example of computing option prices. Sad!