In fact, I have other difficulties than setting the cut-off point with the original scheme as a way to assess MCMC convergence or lack thereof, among which

- its dependence on the parameterisation of the chain and on the estimation of a specific target function
- its dependence on the starting distribution which makes the time to convergence not absolutely meaningful
- the confusion between getting to stationarity and exploring the whole target
- its missing the option to resort to subsampling schemes to attain pseudo-independence or scale time to convergence (albeit see the third item above)
- a potential bias brought by the stopping rule.

*““This is absolutely the stupidest thing ever,” said Antar Davis, 23, a former zookeeper who showed up in the elephant house on Friday to take one last look at Maharani, a 9,100-pound Asian elephant, before the zoo closed.”* The New York Times, Dec 29, 2018

*“The Trump administration has stopped cooperating with UN investigators over potential human rights violations occurring inside America [and] ceased to respond to official complaints from UN special rapporteurs, the network of independent experts who act as global watchdogs on fundamental issues such as poverty, migration, freedom of expression and justice.”* The Guardian, Jan 4, 2019

*““I know more about drones than anybody,” he said (…) Mr. Trump took the low number [of a 16% approval in Europe] as a measure of how well he is doing in the United States. “If I were popular in Europe, I wouldn’t be doing my job.””* The New York Times, Jan 3, 2019

*““Any deaths of children or others at the border are strictly the fault of the Democrats and their pathetic immigration policies that allow people to make the long trek thinking they can enter our country illegally.””* The New York Times, Dec 30, 2018

]]>

**M**y last book of the year (2018), which I finished one hour before midnight, on 31 December! Ka is a book about a crow, or rather, a Crow, Dar Oakley (or, in full, *Dar of the Oak by the Lea*), told from his viewpoint, and spanning all of the Anthropocene, for Dar Oakley is immortal [sort of] and able to communicate with humans (and other birds, like Ravens. And coyotes). This summary of the plot may sound of limited appeal, but this may be the best book I read this past year. The Washington Post offers a critical entry into Ka that is much better than anything I can state about it. Not only is it about Crows and Ravens, fascinating social birds with a highly developed vocabulary that reflects the hierarchies in these avian societies. But it also offers another view on the doomed history of mankind, to which Crows seem irremediably linked and with whom Dar Oakley is sharing more than a territory. As so acutely perceived in another review from Locus, the beauty of the book and the genius of the writer, John Crowley, is to translate an alien intelligence into terms intelligible to the reader.

“A crow alone is no crow.”

A fairly, faery, unique, strangely moving, book, thus, that cannot suffer to be labelled into a category like fantasy or poetry or philosophical tale. Reflecting on the solitude brought by knowledge and communicating with another race. And of the bittersweet pain brought by immortality that makes Dar Oakley seek a former mate in the kingdom of dead Crows. An imperfect, fallible character, a perfect messenger of Death to accompany humanity on its last steps.

]]>*f(x)∝f¹(x)f²(x)…*

together, once simulation from each part has been done. In the same spirit as the consensus Monte Carlo of Scott et al. (2016). Where for instance the components of the target cannot be computed simultaneously, either because of the size of the dataset, or because of privacy issues. The idea in this paper is to target an augmented density with the above marginal, using for each component of f, an auxiliary variable x¹,x²,…, and a target that is the product of the squared components, f¹(x¹)², f²(x²)², …, by a transition density keeping f¹(.)²,f²(.)²,… invariant:

as for instance the transition density of a Langevin diffusion. The marginal of

as a function of y is then the targeted original product. Simulating from this new extended target can be achieved by rejection sampling. (Any impact of the number of auxiliary variables on the convergence?) The practical implementation actually implies using the path-space rejection sampling methods in the Read Paper of Beskos et al. (2006). (An extreme case of the algorithm is actually an (exact) ABC version where the simulations x¹,x²,… from all components have to be identical and equal to y. The opposite extreme is the consensus Monte Carlo Algorithm, which explains why this algorithm is not an efficient solution.) An alternative is based on an Ornstein-Uhlenbeck bridge. While the paper remains at a theoretical level with toy examples, I heard from the same sources that applications to more realistic problems and implementation on parallel processors are under way.
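
One way to write out the construction described above, as a sketch rather than the paper's exact statement: assume C components f¹,…,f^C and transition densities p_c(y|x) that each leave the normalised f^c(.)² invariant. The factor 1/f^c(y) is my assumption, added because without it the marginal in y would be the product of the squared components rather than the original product.

```latex
% Extended target on (x^1, ..., x^C, y); the 1/f^c(y) factor is an assumption
g(x^{1},\dots,x^{C},y) \;\propto\; \prod_{c=1}^{C} f^{c}(x^{c})^{2}\, p_{c}(y \mid x^{c})\, \frac{1}{f^{c}(y)}

% Invariance of f^c(.)^2 under p_c gives  \int f^{c}(x)^{2}\, p_{c}(y \mid x)\, dx \propto f^{c}(y)^{2},  hence
\int g(x^{1},\dots,x^{C},y)\, dx^{1}\cdots dx^{C}
  \;\propto\; \prod_{c=1}^{C} \frac{f^{c}(y)^{2}}{f^{c}(y)}
  \;=\; \prod_{c=1}^{C} f^{c}(y) \;\propto\; f(y)
```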

]]>In the first chapter, about the history of languages, I found out, among other things, that ancient Jewish copyists of the Bible had an error-correcting algorithm that consisted in giving each character a numerical equivalent, summing up each row, then all rows, and checking that the sum at the end of the page was the original one. The second chapter explains why the early attempts at computer language processing, based on grammar rules, were unsuccessful and how a statistical approach broke the deadlock. Explained via Markov chains in the following chapter. Along with the Good-Turing [Bayesian] estimate of the transition probabilities. Next comes a short and low-tech chapter on word segmentation. And then an introduction to hidden Markov models. Mentioning the Baum-Welch algorithm as a special case of EM, which makes a return in Chapter 26. Plus a chapter on entropies and Kullback-Leibler divergence.
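
The copyists' check mentioned at the start of this summary can be sketched in a few lines. This is only an illustration: the numerical equivalent used here (the Unicode code point) and the function names are my assumptions, not the scribes' actual letter values or anything taken from the book.

```python
def char_value(ch: str) -> int:
    # Hypothetical numerical equivalent: the Unicode code point of the character.
    return ord(ch)


def row_sums(page: list[str]) -> list[int]:
    # One sum per row of the page.
    return [sum(char_value(ch) for ch in row) for row in page]


def page_checksum(page: list[str]) -> int:
    # Sum of all the row sums, to be compared with the original page's total.
    return sum(row_sums(page))


def first_bad_row(original: list[str], copy: list[str]):
    # Index of the first row whose sum differs, or None if all row sums agree.
    for i, (a, b) in enumerate(zip(row_sums(original), row_sums(copy))):
        if a != b:
            return i
    return None


original = ["in the beginning", "was the word"]
copy = ["in the beginning", "was the ward"]  # one miscopied character

if page_checksum(original) != page_checksum(copy):
    print("copy differs from original, first suspect row:", first_bad_row(original, copy))
```

Strictly speaking, such a sum detects (and, through the row sums, locates) an error rather than correcting it, and it cannot see two characters swapped within a row, since the sum is unchanged.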

A first interlude is provided by a chapter dedicated to the late Frederick Jelinek, the author’s mentor (including what I find a rather unfortunate equivalence drawn between the Nazi and Communist eras in Czechoslovakia, p.64). Chapter that sounds a wee bit too much like an extended obituary.

The next section of chapters is about search engines, with a few pages on Boolean logic, dynamic programming, graph theory, Google’s PageRank and TF-IDF (term frequency/inverse document frequency). Unsurprisingly, given that the entries were originally written for Google’s blog, Google’s tools and concepts keep popping up throughout the entire book.

Another interlude is about Amit Singhal, the designer of Google’s internal search ranking system, Ascorer. With another unfortunate comparison, this time with the AK-47 Kalashnikov rifle as “elegantly simple”, “effective, reliable, uncomplicated, and easy to implement or operate” (p.105). Even though I do get the (reason for the) analogy, using an equivalent tool whose purpose is not to kill other people would have been just decent…

Then chapters on measuring proximity between news articles by (vectors in a 64,000-dimensional vocabulary space and) their angle, and singular value decomposition, and turning URLs as long integers into 16-byte random numbers by the Mersenne Twister (why random, except for encryption?), missing both the square in von Neumann’s first PRNG (p.124) and the opportunity to link the probability of overlap with the birthday problem (p.129). Followed by another chapter on cryptography, always a favourite in maths popularisation books (but with no mention made of the originators of public key cryptography, like James Ellis or the RSA trio, or of the impact of quantum computers on the reliability of these methods). And by an a-mathematical chapter on spam detection.
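
For the link with the birthday problem, here is a minimal sketch of the usual approximation to the probability that at least two fingerprints overlap. It assumes the 16-byte numbers are uniform over 2¹²⁸ values; the function name and the count of 10¹² URLs are hypothetical, chosen only for illustration.

```python
import math


def collision_probability(n_items: int, space_size: int) -> float:
    """Birthday approximation 1 - exp(-n(n-1)/(2N)) to the probability that
    at least two of n_items uniform draws from space_size values coincide."""
    exponent = -n_items * (n_items - 1) / (2.0 * space_size)
    # -expm1(x) equals 1 - exp(x) and stays accurate when x is tiny.
    return -math.expm1(exponent)


# Classical birthday setting: 23 people, 365 days.
p_birthday = collision_probability(23, 365)

# Hypothetical setting: 10**12 URLs mapped to 16-byte (128-bit) fingerprints.
p_urls = collision_probability(10**12, 2**128)

print(p_birthday, p_urls)
```

Under these assumptions the overlap probability for the URLs is negligible, which is the quantitative point the birthday problem would have made.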

Another sequence of chapters covers maximum entropy models (in a rather incomprehensible way, I think, see p.159), continued with an interesting argument on how Shannon’s first theorem predicts that it should be faster to type Chinese characters than Roman characters. Followed by the Bloom filter, which operates as an approximate Poisson variate. Then Bayesian networks where the “probability of any node is computed by Bayes’ formula” [not really]. With a slightly more advanced discussion on providing the highest posterior probability network. And conditional random fields, where the conditioning is not clearly discussed (p.192). Next are chapters about Viterbi’s algorithm (and successful career) and the EM algorithm, nicknamed “God’s algorithm” in the book (Chapter 26) although I never heard of this nickname previously.
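
As a reminder of what a Bloom filter does, here is a minimal sketch, not the book's presentation: the class name, the default sizes, and the use of seeded SHA-256 digests as hash functions are all my assumptions.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        # One byte per bit position, for simplicity rather than compactness.
        self.bits = bytearray(n_bits)

    def _positions(self, item: str):
        # Derive n_hashes positions by prefixing the item with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True only if every position for this item has been set.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("spam@example.com")
print("spam@example.com" in seen)
```

An item that was added is always reported as present, while an item that was never added may occasionally be reported as present when all its positions were set by other items.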

The final two chapters are on neural networks and Big Data, clearly written later than the rest of the book, with the predictable illustration of AlphaGo (but without technical details). The twenty-page chapter on Big Data does not contain a larger amount of mathematics, with no equation apart from Chebyshev’s inequality, and a frequency estimate for a conditional probability. But I learned about 23andMe running genetic tests at a loss to build a huge (if biased) genetic database. (The bias in “Big Data” issues is actually not covered by this chapter.)

*“One of my main objectives for writing the book is to introduce some mathematical knowledge related to the IT industry to people who do not work in the industry.”*

To conclude, I found the book a fairly interesting insight into a senior Google scientist’s vision of his field and job experience, with loads of anecdotes and some historical background, but very Google-centric and with what I felt was an excessive amount of name dropping and of I did, I solved, I &tc. The title is rather misleading in my opinion as the amount of maths is very limited and rarely sufficient to connect with the subject at hand. Although this is quite a relative concept, I did not spot beauty therein but rather technical advances and tricks, allowing the author and Google to beat the competition.

]]>