Archive for Google

quantum simulation or supremacy?

Posted in Books, Statistics, University life on November 11, 2019 by xi'an

Nature this week contains a prominent paper by Arute et al. reporting an experiment on a quantum computer running a simulation on a state-space of dimension 2⁵³ (53 being the number of qubits in their machine, plus one dedicated to error correction if I get it right). With a million simulations of the computer state requiring 200 seconds. Which they claim would take 10,000 years (3×10¹¹ seconds) to run on a classical super-computer. And which could be used towards producing certified random numbers, an impressive claim given the intrinsic issue of qubit errors. (This part is not developed in the paper, but I wonder how a random generator could handle such errors.)
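
To get a sense of the scale involved (my own back-of-the-envelope check, not a figure from the paper): merely storing the full 53-qubit state vector as double-precision complex amplitudes already requires more memory than any single classical machine offers.

```python
# Back-of-the-envelope memory cost of storing the full 53-qubit state vector,
# one complex128 amplitude (16 bytes) per basis state. My own rough check,
# not a figure from the Arute et al. paper.
n_qubits = 53
n_amplitudes = 2 ** n_qubits          # dimension of the state space, 2^53
bytes_needed = 16 * n_amplitudes      # complex128 = 2 x 8 bytes
print(f"{bytes_needed / 2**50:.0f} PiB")  # 128 PiB, far beyond any single machine
```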

“…a “challenger” generates a random quantum circuit C (i.e., a random sequence of 1-qubit and nearest-neighbor 2-qubit gates, of depth perhaps 20, acting on a 2D grid of n = 50 to 60 qubits). The challenger then sends C to the quantum computer, and asks it to apply C to the all-0 initial state, measure the result in the {0,1} basis, send back whatever n-bit string was observed, and repeat some thousands or millions of times. Finally, using its knowledge of C, the classical challenger applies a statistical test to check whether the outputs are consistent with the QC having done this.” The blog of Scott Aaronson

I have tried reading the Nature paper but had trouble grasping the formidable nature of the simulation they discuss, which amounts to sampling from a large random quantum circuit of depth 20, as helpfully explained in the above quote and in the News & Views entry of the same issue of Nature. The hard part is checking that the (non-uniform) distribution of the random outputs is the expected one, which requires a classical (super-)computer to determine the theoretical distribution. According to Wikipedia, “the best known algorithm for simulating an arbitrary random quantum circuit requires an amount of time that scales exponentially with the number of qubits“. However, IBM (a competitor of Google in the quantum computer race) counter-claims that the simulation of the circuit would take only 2.5 days on a classical computer with optimised coding. (And this should be old news by the time this blog post comes out, since even a US candidate for the presidency has warned about it!)
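
For the curious, here is a minimal sketch of the kind of statistical check Aaronson describes, along the lines of the linear cross-entropy benchmark used in the paper. The array `ideal_probs`, i.e. the classically computed output distribution of the circuit C, is assumed to be available, and computing it is precisely the part that calls for the classical super-computer.

```python
import numpy as np

def linear_xeb(samples, ideal_probs, n_qubits):
    """Linear cross-entropy benchmark: average the ideal probabilities of the
    sampled bitstrings, rescale by 2^n, subtract 1. Close to 0 for a sampler
    producing uniform noise, clearly positive for a sampler that follows the
    circuit's output distribution."""
    return 2 ** n_qubits * ideal_probs[np.asarray(samples)].mean() - 1

# Toy illustration on 3 qubits with a made-up "ideal" distribution
rng = np.random.default_rng(0)
ideal = rng.dirichlet(np.ones(2 ** 3))
faithful = rng.choice(2 ** 3, size=10_000, p=ideal)  # sampler follows the circuit
noise = rng.integers(0, 2 ** 3, size=10_000)         # sampler outputs uniform noise
print(linear_xeb(faithful, ideal, 3), linear_xeb(noise, ideal, 3))  # positive vs. ~0
```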

Nature worries

Posted in Statistics on October 9, 2019 by xi'an

In the 29 August issue, worries about the consequences for research of the collapse of the Italian government coalition (as the said government had pledged to return funding to 2009 [higher!] levels), about Brexit as in every issue (and on the mind of every EU researcher working in the UK), and about facial recognition technology growing much faster than the legal protections that should come with it, thanks to big tech companies like Amazon and Google. In Western countries, not China… One acute point in the opinion piece being the lack of open-source software to check for biases. More worries about Amazon, the real one!, with Bolsonaro turning his indifference to, if not active support of, the widespread forest fires into a nationalist campaign. And cutting 80,000 science scholarships. Worries about the ethnic biases in genetic studies and the All of Us study‘s attempt to correct them (a study run by a genetic company called Color, whose purpose is to broaden access to genetic counseling for under-represented populations). Further worries about fighting self-citations (with John Ioannidis involved in the analysis), with examples reaching a 94% self-citation rate for India’s most cited researcher.

O’Bayes 19: registration and travel support

Posted in pictures, Running, Travel, University life on February 14, 2019 by xi'an

An update about the O’Bayes 19 conference next June-July:  the early registration period has now opened. And there should be funds for supporting early-career researchers, thanks to Google and CNRS sponsorships, as detailed below:

Early-career researchers, less than four years from their PhD, are invited to apply for early-career scholarships. If you are a graduate student, postdoctoral researcher or early-career academic and you are presenting a poster, you are eligible to apply. Female researchers and underrepresented minorities are especially encouraged to apply. Selected applicants will receive up to £450, which can be used for any combination of fees, travel and accommodation costs, subject to receipts.

The deadline for applying is the 15th of March (which is also the deadline for submitting the poster abstract), and applications must be made at the registration stage via the dedicated page. Those who submitted an abstract before this information on scholarships was made available (11 Feb.) and who wish to apply for travel support should contact the organisers.

the beauty of maths in computer science [book review]

Posted in Books, Statistics, University life on January 17, 2019 by xi'an

CRC Press sent me this book for review in CHANCE. It is written by Jun Wu, “staff research scientist in Google who invented Google’s Chinese, Japanese, and Korean Web search algorithms”, and translated from the Chinese original, 数学之美, which itself originated from Google blog entries. (Meaning most references are pre-2010.) A large part of the book is about word processing and web navigation, the author’s research specialty, and not so much about mathematics. (When rereading the first chapters to start this review, I realised why the part about language processing in AIQ sounded familiar: I had read it in The Beauty of Mathematics in Computer Science.)

In the first chapter, about the history of languages, I found out, among other things, that ancient Jewish copyists of the Bible had an error-correcting algorithm consisting of giving each character a numerical equivalent, summing up each row, then all rows, and checking that the sum at the end of the page matched the original one. The second chapter explains why the early attempts at computer language processing, based on grammar rules, were unsuccessful and how a statistical approach broke the deadlock. Explained via Markov chains in the following chapter. Along with the Good-Turing [Bayesian] estimate of the transition probabilities. Next comes a short and low-tech chapter on word segmentation. And then an introduction to hidden Markov models. Mentioning the Baum-Welch algorithm as a special case of EM, which makes a return by Chapter 26. Plus a chapter on entropies and the Kullback-Leibler divergence.
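
As an aside, the copyists' scheme is essentially a checksum; here is a minimal sketch of the idea (with an arbitrary character-to-number mapping rather than the historical letter values):

```python
def page_checksums(page):
    """Checksum in the spirit of the copyists' scheme: map each character to a
    number (here simply its code point, not the historical letter values), sum
    each row, then sum the row totals for the whole page."""
    row_sums = [sum(ord(ch) for ch in row) for row in page]
    return row_sums, sum(row_sums)

original = ["in the beginning", "the earth was without form"]
copy = ["in the beginning", "the earth was without forn"]     # one copying error
print(page_checksums(original)[1] == page_checksums(copy)[1])  # False: error detected
```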

A first interlude is provided by a chapter dedicated to the late Frederick Jelinek, the author’s mentor (including what I find a rather unfortunate parallel drawn between the Nazi and Communist eras in Czechoslovakia, p.64). A chapter that sounds a wee bit too much like an extended obituary.

The next section of chapters is about search engines, with a few pages on Boolean logic, dynamic programming, graph theory, Google’s PageRank, and TF-IDF (term frequency/inverse document frequency). Unsurprisingly, given that the entries were originally written for Google’s blog, Google’s tools and concepts keep popping up throughout the entire book.
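
Since PageRank gets a few pages in this section, here is a minimal power-iteration sketch of the standard algorithm (my own toy code, not the book's), with the usual damping factor of 0.85:

```python
import numpy as np

def pagerank(adjacency, damping=0.85, n_iter=100):
    """Minimal power-iteration PageRank on a dense adjacency matrix,
    where adjacency[i, j] = 1 if page i links to page j."""
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling pages link uniformly to all.
    transition = np.where(out_degree > 0, adjacency / np.maximum(out_degree, 1), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        rank = (1 - damping) / n + damping * rank @ transition
    return rank

# Toy web of four pages
links = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 0],
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
print(pagerank(links).round(3))
```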

Another interlude is about Amit Singhal, the designer of Google’s internal search ranking system, Ascorer. With another unfortunate analogy, this time with the AK-47 Kalashnikov rifle as “elegantly simple”, “effective, reliable, uncomplicated, and easy to implement or operate” (p.105). Even though I do get the (reason for the) analogy, picking a comparable tool whose purpose is not to kill other people would have been just decent…

Then come chapters on measuring proximity between news articles by (vectors in a 64,000-dimension vocabulary space and) their angle, on singular value decomposition, and on turning URLs, seen as long integers, into 16-byte random numbers via the Mersenne Twister (why random, except for encryption?), missing both the square in von Neumann’s first PRNG (p.124) and the opportunity to link the probability of overlap with the birthday problem (p.129). Followed by another chapter on cryptography, always a favourite in maths popularisation books (but with no mention made of the originators of public-key cryptography, like James Ellis or the RSA trio, or of the impact of quantum computers on the reliability of these methods). And by a mathematics-free chapter on spam detection.
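
For the record, the birthday-problem link the author misses is a one-liner: hashing k URLs into (pseudo-)random 16-byte, i.e. 128-bit, fingerprints, the probability of at least one collision is roughly 1 − exp(−k(k−1)/2¹²⁹), which stays negligible even for billions of URLs. A quick check (my own illustration, not taken from the book):

```python
import math

def collision_prob(k, bits=128):
    """Birthday-problem approximation of the probability that at least two of
    k uniform random `bits`-bit fingerprints coincide."""
    return -math.expm1(-k * (k - 1) / 2 ** (bits + 1))

for k in (10 ** 6, 10 ** 9, 10 ** 12):
    print(f"{k:.0e} URLs -> collision probability ~ {collision_prob(k):.1e}")
```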

Another sequence of chapters covers maximum entropy models (in a rather incomprehensible way, I think, see p.159), continued with an interesting argument on how Shannon’s first theorem predicts that it should be faster to type Chinese characters than Roman characters. Followed by the Bloom filter, which operates as an approximate Poisson variate. Then Bayesian networks, where the “probability of any node is computed by Bayes’ formula” [not really]. With a slightly more advanced discussion on finding the highest posterior probability network. And conditional random fields, where the conditioning is not clearly discussed (p.192). Next are chapters about Viterbi’s algorithm (and successful career) and the EM algorithm, nicknamed “God’s algorithm” in the book (Chapter 26), although I had never heard of this nickname before.
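
For readers who have never met it, here is a minimal Bloom filter sketch (a generic toy implementation, not the book's code); the standard false-positive rate (1 − e^{−kn/m})^k indeed comes out of a Poisson-type approximation of the number of slots left unset after n insertions.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter with m slots (one byte per slot for simplicity)
    and k hash functions derived from SHA-256. Membership queries may return
    false positives but never false negatives."""
    def __init__(self, m=8192, k=5):
        self.m, self.k, self.slots = m, k, bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.slots[pos] = 1

    def __contains__(self, item):
        return all(self.slots[pos] for pos in self._positions(item))

addresses = BloomFilter()
addresses.add("spam@example.com")
print("spam@example.com" in addresses,       # True
      "friend@example.org" in addresses)     # (almost surely) False
```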

The final two chapters are on neural networks and Big Data, clearly written later than the rest of the book, with the predictable illustration of AlphaGo (but without technical details). The twenty-page chapter on Big Data does not contain much more mathematics either, with no equation apart from Chebyshev’s inequality and a frequency estimate of a conditional probability. But I learned about 23andMe running genetic tests at a loss to build a huge (if biased) genetic database. (The bias in “Big Data” issues is actually not covered by this chapter.)

“One of my main objectives for writing the book is to introduce some mathematical knowledge related to the IT industry to people who do not work in the industry.”

To conclude, I found the book a fairly interesting insight into a senior Google scientist’s vision of his field and job experience, with loads of anecdotes and some historical background, but very Google-centric, with what I felt was an excessive amount of name dropping and of “I did, I solved, I…” &tc. The title is rather misleading in my opinion, as the amount of maths is very limited and rarely sufficient to connect with the subject at hand. Although this is quite a relative concept, I did not spot beauty therein but rather technical advances and tricks, allowing the author and Google to beat the competition.

Warwick summer school on computational statistics [last call]

Posted in Books, pictures, Statistics, Travel, University life on June 24, 2018 by xi'an

In case ‘Og’s readers are not aware, tomorrow, Monday 25 June, is the final day for registration for our upcoming LMS/CRiSM summer school on computational statistics, taking place at the University of Warwick two weeks from now, 9-13 July 2018. There is still room available till midday tomorrow, Greenwich Mean Time. And no later!

summer school on computational statistics [deadline]

Posted in Books, pictures, Statistics, Travel, University life on February 23, 2018 by xi'an

Reminding ‘Og’s readers and others that the early-bird registration deadline for our LMS/CRiSM summer school on computational statistics at the University of Warwick, 9-13 July 2018, is next Thursday, March 01, 2018. This also applies to bursary applications, so do not dally and apply now!

the definitive summer school poster

Posted in pictures, Statistics, University life on January 8, 2018 by xi'an