A recent news editorial in Nature (15 March issue) reminded me of the lectures Louis Aslett gave at the Gregynog Statistical Conference last week, on the advanced use of cryptography tools to analyse sensitive and private data. Lectures that in turn reminded me of a graduate course I took on cryptography and coding, in Paris 6, and which led me to visit a lab at the Université de Limoges during my conscripted year in the French Navy. With no research outcome. Now, the notion of using encrypted data for statistical analysis is fascinating in that it may allow for efficient inference and personal data protection at the same time. As opposed to earlier anonymisation solutions that introduced noise and data degradation, without always providing sufficient protection of privacy. Encryption is also the notion at the basis of the Nature editorial. An issue completely missing from the editorial, while stressed by Louis, is that this encryption (like Bitcoin) is costly, in order to deter hacking, and hence energy inefficient. Or else it limits the amount of data that can be used in such studies, which would turn the idea into a stillborn notion.
bitcoin and cryptography for statistical inference and AI
Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags AI, anonymised data, bitcoin, Britain, cryptography, encryption, Gregynog Hall, Gregynog Statistical Conference, information, Navy, Powys, Tregynon, Wales on April 16, 2018 by xi'an
X entropy for optimisation
Posted in Books, pictures, Statistics, Travel, University life with tags cross-entropy method, Gregynog Statistical Conference, Monte Carlo Statistical Methods, Reuven Rubinstein, sheep, simulated annealing, stochastic optimisation, stochastic simulation, sudoku, travelling salesman, Tregynon, Wales on March 29, 2018 by xi'an
At Gregynog, with mounds of snow still visible in the surrounding hills, not to be confused with the many sheep dotting the fields(!), Owen Jones gave a three-hour lecture on simulation for optimisation, which is a less travelled path when compared with simulation for integration. His second lecture covered cross entropy for optimisation purposes. (I had forgotten that Reuven Rubinstein and Dirk Kroese had put forward this aspect of their technique in the very title of their book, as “A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning”.) The X entropy approach pushes for simulations restricted to top values of the target function, iterating to find the best parameter in the parametric family used for the simulation. (Best to be understood in the Kullback sense.) Now, this is a wee bit like simulated annealing, where lots of artificial entities have to be calibrated in the algorithm, due to the original problem being unrelated to a specific stochastic framework. X entropy facilitates concentration on the highest values of the target, but requires a family of probability distributions that puts weight on the top region. This may be a damning issue in large dimensions. Owen illustrated the approach in the case of the travelling salesman problem, where the parameterised distribution is a Markov chain on the state space of city sequences. Further, if the optimal value of the target is unknown, avoiding getting stuck in a local optimum may be tricky. (Owen presented a proof of convergence for a temperature going to zero slowly enough, which is equivalent to a sure exploration of the entire state space, in a discrete setting, and hence does not provide a reassurance in this respect, as the corresponding algorithm cannot be implemented.) This method falls into the range of methods that are doubly stochastic in that they rely on Monte Carlo approximations at each iteration of the exploration algorithm.
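To make the iteration concrete, here is a minimal Python sketch of a cross-entropy search, not Owen's code nor the book's. It assumes a toy target (the number of agreements with a hidden binary pattern) and a product of Bernoulli distributions as the parametric family. The function name, the elite fraction, the smoothing constant and the sample sizes are all arbitrary choices made for illustration.

```python
import random

def cross_entropy_maximise(score, dim, n_samples=200, elite_frac=0.1,
                           smoothing=0.7, n_iter=50, seed=0):
    """Cross-entropy search over binary vectors (illustrative sketch).

    The sampling family is a product of Bernoulli(p[j]) laws, an assumption
    made for this toy example.
    """
    rng = random.Random(seed)
    p = [0.5] * dim                       # initial parameter of the family
    n_elite = max(1, int(elite_frac * n_samples))
    best_x, best_val = None, float("-inf")
    for _ in range(n_iter):
        # simulate from the current member of the parametric family
        sample = [[1 if rng.random() < pj else 0 for pj in p]
                  for _ in range(n_samples)]
        # restrict to the simulations with the top values of the target
        sample.sort(key=score, reverse=True)
        elite = sample[:n_elite]
        top_val = score(elite[0])
        if top_val > best_val:
            best_x, best_val = list(elite[0]), top_val
        # best parameter in the Kullback sense for a Bernoulli product:
        # the elite frequencies, smoothed to avoid freezing too early
        for j in range(dim):
            freq = sum(x[j] for x in elite) / n_elite
            p[j] = smoothing * freq + (1 - smoothing) * p[j]
    return best_x, best_val, p

# hypothetical target: count agreements with a hidden pattern
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]

def score(x):
    return sum(a == b for a, b in zip(x, pattern))

best_x, best_val, p = cross_entropy_maximise(score, dim=len(pattern))
print(best_val, [round(pj, 2) for pj in p])
```

The target is evaluated only through simulations, and the Monte Carlo sample is redrawn at every iteration, which is the doubly stochastic aspect. On a separable target like this one a product family is enough, which is precisely what fails for more constrained problems.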
During a later talk, I tried to recycle one of my earlier R codes on simulated annealing for sudokus, but could not find a useful family of proposal distributions to reach the (unique) solution. Using a mere product of distributions on each of the free positions in the sudoku grid only led me to a penalty of 13 errors…
1 2 8 5 9 7 4 9 3
7 3 5 1 2 4 6 2 8
4 6 9 6 3 8 5 7 1
2 7 5 3 1 6 9 4 8
8 1 4 7 8 9 7 6 2
6 9 3 8 4 2 1 3 5
3 8 6 4 7 5 2 1 9
1 4 2 9 6 3 8 5 7
9 5 7 2 1 8 3 4 6
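As a check on what such a penalty may be counting, here is a short Python sketch (not the original R code) that scores the grid above. It assumes the 81 entries are read as nine rows of nine, and that the penalty is the number of digits missing from each row, column and 3×3 block, summed over all 27 units. Both the reading and the definition are assumptions; the names `GRID` and `penalty` are only illustrative.

```python
# the grid from the post, read as nine rows of nine entries (assumed layout)
GRID = """
1 2 8 5 9 7 4 9 3
7 3 5 1 2 4 6 2 8
4 6 9 6 3 8 5 7 1
2 7 5 3 1 6 9 4 8
8 1 4 7 8 9 7 6 2
6 9 3 8 4 2 1 3 5
3 8 6 4 7 5 2 1 9
1 4 2 9 6 3 8 5 7
9 5 7 2 1 8 3 4 6
"""

def penalty(grid):
    """Missing digits summed over rows, columns and 3x3 blocks (assumed definition)."""
    rows = [list(r) for r in grid]
    cols = [list(c) for c in zip(*grid)]
    blocks = [[grid[3 * bi + i][3 * bj + j] for i in range(3) for j in range(3)]
              for bi in range(3) for bj in range(3)]
    # a unit with k distinct digits is missing 9 - k of them
    return sum(9 - len(set(unit)) for unit in rows + cols + blocks)

grid = [[int(v) for v in line.split()] for line in GRID.strip().splitlines()]
print(penalty(grid))  # 13 under this assumed definition
```

Under this definition the count is 6 for the rows, 6 for the columns and 1 for the blocks, which agrees with the 13 errors quoted above.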
It is hard to consider a distribution on the space of permutations, 𝔖⁸¹.