Archive for data privacy

Ocean’s four!

Posted in Books, pictures, Statistics, University life on October 25, 2022 by xi'an

Fantastic news! The ERC-Synergy¹ proposal we submitted last year with Michael Jordan, Éric Moulines, and Gareth Roberts has been selected by the ERC (which explains the trips to Brussels last month). Its acronym is OCEAN [hence the whale pictured by a murmuration of starlings!], which stands for On intelligenCE And Networks: Mathematical and Algorithmic Foundations for Multi-Agent Decision-Making. Here is the abstract, which will presumably become public today along with the official announcement from the ERC:

Until recently, most of the major advances in machine learning and decision making have focused on a centralized paradigm in which data are aggregated at a central location to train models and/or decide on actions. This paradigm faces serious flaws in many real-world cases. In particular, centralized learning risks exposing user privacy, makes inefficient use of communication resources, creates data processing bottlenecks, and may lead to concentration of economic and political power. It thus appears most timely to develop the theory and practice of a new form of machine learning that targets heterogeneous, massively decentralized networks, involving self-interested agents who expect to receive value (or rewards, incentives) for their participation in data exchanges.

OCEAN will develop statistical and algorithmic foundations for systems involving multiple incentive-driven learning and decision-making agents, including uncertainty quantification at the agent’s level. OCEAN will study the interaction of learning with market constraints (scarcity, fairness), connecting adaptive microeconomics and market-aware machine learning.

OCEAN builds on a decade of joint advances in stochastic optimization, probabilistic machine learning, statistical inference, Bayesian assessment of uncertainty, computation, game theory, and information science, with PIs having complementary and internationally recognized skills in these domains. OCEAN will shed a new light on the value and handling of data in a competitive, potentially antagonistic, multi-agent environment, and develop new theories and methods to address these pressing challenges. OCEAN requires a fundamental departure from standard approaches and leads to major scientific interdisciplinary endeavors that will transform statistical learning in the long term while opening up exciting and novel areas of research.

Since the ERC support in this grant mostly goes to PhD and postdoctoral positions, watch out for calls in the coming months or contact us at any time.


Fusion at CIRM

Posted in Mountains, pictures, Statistics, Travel, University life on October 24, 2022 by xi'an

Today is the first day of the FUSION workshop Rémi Bardenet and I organised. Due to schedule clashes, I will alas not be there, since I am [no alas!] at the BNP conference in Chile. The program and list of participants are quite exciting, and I hope more fusion will result from this meeting. Enjoy! (And beware of boars, cold water, and cliffs!!!)

distributed evidence

Posted in Books, pictures, Statistics, University life on December 16, 2021 by xi'an

Alexander Buchholz (who did his PhD at CREST with Nicolas Chopin), Daniel Ahfock, and my friend Sylvia Richardson published a great paper on the distributed computation of Bayesian evidence in Bayesian Analysis. The setting is one of distributed data from several sources with no communication between them, which relates to consensus Monte Carlo even though model choice has not been particularly studied from that perspective. The authors operate under the assumption of conditionally conjugate models, i.e., the existence of a data augmentation scheme into an exponential family so that conjugate priors can be used. For a division of the data into S blocks, the fundamental identity in the paper is

p(y) = \alpha^S \prod_{s=1}^S \tilde p(y_s) \int \prod_{s=1}^S \tilde p(\theta|y_s)\,\text d\theta

where α = ∫ p(θ)^{1/S} dθ is the normalising constant of the sub-prior exp{log[p(θ)]/S}, and the tilded evidence and posterior terms are those of shard s under this sub-prior. Under the conditionally conjugate assumption, the integral can be approximated based on the latent variables. Most interestingly, the associated variance is directly connected with the variance of

p(z_{1:S}|y)\Big/\prod_{s=1}^S \tilde p(z_s|y_s)

under the joint:

“The variance of the ratio measures the quality of the product of the conditional sub-posterior as an importance sample proposal distribution.”
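The identity can be checked numerically in a case where every term is available in closed form. The sketch below assumes a toy Normal model, y_i ~ N(θ, σ²) with known σ² and prior θ ~ N(μ0, τ²), which is not an example taken from the paper: the sub-prior p(θ)^{1/S}/α is then N(μ0, Sτ²), α = ∫ p(θ)^{1/S} dθ has a closed form, each sub-evidence follows from the candidate's formula (prior × likelihood / posterior, at any θ), and the integral of the product of Normal sub-posteriors is a Gaussian integral. All function names are illustrative.

```python
import numpy as np

def log_norm(x, m, v):
    """Log density of N(m, v) at x (v is a variance); vectorised over x."""
    return -0.5 * np.log(2 * np.pi * v) - 0.5 * (x - m) ** 2 / v

def posterior(y, m0, v0, sigma2):
    """Conjugate posterior N(m, v) of theta given y_i ~ N(theta, sigma2), theta ~ N(m0, v0)."""
    v = 1.0 / (1.0 / v0 + len(y) / sigma2)
    m = v * (m0 / v0 + np.sum(y) / sigma2)
    return m, v

def log_evidence(y, m0, v0, sigma2, theta=0.0):
    """Candidate's formula, exact for conjugate models: log prior + log lik - log posterior."""
    m, v = posterior(y, m0, v0, sigma2)
    return (log_norm(theta, m0, v0) + np.sum(log_norm(y, theta, sigma2))
            - log_norm(theta, m, v))

def log_prod_normal_integral(means, variances):
    """log of the integral over theta of prod_s N(theta; m_s, v_s), in closed form."""
    means, variances = np.asarray(means), np.asarray(variances)
    v = 1.0 / np.sum(1.0 / variances)  # precisions of the factors add up
    m = v * np.sum(means / variances)
    return (-0.5 * np.sum(np.log(2 * np.pi * variances)) + 0.5 * np.log(2 * np.pi * v)
            - 0.5 * (np.sum(means ** 2 / variances) - m ** 2 / v))

def distributed_log_evidence(shards, mu0, tau2, sigma2):
    """log p(y) = S log(alpha) + sum_s log p~(y_s) + log int prod_s p~(theta | y_s) dtheta."""
    S = len(shards)
    # alpha = int p(theta)^(1/S) dtheta for p = N(mu0, tau2); the sub-prior is N(mu0, S * tau2)
    log_alpha = -0.5 / S * np.log(2 * np.pi * tau2) + 0.5 * np.log(2 * np.pi * S * tau2)
    sub_logZ = sum(log_evidence(ys, mu0, S * tau2, sigma2) for ys in shards)
    means, variances = zip(*(posterior(ys, mu0, S * tau2, sigma2) for ys in shards))
    return S * log_alpha + sub_logZ + log_prod_normal_integral(means, variances)

rng = np.random.default_rng(0)
y = rng.normal(1.5, 1.0, size=200)
full = log_evidence(y, 0.0, 4.0, 1.0)
split = distributed_log_evidence(np.array_split(y, 4), 0.0, 4.0, 1.0)
print(full, split)  # should agree up to floating-point rounding
```

In this conjugate toy case the two log-evidences should coincide up to rounding; in the settings considered in the paper, the last integral is the term that has to be approximated.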

This presumes that the variance of this ratio is finite (which is likely). An approximate alternative is proposed, namely to replace the exact sub-posteriors with Normal distributions, as in consensus Monte Carlo, which obviously requires some consideration as to which parameterisation of the model produces the “most normal” (or the least abnormal!) posterior, and which ensures a finite variance in the importance sampling approximation (as guaranteed by the strong bounds in Proposition 5). This is a problem shared by the bridgesampling R package.

“…if the error that comes from MCMC sampling is relatively small and that the shard sizes are large enough so that the quality of the subposterior normal approximation is reasonable, our suggested approach will result in good approximations of the full data set marginal likelihood.”
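For the Normal alternative, a rough sketch, assuming each shard returns MCMC draws of a d-dimensional θ as an (n_s, d) array and that the Normal is fitted by moment matching (sample mean and covariance): the integral of the product of the fitted Normals is then available in closed form. This is an illustration rather than the authors' code, the function names are hypothetical, and the "MCMC output" is simulated from known Normals so that the approximation can be compared with the exact value.

```python
import numpy as np

def log_gauss_product_integral(means, covs):
    """log of the integral over theta in R^d of prod_s N(theta; m_s, C_s), in closed form."""
    d = len(means[0])
    precs = [np.linalg.inv(C) for C in covs]
    Lam = sum(precs)                                 # precision of the product
    eta = sum(P @ m for P, m in zip(precs, means))
    m = np.linalg.solve(Lam, eta)                    # mean of the product
    out = 0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(Lam)[1] + 0.5 * m @ Lam @ m
    for P, C, ms in zip(precs, covs, means):
        out += -0.5 * d * np.log(2 * np.pi) - 0.5 * np.linalg.slogdet(C)[1] - 0.5 * ms @ P @ ms
    return out

def normal_approx_log_integral(shard_draws):
    """Moment-match a Normal to each shard's sub-posterior draws, then integrate the product."""
    means = [draws.mean(axis=0) for draws in shard_draws]
    covs = [np.atleast_2d(np.cov(draws, rowvar=False)) for draws in shard_draws]
    return log_gauss_product_integral(means, covs)

# Simulated stand-in for MCMC output: three shards with known Normal sub-posteriors.
rng = np.random.default_rng(1)
true_means = [np.array([1.0, 0.5]), np.array([1.05, 0.45]), np.array([0.95, 0.55])]
true_covs = [0.02 * np.eye(2), 0.03 * np.eye(2), np.array([[0.02, 0.005], [0.005, 0.02]])]
draws = [rng.multivariate_normal(m, C, size=100_000) for m, C in zip(true_means, true_covs)]
exact = log_gauss_product_integral(true_means, true_covs)
approx = normal_approx_log_integral(draws)
print(exact, approx)
```

As the quote stresses, the quality of the result depends both on the Monte Carlo error in the fitted moments and on how close each sub-posterior is to a Normal, which this toy check, with exactly Normal sub-posteriors, does not probe.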

The resulting approximation can also be handy in conjunction with reversible jump MCMC, in the sense that RJMCMC algorithms can be run in parallel on different chunks or shards of the entire dataset, although the computing gain may be reduced by the need for separate approximations.

One World ABC seminar [season 2]

Posted in Books, Statistics, University life on March 23, 2021 by xi'an

The One World ABC seminar will resume its talks on ABC methods with a talk on Thursday, 25 March, 12:30 CET, by Mijung Park, from the Max Planck Institute for Intelligent Systems, on the exciting topic of producing differential privacy by ABC. (Talks will take place on a monthly basis.)

Big Bayes goes South

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life on December 5, 2018 by xi'an

At the Big [Data] Bayes conference this week [which I found quite exciting despite a few last-minute cancellations by speakers], there were a lot of clustering talks, including the one by Amy Herring (Duke), using a notion of centering that should soon appear on arXiv. And the one by Peter Müller (UT Austin) on handling large datasets, based on a predictive recursion that takes one value at a time, unsurprisingly similar to the update of Dirichlet process mixtures. (Inspired by a 1998 paper by Michael Newton and co-authors.) The recursion doubles in size at each observation, requiring culling of negligible components. Does order matter? Links with the mixtures of mixtures of Malsiner-Walli et al. (2017). Also talks by Antonio Lijoi and Igor Pruenster (Bocconi, Milano) on completely random measures used in creating clusters. And by Sylvia Frühwirth-Schnatter (WU Wien) on clustering the Austrian labor market to study the impact of company closures. And by Gregor Kastner (WU Wien) on multivariate factor stochastic volatility models, with a video of a large covariance matrix evolving over time and catching economic crises. And by David Dunson (Duke) on distance clustering, reflecting like myself on the definitely ill-defined nature of the [clustering] object: as the sample size increases, spurious clusters appear. (Which reminded me of a disagreement I had had with David MacKay at an ICMS conference on mixtures twenty years ago.) This made me realise I had missed the recent JASA paper by Miller and Dunson on that perspective.
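The post gives no details of the talk beyond the recursion doubling in size, so the following is only a hedged illustration of where such doubling comes from. It implements Newton's predictive recursion for the mixing distribution F in x ~ ∫ N(x; μ, σ²) F(dμ), assuming a Normal kernel with known σ², a Normal initial guess F_0, and F stored as a Normal mixture. Each update is F_i = (1 − λ_i) F_{i−1} + λ_i F_{i−1}(dμ) N(x_i; μ, σ²) / m_{i−1}(x_i), where m_{i−1} is the current predictive density; the second term is again a Normal mixture with as many components as F_{i−1}, so the count doubles and negligible components are culled. The weights λ_i = (i+1)^(−γ), the threshold eps, and all names are assumptions, not taken from the talk.

```python
import numpy as np

def norm_pdf(x, m, v):
    """Normal density N(m, v) at x, vectorised over m and v."""
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

def predictive_recursion(xs, sigma2=1.0, m0=0.0, v0=10.0, gamma=0.67, eps=1e-4):
    """Return (weights, means, variances) of the Normal-mixture estimate of F after all xs."""
    w, a, b = np.array([1.0]), np.array([m0]), np.array([v0])  # F_0 = N(m0, v0)
    for i, x in enumerate(xs, start=1):
        lam = (i + 1) ** (-gamma)               # assumed weight sequence lambda_i
        pred = norm_pdf(x, a, b + sigma2)       # each component's predictive density at x
        m_x = np.sum(w * pred)                  # m_{i-1}(x_i)
        b_upd = 1.0 / (1.0 / b + 1.0 / sigma2)  # component times kernel is Normal in mu again
        a_upd = b_upd * (a / b + x / sigma2)
        w_upd = w * pred / m_x                  # these weights sum to one
        # F_i = (1 - lam) F_{i-1} + lam * updated part: the component count doubles
        w = np.concatenate([(1 - lam) * w, lam * w_upd])
        a = np.concatenate([a, a_upd])
        b = np.concatenate([b, b_upd])
        keep = w > eps                          # cull negligible components, then renormalise
        w, a, b = w[keep] / np.sum(w[keep]), a[keep], b[keep]
    return w, a, b

rng = np.random.default_rng(2)
xs = rng.permutation(np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)]))
w, a, b = predictive_recursion(xs)
print(len(w), np.sum(w * a))  # surviving components, mean of the estimated F
```

Feeding the same data in a different order generally gives a different estimate of F, which is one way to see why the question of order arises for such recursions.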

Some further snapshots of a very high-quality meeting [says one of the organisers!]. Following suggestions from several participants, it would be great to hold another meeting at CIRM in the near future.
