Archive for academic journals

your interesting published article “An introduction to the special issue “

Posted in Books, University life on April 1, 2019 by xi'an

In the flow of unsolicited emails from senders interested in publishing my work, a contender for the top spot is this call, received today from Computer Communication & Collaboration, which cites my foreword to the special issue of Statistics & Computing built from the talks at MCMski IV in Chamonix. In 2014. (According to the above site, the publisher of the journal, Better Advances Press, does not meet most of its criteria and was identified as predatory by Beall’s List, as of January 3, 2017.)

Your interesting published article “An introduction to the special issue “Joint IMS-ISBA meeting – MCMSki 4”” drives me to call for new papers, on behalf of Computer Communication & Collaboration, which is an English quarterly journal in Canada.

This peer-reviewed journal focuses on smart internet and it welcomes papers on general theories of computer science, data communications, multimedia, social network, machine learning, data mining, intelligent collaboration and other relevant topics, both theoretical and empirical.

All papers should be written in professional English. The length of 2000-6000 words is suggested. We accept papers in MS-word or PDF format.

If your paper is qualified for publication after refereeing, it will be published within 2-4 months from the date of submission.

Thank you for your consideration.

foundations of data science [editorial]

Posted in Books, pictures, Statistics, University life on March 25, 2019 by xi'an

The American Institute of Mathematical Sciences, one of eight NSF-funded mathematical institutes, is supporting a new journal on data sciences called Foundations of Data Science, with editors-in-chief Ajay Jasra, Kody Law, and Vasileios Maroulas. Since I know them reasonably well (!), I asked the editors for an editorial and they obliged by sending me the following. [Disclaimer: I have no direct involvement in this journal or with the AIMS. While I am supportive of this endeavour, and wish the editorial team the best in terms of submitted papers and scientific impact, I nonetheless remain philosophically in favour of PCI publishing policies. Note that the journal is free access for the next three years. After that, the fees get definitely steep, with subscriptions in the $500-$1000 range and open access levies around $800…]

Data Science has become a term fraught with hype, adulation, and sometimes skepticism by the scientific community. For many it is simply a collection of tools from statistics, mathematics, and computer science to enable continuing improvements in the interpretation of data, which has been a primary driver of science since at least as early as Kepler’s data-driven approach to defining the laws of planetary motion. To others, the phrase has meant the dawning of an entirely new subject, with the power to change our lives as we know them. This latter perspective has led to governments and private companies investing heavily in data science in order to obtain a competitive edge over their rivals. In the popular media, the notion of artificial intelligence (AI) and machine learning is almost inseparable from data science. This is the ability of an algorithm to process data and possibly prior information, in order to learn about quantities of interest related to the data, and possibly make decisions based upon that knowledge. Despite these contrasting opinions, what is clear is that Data Science is indispensable to the scientific community, and it is our responsibility to provide a concrete and rigorous foundation.

The recent trend driving the enormous interest in this field is the explosion of available data, which has led to what Jim Gray of Microsoft referred to as the emerging 4th paradigm of data-intensive science. Indeed, Mark Cuban predicted at SXSW 2017 that the world’s first trillionaire will be a person who has exploited AI. The first two paradigms of scientific discovery are experiment and theory, and computational science is the 3rd. Computational science and engineering (CSE) is driven largely by numerically simulating complex systems. Recently, uncertainty has become a requisite consideration in complex applications which have classically been treated deterministically. This has led to an increasing interest in recent years in uncertainty quantification (UQ). Statistics, and in particular Bayesian inference, provides a principled and well-defined approach to the integration of data and UQ, and research into computationally-intensive inference has intensified on the verge of the 4th paradigm. In some domains such as the geosciences, this is referred to as data assimilation, because the data is assimilated into an existing model approximating the underlying physical system. This stems from the more classical field of inverse problems, which entails an often deterministic and static approach to inferring parameters in a physical model given observations, while data assimilation leverages UQ as well as ideas from signal processing and control theory to extend these considerations to the online context. The term Bayesian inverse problems has recently been coined to refer to UQ via Bayesian inference for inverse problems. In either case, one infers or learns unknown parameters given available information, both in the form of thousands of years of model development, and in the form of abundant noisily observed data. This intersection of the 3rd and 4th paradigms is fertile territory in the transition to data intensive discovery, particularly where massive data sets are being generated by scientifically well-understood systems, and where an enormous amount of effort has already been invested in generating reliable computer software for simulating these systems. Areas such as data centric science and engineering, and scientific machine learning, are beginning to see impact in their application to CSE, for example to biological science and engineering in bioinformatics, and more recently to materials science and engineering in materials informatics, i.e. prediction and design of process-structure-properties relationships, as well as to civil and aerospace engineering and manufacturing, in digital twinning and industry 4.0. A notable growth area is the integration of numerical, statistical, and functional analysis, and probability; another natural growth area is reliable computer software for implementing the resulting methods and algorithms.
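[As a purely illustrative aside, not part of the editorial: the Bayesian inverse problem setting mentioned above can be sketched in a few lines of Python. The forward model, parameter values, and prior below are all made up for the example; a grid approximation of the posterior stands in for the computationally-intensive inference used in real applications.]

```python
import numpy as np

# Toy Bayesian inverse problem: recover a scalar parameter theta of a known
# forward model from noisy observations, via a grid posterior under a
# uniform prior. All names and values are hypothetical.
rng = np.random.default_rng(0)

def forward(theta, t):
    # hypothetical forward model: exponential decay observed at times t
    return np.exp(-theta * t)

t_obs = np.linspace(0.1, 2.0, 20)
theta_true, noise_sd = 1.5, 0.05
y_obs = forward(theta_true, t_obs) + noise_sd * rng.standard_normal(t_obs.size)

# Grid approximation of the posterior density of theta on [0, 5]
theta_grid = np.linspace(0.0, 5.0, 1001)
log_lik = np.array([
    -0.5 * np.sum((y_obs - forward(th, t_obs)) ** 2) / noise_sd ** 2
    for th in theta_grid
])
posterior = np.exp(log_lik - log_lik.max())
posterior /= np.trapz(posterior, theta_grid)

print("posterior mean of theta:", np.trapz(theta_grid * posterior, theta_grid))
```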

Pure 4th paradigm activity includes the wealth of interesting problems arising from the classical machine learning and AI communities, driven largely by data which is generated by or has become available because of the internet, as well as problems of the data-driven approach to model discovery, where one aims to derive a model directly from the data, either without relying on existing understanding or when there is no existing understanding. This very powerful mode of inference is becoming increasingly realistic in the data age. Complex data problems require a disciplinary interplay to develop new theories and address interdisciplinary questions. For example, in recent years, there have been many new theoretical developments combining ideas from topology and geometry with statistical and machine learning methods, for data analysis, visualization, and dimensionality reduction. Applications range from classification and clustering in fields such as action recognition, handwriting analysis and natural language processing, and biology, to the analysis of complex systems, for example related to national defense and sensor networks. Specifically, techniques including persistent homology and manifold learning have helped to compress nonlinear point cloud data from a new geometrically faithful point of view. In the realm of signal analysis, classification and clustering based on geometric and topological features of the phase-space of the signal has been able to identify features that traditional methods may fail to detect. These new developments should not be seen as overshadowing more classical approaches to computationally-intensive Bayesian inference, optimization, and control, but need to be integrated into a holistic theory and consequent methodologies.
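[Another illustrative aside, not from the editorial: a minimal sketch of manifold learning compressing a nonlinear point cloud, using scikit-learn's Isomap on a synthetic swiss-roll sample. Persistent homology would require a dedicated package such as ripser or GUDHI and is not shown here.]

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# A nonlinear point cloud in R^3 whose intrinsic geometry is two-dimensional
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Manifold learning: unroll the cloud into a two-dimensional representation
# that approximately preserves along-the-manifold (geodesic) distances
embedding = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
print(embedding.shape)  # (1500, 2)
```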

The future of data science, including the developing fields of data centric science and engineering, and the opportunities following from implementation of methods and algorithms on future supercomputers at exascale, relies crucially on synergistic interplay between computer science, mathematics, statistics, and science and engineering disciplines, as well as other subject areas such as manufacturing and industry 4.0. There is a wealth of opportunity to discover new scalable mathematical and statistical approaches, and expand current foundational establishments of data analysis. The humble objective of our new journal, Foundations of Data Science (FoDS), is to bridge a gap in the literature by providing a venue to bring together research which transcends disciplinary boundaries, in a common pursuit of methodology, theory, and applications in data science. As we have outlined with a few examples above, it is our belief that there is great potential in particular at intersections between different existing disciplines and sub-disciplines, and these areas of intersection, which may often be overlooked as impure by other mainstream journals, will be embraced by this journal. That is, we advocate a proactive approach to contributing to the evolving definition(s) of Data Science (and AI).

We are grateful to our exceptional editorial board of data science all stars from across engineering, statistics, mathematics, and computer science. We are also grateful to AIMS director Shouchuan Hu for his support and for believing in our leadership potential, as well as the diligent staff at AIMS for everything they do to help streamline the process of running a journal. Most importantly, we are grateful to our authors. There is an enormous amount of work being done in this active and growing area. You have many very strong options for submitting your work, and more are emerging each year, so we very much appreciate your support of FoDS. Thank you for reading our welcome message, and joining us on this adventure! We conclude with a call to action:

Foundations of Data Science invites submissions focusing on advances in mathematical, statistical, and computational methods for data science. Results should significantly advance current understanding of data science, by algorithm development, analysis, and/or computational implementation which demonstrates behavior and applicability of the algorithm. Expository and review articles are welcome. Fields covered by the journal include, but are not limited to, Bayesian statistics, high performance computing, inverse problems, data assimilation, machine learning, optimization, topological data analysis, spatial statistics, nonparametric statistics, uncertainty quantification, and data centric engineering. Papers which focus on applications in science and engineering are also encouraged; however, the method(s) used should be applicable outside of one specific application domain.

Ajay Jasra (Editor-in-Chief)
Kody J. H. Law (Editor-in-Chief)
Vasileios Maroulas (Editor-in-Chief)

undecidable learnability

Posted in Books, Statistics, Travel, University life on February 15, 2019 by xi'an

“There is an unknown probability distribution P over some finite subset of the interval [0,1]. We get to see m i.i.d. samples from P for m of our choice. We then need to find a finite subset of [0,1] whose P-measure is at least 2/3. The theorem says that the standard axioms of mathematics cannot be used to prove that we can solve this problem, nor can they be used to prove that we cannot solve this problem.”

In the first issue of the (controversial) Nature Machine Intelligence journal, Ben-David et al. wrote a paper they present as the machine learning equivalent to Gödel’s incompleteness theorem. The result is somewhat surprising from my layman perspective and it seems to only relate to a formal representation of statistical problems. Formal as in Vapnik-Chervonenkis (VC) and PAC learning theory. It sounds like, given a finite learning dataset, there are always features that cannot be learned if the size of the population grows to infinity, but this is hardly exciting…
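[For concreteness, here is a toy rendering of the quoted problem, added for illustration and not taken from the paper: a distribution P supported on a finite subset of [0,1], m i.i.d. draws, and the naive learner that simply returns the set of points it has observed. All numbers are arbitrary; the undecidability result concerns whether a uniform sample-size guarantee can be proven from the standard axioms, not the outcome of any single simulation like this one.]

```python
import numpy as np

# P lives on a finite subset of [0,1]; we see m i.i.d. draws and propose a
# finite set whose P-measure we hope exceeds 2/3.
rng = np.random.default_rng(1)

support = rng.uniform(size=50)            # unknown finite support in [0,1]
probs = rng.dirichlet(np.ones(50))        # unknown probabilities on that support
m = 30
sample = rng.choice(support, size=m, p=probs)

candidate = set(sample)                   # naive learner: output the observed points
coverage = probs[[x in candidate for x in support]].sum()
print(f"P-measure of the proposed finite set: {coverage:.3f}")
```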

The above quote actually makes me think of the Robbins-Wasserman counter-example for censored data and Bayesian tail prediction, but I am unsure the connection is anything more than sheer fantasy..!

rage against the [Nature] Machine [Intelligence]

Posted in Books, Statistics, University life on May 15, 2018 by xi'an

Yesterday evening, my friend and colleague Pierre Alquier (CREST-ENSAE) got interviewed (for a few seconds on-line, around minute 06!) by the French national radio France Culture, about the recent call to boycott the forthcoming Nature Machine Intelligence electronic journal. A call addressed to the machine learning community, based on the absence of paywalled journals among the major machine learning journals, like JMLR. Meaning that related conferences like AISTATS and NIPS also get their accepted papers available on-line for free. As noted in the call

“Machine learning has been at the forefront of the movement for free and open access to research. For example, in 2001 the Editorial Board of the Machine Learning Journal resigned en masse to form a new zero-cost open access journal, the Journal of Machine Learning Research (JMLR).”

and here we go!

Posted in Books, Running, Statistics, University life on March 16, 2018 by xi'an

On March 1, I started handling papers for Biometrika as deputy editor, along with Omiros Papaspiliopoulos. With on average one paper a day to handle, this means a change in my schedule and presumably fewer blog posts about recent papers and arXivals if I want to keep my daily morning runs!

Biometrika

Posted in Books, Statistics, University life on November 29, 2017 by xi'an

After ten years of outstanding dedication to Biometrika, Anthony Davison is retiring as Editor on 31 December. Ten years! Running a top journal like Biometrika is a massive service to the statistics community, especially when considering the painstaking stage of literally editing each paper towards the stylistic requirements of the journal. For which we should all definitely be quite grateful to Anthony. And to the new Editor, Paul Fearnhead, for taking over. I will actually join the editorial board as assistant editor, along with Omiros Papaspiliopoulos, meaning we will share the task of screening and allocating submissions. A bit daunting, given that the volume of submissions is roughly similar to what I was handling for Series B ten years ago. And given the PCI Comput Stat experiment starting soon!

stop the rot!

Posted in Statistics on September 26, 2017 by xi'an

Several entries in Nature this week about predatory journals, two of them from the Ottawa Hospital Research Institute. One emanates from the publication officer at the Institute, whose role is “dedicated to educating researchers and guiding them in their journal submission”. And telling the tale of a senior scientist finding out that a paper submitted to a predatory journal and later rescinded was nonetheless published by said journal. Which reminded me of a similar misadventure that befell me a few years ago. After a discussion of an earlier paper was rejected by The American Statistician, my PhD student Kaniav Kamary and I resubmitted it to the Journal of Applied & Computational Mathematics, from which I had received an email a few weeks earlier asking me in flowery terms for a paper. When the paper got accepted as such two days after submission, I got alarmed and realised this was a predatory journal, whose title played on the quasi-homonymous Journal of Computational and Applied Mathematics (Elsevier) and International Journal of Applied and Computational Mathematics (Springer). Just like the authors in the above story, we wrote back to the editors, telling them we were rescinding our submission, but never got any reply or request for a copyright transfer. Instead, requests for (diminishing) payments were regularly sent to us, for almost a year, until they ceased. In the meantime, the paper had been posted on the “journal” website, and no further email of ours, including some from our University legal officer, induced a reply or action from the journal…

The second article in Nature is from a group of epidemiologists at the same institute, producing statistics about biomedical publications in predatory journals (characterised as such by the defunct Beall blacklist). And being much more vehement about the danger represented by these journals, whose “articles we examined were atrocious in terms of reporting”, and about the authors submitting to them, deemed unethical for wasting human and animal observations. The authors of this article identify thirteen characteristics for spotting predatory journals, the first one being “low article-processing fees”, our own misadventure being the exact opposite. And they ask for higher control and auditing from the funding institutions over their researchers… Besides adding an extra layer of bureaucracy, I fear this is rather naïve, as if the boundary between predatory and non-predatory journals were crystal clear, rather than a murky continuum. And it puts the blame solely on the researchers, rather than sharing it with institutions always eager to push the assessment of their researchers towards ever more bibliometric automation.