“The rise of data science has forced us to consider our relationship with computer science:” An interview with Michael Jordan

Michael Jordan is the Pehong Chen Distinguished Professor at the University of California, Berkeley, where his appointment is split between the Department of Electrical Engineering and Computer Sciences and the Department of Statistics. He was previously a professor at MIT for ten years and has established his name in statistics through his outstanding work in machine learning and artificial intelligence.

His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics.

Professor Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He received the IJCAI Research Excellence Award in 2016, the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM.

Alison Oliver, Editor of Statistics Views, spoke to Professor Jordan at JSM 2016 about his career in statistics.


1. You graduated from Arizona State University with a degree in mathematics and then gained your PhD in cognitive science from the University of California, San Diego. When and how did you first become aware of statistics as a discipline, and what was it that inspired you to gradually pursue a career in the field?

I think the first thing I read that set me on an intellectual career was the Autobiography of Bertrand Russell. It’s a three-volume set, beautifully written, outlining not only his approach to philosophy but also to life—a very compelling read. Russell’s project of trying to understand thought in formal, logical terms pointed toward what were then the emerging fields of cognitive science and cognitive neuroscience, where one asks how the mind works as a computational entity running on biological hardware. I did eventually go into cognitive science and began to work on models of mental phenomena. These were statistical models, and given that they were models of the thought processes of an individual person, it was natural and even essential to include a subjective prior in the cognitive model. So early on, I became aware of the Bayesian approach to statistical inference. But cognitive science is also an experimental field, and I had to learn statistics so that I could analyze my data. Here I learned mostly frequentist statistics, and I was taught to seek objectivity in my data analyses and good frequentist performance in the software that I would write; software that would be used on many different data sets. And thus I was confronted early on with good motivations for both Bayesian and frequentist inference, and I became intrigued by the relationship between the two and by the underlying conceptual foundations of inference. Eventually my interest in statistics began to take over and I decided to make a career of it.

2. You spent ten years teaching at MIT, and at Berkeley you are affiliated with both the Department of Statistics and the Department of EECS. What is it about Berkeley that you love and that has kept you there?

From cognitive science I was also led to an interest in artificial intelligence, and thence a full-fledged interest in computer science. This was in addition to my growing interest in statistics. But the two supported each other. An artificially intelligent system must be able to make inferences and decisions in an uncertain world, and so statistics is essential. At Berkeley I was recruited to bridge between statistics and computer science. This was a perfect fit, and as time has gone on the fit has gotten better and better. Certainly the computer science department at Berkeley is the most statistical of all the leading computer science departments, and that’s something that I’m pleased about. I feel that I am contributing to a tradition, going back to Neyman and Blackwell, of statistics as a discipline that has both a core and strong connections to other disciplines.

3. Your research interests have focused on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, and applications to problems in distributed computing systems to name a few. What are you working on currently?

Everything I’ve ever worked on in the past is still being worked on in some part of my brain. But most of my neurons are currently devoted to the grand challenge of the conceptual blending of statistics and computation. I think that there are many untouched problems there. Statistics came of age before there were computers, and as computers have gone everywhere, not only have they provided a lot more data, they’ve provided constraints. There are runtime constraints, for example—you have to have certain things done in a certain amount of time: milliseconds or microseconds for financial decisions, tens of milliseconds for interactions with humans and search, and other time scales for other problems. Statistics has never, in its basic theory, conceived of time budgets. And it’s challenging from both a theoretical and a procedural point of view to develop a version of statistics that is sensitive not only to runtime but to other computational constraints as well. Similarly, in computer science there has been very little concern with decision-theoretic issues, with uncertainty and with inferential issues. The fundamental theory of computer science just doesn’t have a place for those things. So I think there are profound problems involved in bringing these things together, with huge practical consequences.
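To make the idea of a runtime budget concrete, here is a toy sketch in Python (my own illustration, not anything drawn from Professor Jordan’s work; the function and data names are hypothetical): an “anytime” estimator that folds in as much simulated data as a wall-clock deadline allows and then reports a point estimate together with its standard error, so that a larger budget buys a tighter answer.

```python
import time
import numpy as np

def batches(rng, batch_size=10_000):
    """Endless stream of simulated data; a stand-in for data arriving online."""
    while True:
        yield rng.normal(loc=3.0, size=batch_size)

def mean_under_budget(stream, budget_seconds):
    """Toy anytime estimator: fold in batches until the time budget runs out,
    then report the running mean and its standard error. More budget means
    more data, and hence a tighter estimate."""
    deadline = time.perf_counter() + budget_seconds
    n, total, total_sq = 0, 0.0, 0.0
    for batch in stream:
        n += batch.size
        total += batch.sum()
        total_sq += np.square(batch).sum()
        if time.perf_counter() >= deadline:
            break
    mean = total / n
    stderr = np.sqrt((total_sq / n - mean ** 2) / n)
    return mean, stderr

rng = np.random.default_rng(1)
for budget in (0.001, 0.01, 0.1):
    est, se = mean_under_budget(batches(rng), budget)
    print(f"budget {budget:>5}s -> estimate {est:.4f} ± {se:.4f}")
```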

4. You gave two lectures during JSM 2016, one introductory and the other on Computational Thinking and Inferential Thinking. What did you wish your audience to take away from them?

The introductory lecture was precisely about the theoretical challenges that I’ve alluded to in bringing together inference and computation. Those are challenges at the research frontier and for the future. But in the meantime, one sees the great power inherent in bringing these intellectual traditions together when one considers designing introductory classes, at the undergraduate level, that teach computation and inference jointly. In the other lecture, given jointly with my colleague Ani Adhikari, we discussed an exciting course that we’ve started at Berkeley that teaches basic inferential ideas and basic computing ideas together, in an interlaced way, at the freshman level. The course leans heavily on resampling-based inference, which minimizes the mathematical overhead while crying out for algorithmic thinking. So we teach Python incrementally, introducing just enough as we go to allow the inferential problem at hand to be solved. This allows students to work with real data using procedures that they’ve implemented themselves. Students love the course; it’s been a major success.
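For a flavour of what resampling-based inference looks like in a few lines of Python, here is a minimal bootstrap sketch (my own illustration of the general idea, not actual material from the Berkeley course): a confidence interval for a median obtained purely by resampling, with no distributional formulas.

```python
import numpy as np

# A minimal bootstrap sketch: estimate a population median and a
# 95% confidence interval for it using only resampling.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=500)   # stand-in for real data

def bootstrap_medians(data, n_resamples=10_000):
    """Resample the data with replacement and record each resample's median."""
    medians = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(data, size=len(data), replace=True)
        medians[i] = np.median(resample)
    return medians

medians = bootstrap_medians(sample)
lower, upper = np.percentile(medians, [2.5, 97.5])
print(f"point estimate: {np.median(sample):.3f}")
print(f"95% bootstrap interval: ({lower:.3f}, {upper:.3f})")
```

The entire inferential argument lives in the loop that resamples and recomputes, which is why this style of inference asks so little mathematics of a freshman while demanding genuine algorithmic thinking.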

5. You have authored many publications. What is the article or book that you are most proud of?

That question was the one on your list for which my response was going to be, “Let’s not even go there,” but… Here’s maybe one way of saying it: I’m always proud of the very most recent thing I’ve done. So if you go to my publications page, which I just looked at this morning, at the very top you’ll find an article with my students Andre Wibisono and Ashia Wilson about a theory of accelerated methods in optimization. It’s about the optimal way to optimize, an intriguing concept. I think it’s a beautiful paper and I hope people pay attention to it.
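For readers who have not met the term, “accelerated methods” refers to optimization schemes that converge markedly faster than plain gradient descent by adding a carefully tuned momentum term. The sketch below is my own minimal illustration of the classical Nesterov construction on a toy quadratic; it is not code from, and does not reproduce, the paper Professor Jordan mentions.

```python
import numpy as np

# A deliberately ill-conditioned quadratic: f(x) = 0.5 * x^T A x, grad f(x) = A x.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x
step = 1.0 / 100.0            # 1 / (largest eigenvalue of A), the usual safe step size
x0 = np.array([1.0, 1.0])

def gradient_descent(x, steps=200):
    """Plain gradient descent."""
    for _ in range(steps):
        x = x - step * grad(x)
    return x

def nesterov(x, steps=200):
    """Nesterov's accelerated gradient method: take the gradient step from a
    'lookahead' point that extrapolates along the previous direction of travel."""
    y, t = x.copy(), 1.0
    for _ in range(steps):
        x_next = y - step * grad(y)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

print("gradient descent:", f(gradient_descent(x0)))   # converges slowly
print("nesterov:        ", f(nesterov(x0)))           # noticeably smaller objective
```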

6. You have also been the recipient of numerous honours. Has there been a particular highlight?

There may be two that are worth mentioning. One of them is that I was the Neyman Lecturer for the IMS. Given that I went to Berkeley to drink of the tradition that was initiated by Neyman and sustained through all these years, it was a pleasure to give that lecture. Indeed, I think of myself as an applied statistician with a computational side to my work, and so I was pleased to be able to present my take on what applied statistics is and where it’s going.

The other one is the David E. Rumelhart Prize, which is given to one person per year in the field of cognitive science, which is the discipline I came from as I moved into statistics. David Rumelhart was my advisor and mentor when I was a cognitive scientist, and he was one of the people who most contributed to my intellectual development. Although he was a cognitive scientist, he was also, I think, a statistician at heart and very interested in statistical inference. He helped me develop a perspective on cognition from the point of view of statistical modelling. David unfortunately died at a young age. I miss him to this day, and I was greatly honoured to receive the award named after him.

8. What do you see as the greatest challenges facing the profession of statistics in the coming years?

The rise of data science has forced us to consider our relationship with computer science. It is challenging to cooperate and entangle ourselves with a different discipline. Computer science has its own energy, its own biases, prejudices and history. We need to reconsider the two different histories of statistics and computer science and endeavour to put them back together. In fact, I think they’re two sides of the same coin: algorithms and inference are fundamentally allied with each other and fundamentally present in essentially any problem that has to do with real-world inference and decision-making. This is particularly true when data sets are so large that naïve approaches to methodology are simply not viable.

9. What has been the best book on statistics that you have ever read?

One that impacted me was actually two editions of a book by James Berger called Statistical Decision Theory. The first edition was frequentist. He then wrote a second edition some number of years later in which it had become a Bayesian book—much of the same material, but the point of view had shifted. Reading both versions of the book finally brought some clarity into my own thinking about statistical foundations.

10. Who are the people that have been influential in your career?

Being at Berkeley since 1998, I have had the pleasure of meeting and interacting with David Blackwell and Erich Lehmann. Conversations with them were fascinating and eye-opening, and they helped me orient myself as I was becoming a full-time statistician.

11. If you had not become involved in the fields of engineering and statistics, what do you think you would have done? (Is there another field in which you could have seen yourself making an impact?)

As with most academics, a sequence of random events has led me into the specific field I’m in. If you re-ran my life, I suspect I would have ended up in a different academic discipline by chance alone. But I’m lucky that I ended up in the one I’m in. I think it’s fascinating that it has both a mathematical and a practical side to it; it’s in major growth mode, and it is fun to be able to participate in that. Also, I love music, and I play in bands as a hobby. With some effort, I can envision having chosen to become a professional musician, with statistics as a hobby, instead of the other way around.