Judea Pearl is Professor of Computer Science and Statistics and Director of the Cognitive Systems Laboratory at UCLA, which he joined in 1970. He received a B.S. in Electrical Engineering from the Technion in 1960, an M.S. in Physics from Rutgers University in 1965, and a Ph.D. in Electrical Engineering from the Polytechnic Institute of Brooklyn in 1965.
Before joining UCLA, Pearl worked at RCA Research Laboratories in Princeton, New Jersey, on superconductive parametric and storage devices, and at Electronic Memories, Inc., in Hawthorne, California, on advanced memory systems.
He is best known for championing the probabilistic approach to artificial intelligence and the development of Bayesian networks. More recently, he has been honoured for developing a theory of causal and counterfactual inference based on structural models, for which he won the 2011 ACM Turing Award, the highest distinction in computer science, “for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”.
StatisticsViews talked to Professor Pearl during the 2013 Joint Statistical Meetings in Montreal, Canada, about his career, especially his research on causality, Bayesian theory, teaching statistics, those who influenced him, and his advice for up-and-coming statisticians.
1. With an educational background in electrical engineering and physics from the Technion, Rutgers University and the Polytechnic Institute of Brooklyn respectively, when and how did you first become aware of statistics as a discipline?
My first encounter with statistics (aside from a college course in engineering statistics) was in the early 1970s when I joined UCLA and began teaching decision theory. The books by Howard Raiffa, Tom Ferguson, Lenny Savage, and Ron Howard were the classics of the time, and they painted statistics as an integral component of man’s quest for knowledge.
I think they were basically right, in spite of statistics’ century-old boycott of causality.
2. You are known to be one of the pioneers of Bayesian networks and the probabilistic approach to artificial intelligence, and one of the first to mathematize causal modeling in the empirical sciences. Your research interests include the philosophy of science, knowledge representation, nonstandard logics, and learning. What are you focussing on currently and what do you hope to achieve through your research?
I am currently working on two frontiers, technical and philosophical. On the technical side, I am charting the field of statistics for application areas that can benefit from modern advances in causal inference. These advances are, for all intents and purposes, unknown to researchers and practitioners in those areas. We discovered, for instance, that meta-analysis and missing data are two problem areas that were originally formulated in statistical vocabulary but can benefit substantially when cast as causal problems.
On the philosophical front, I believe that recent mathematization of counterfactuals puts us in a good position to unravel perplexing philosophical questions that have baffled generations of scientists—among these are problems of consciousness, free-will and agency. I hope to one day understand why evolution has equipped us with these illusions, and what computational benefits we draw from thinking they are real.
3. This year is not only the International Year of Statistics but also the 250th anniversary of Bayes’ theorem. Have you been celebrating this anniversary in any way?
I have seen plans for such a celebration in Edinburgh, but since it is “by invitation only,” I’ve resigned myself to passive observer status. On the other hand, I have included three slides in my Medallion Lecture to commemorate Bayes’ paper of 1763 and one slide to commemorate the 300th anniversary of Jacob Bernoulli’s Ars Conjectandi (Basel, Switzerland, 1713). I present these as crowning achievements of a methodological paradigm that I wish to communicate: “Think nature, not data; ask not what you can do to unveil reality, ask first what reality should be like for unveiling to succeed.” Both Bernoulli and Bayes illustrate the success of this deductive approach to statistical science.
4. You are currently Professor of Computer Science at the University of California at Los Angeles. Over the years, how has your teaching, consulting, and research motivated and influenced each other? Do you continue to get research ideas from statistics and incorporate your ideas into your teaching? Where do you get inspiration for your research projects and books?
The ideas keep bursting with increasing intensity, and the inspiration comes from two fountains. The first is the fountain of mathematical freedom—namely, the freedom of going from assumptions to conclusions without vouching for the veracity of the former. By refraining from making claims about reality and narrowing my questions to “what reality ought to be like,” I obtain an instant protection from ever being wrong, and in the safety of such protection, the imagination runs wild and fearless. Indeed I am often criticized for making daring assumptions, but rarely for being wrong.
The second fountain comes from our natural urge to be useful. When you look around and see powerful tools being developed in key areas, but the researchers who would benefit most from these tools know the least about them, it is hard to keep the lid on the ideas that beg to be heard.
5. You have been presented with numerous awards including the Turing Award “for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning”. Which of your many accolades would you say you are most proud of?
The Turing Award is indeed the most prestigious and most gratifying, for it assures me that my colleagues in computer science appreciate the significance of my work, including its impact on areas outside of computer science. But when it comes to pride, I would name the 2012 Harvey Prize from my alma mater, the Technion – Israel Institute of Technology. It makes me extremely proud to be able to make a partial payback for the precious resources invested in my education, in times of austerity and by a nation in the making, and for the quality of education that my generation received from the best teachers one can imagine.
6. Using some of the proceeds from the Turing Award, you set up the Causality in Statistics Education Award. Why do you think it is so important to incorporate teaching of basic causal inference into introductory statistics courses?
It’s important because causal inference is, in many ways, antithetical to the traditional paradigm of statistics. For example, it requires students to consider reality first and data second; it requires one to articulate and work with untested assumptions, and do it in a language that deviates from standard probability theory. Once students take standard statistics classes and fall into the routine of thinking “data first,” it is much harder for them to acquire the intellectual skills needed for causal inference.
I say this because it took me ten years to make the transition from probabilistic to causal thinking, and I see how traumatic this transition can be to colleagues who were trained in standard statistical tradition. This explains, to a large extent, why so few instructors are prepared to address causal questions in statistics classes, and why we need prizes and community support to encourage the production of instructional material in this area.
7. How would you encourage young people to get into statistics at a young age? What words of advice would you give someone hoping to make it into a career in your field?
My words would be: “This is a once-in-a-lifetime chance you have to shape a field that is begging to be reshaped after a century of neglect. Don’t miss it.”
8. You have authored many publications. Is there a particular article or book that you are most proud of?
To be honest, I rarely go back to my old publications—there are so many new challenges to tackle. But if hard-pressed, I would choose my 1994 article on do-calculus (chapter 3 of my book, Causality); partly because of the thrill I felt when I first tried its power on nontrivial problems, partly because it led immediately to the structural definition of counterfactuals, and partly because of its longevity. I see more and more problem areas demystified under its light, from Simpson’s paradox to problems of mediation and external validity.
9. What has been the most exciting development that you have worked on in statistics during your career?
Speaking of excitement, I recall again the breathtaking excitement I felt when the “do-calculus” first came into being and started delivering correct results on every problem instance tried, including those deemed unsolvable by others (e.g., the front-door variety) and those whose solutions we did not predict. It reminded me of the excitement I felt in high school on my first exposure to analytic geometry, where all the geometrical puzzles that we used to solve with hard labour could be submitted to algebraic manipulation and, as if by a miracle, they always came back with the right answers. It was incredible. I thought Descartes was the greatest mathematician of all time, and I was ecstatic to see myself imitating his game, however clumsily.
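As an illustration for readers unfamiliar with the “front-door variety” Pearl mentions (this sketch is ours, not his): when treatment X affects outcome Y only through an observed mediator Z, and an unobserved confounder links X and Y, do-calculus yields the front-door adjustment formula P(y | do(x)) = Σ_z P(z | x) Σ_x′ P(y | x′, z) P(x′), computable from purely observational quantities. A minimal Python sketch with hypothetical toy numbers:

```python
# Front-door adjustment: P(y | do(x)) = sum_z P(z|x) * sum_x' P(y|x',z) * P(x').
# All distributions below are hypothetical toy numbers, chosen only to
# illustrate the arithmetic of the formula.

def front_door(p_z_given_x, p_y_given_xz, p_x, x, y):
    """Interventional probability P(Y=y | do(X=x)) via front-door adjustment."""
    total = 0.0
    for z, pz in p_z_given_x[x].items():           # sum over mediator values z
        inner = sum(p_y_given_xz[(xp, z)][y] * px  # average over x', weighted by P(x')
                    for xp, px in p_x.items())
        total += pz * inner
    return total

# Hypothetical observational quantities (binary X, Z, Y):
p_x = {0: 0.5, 1: 0.5}                      # P(X=x)
p_z_given_x = {0: {0: 0.9, 1: 0.1},         # P(Z=z | X=x)
               1: {0: 0.2, 1: 0.8}}
p_y_given_xz = {(0, 0): {0: 0.9, 1: 0.1},   # P(Y=y | X=x, Z=z)
                (0, 1): {0: 0.6, 1: 0.4},
                (1, 0): {0: 0.7, 1: 0.3},
                (1, 1): {0: 0.3, 1: 0.7}}

# Causal risk difference P(Y=1 | do(X=1)) - P(Y=1 | do(X=0))
effect = front_door(p_z_given_x, p_y_given_xz, p_x, 1, 1) \
       - front_door(p_z_given_x, p_y_given_xz, p_x, 0, 1)
```

What made this case “unsolvable by others” is that no adjustment for observed covariates works here; the formula nevertheless identifies the causal effect despite the unobserved confounder.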
10. Are there people or events that have been influential in your career?
As a kid, I was enormously influenced by books on the lives of great scientists: Archimedes, Bacon, Galileo, Newton, Faraday and Einstein. This influence was further deepened by my science teachers in Israel; they possessed the unique ability to make these legendary figures come to life and to give us the illusion that we were following in their footsteps.
In Causality, I describe two specific ideas that had a major influence on my transition from probability to causal thinking. The first arose in the summer of 1990 while I was working with Tom Verma on “A Theory of Inferred Causation,” and we replaced conditional probabilities with structural equations. At that moment, everything began to fall into place; we finally had a mathematical object to which we could attribute familiar properties of physical mechanisms and causal relations instead of those slippery probabilities that make up Bayesian networks.
The second breakthrough came from Peter Spirtes’ lecture at the International Congress of Philosophy of Science (Uppsala, Sweden, 1991). In one of his slides, Peter illustrated how a causal diagram should change when a variable is manipulated. To me, this slide—when combined with structural equations—was the key to unfolding the manipulative account of causation, the mathematization of counterfactuals, and then on to most other developments I pursued in causal inference.
This answer would be incomplete without confessing the profound impact that Dennis Lindley had on my encounter with statistics. Lindley reviewed my book, Causality, in 2002. To my surprise, instead of joining the all-knowing sages with the mantras “causality is not well defined” or “causality is well defined in statistics,” he actually took the time to examine the book from first principles and concluded: “Hmm, statistics has something to learn here.” My subsequent communication with Lindley, witnessing his intellect, curiosity, and integrity, essentially restored my faith in the capacity of statistics to join, if not to lead, the age of causation.