Dr Xiao-Li Meng is Dean of the Graduate School of Arts and Sciences and Whipple V. N. Jones Professor of Statistics at Harvard University.
His research interests include statistical theory and principles for the foundations of data science; multi-resolution, multi-phase, and multi-source inference; philosophical and foundational issues in statistics; statistical computing and computational statistics; signal extraction and uncertainty assessment in the natural, social, and medical sciences; and elegant mathematical statistics.
Dr Meng obtained his B.S. in Mathematics and Diploma in Graduate Study of Mathematical Statistics from Fudan University, followed by an M.A. and Ph.D., both in Statistics, from Harvard University. He first taught at the University of Chicago (where he remains affiliated with its Center for Health Statistics) before returning to Harvard to teach in 2001.
Alison Oliver interviewed Dr Meng during the Royal Statistical Society’s annual conference towards the end of last year, where he was one of the guest speakers.
1. You graduated from Harvard University, where you’ve been teaching since 2001, including almost 10 years as the Whipple V.N. Jones Professor of Statistics. When and how did you first become aware of statistics as a discipline, and what was it that inspired you to pursue a career in the field?
I was trained in China, at Fudan University, as a pure mathematician. I was studying abstract algebra, amongst other topics. But then I took some courses on stochastic processes and probability, which I found fascinating. The idea that you can use precise mathematics to describe random phenomena was intriguing, because before I started working in statistics, randomness seemed like something utterly chaotic. That’s part of the reason why I started to study statistics more systematically. At that time in China, we were really studying what we would today call mathematical statistics, which, after joining Harvard, I realized was very different from the statistics we learn here in the United States.
So I arrived at Harvard in 1986, with a full fellowship. I started to really learn the statistical insight and applied side of statistics, which, as mentioned, was very different from what I had learned back in China. In China, I mainly learnt about proving theorems, but what I learned at Harvard was more along the lines of how to think about a data set, and how to think about what statistical methods and models are applicable. At the beginning of that process, the hardest part for me was to understand that, unlike in mathematics, almost no real-life statistical problem has a single correct answer. It all depends on one’s assumptions and perspective. I absolutely hated those homework problems that were all words, no formulae, because I didn’t know how to translate the words into formulae that I knew how to manipulate. But eventually I learned the joy in that. That is, the intellectual challenge here is about turning a real-life problem into a mathematical formulation, while keeping it relevant, and solving it. Most importantly, how can I generalize whatever I was doing for a particular problem into a set of methodologies that I can then apply to other problems? And that’s how I essentially got into statistics, exactly 30 years ago.
2. Over the years, how have your teaching, consulting, and research motivated and influenced each other? Have you taken ideas from your research and incorporated them into your teaching?
Absolutely. During my time as a student, we did not spend that much time on training for teaching, but now that I’m the Dean of the Graduate School, teaching is so important to me. Part of the reason I emphasize teacher training so much now is that, over the years, I have realized the truth behind the saying ‘to teach is to learn twice.’ I recall that during my first job at the University of Chicago, I was given a book to teach sample surveys, a subject about which I hadn’t really learned that much myself as a student, but I guess the Harvard department has this reputation of knowing more about surveys and experimental design, so I was asked to teach sample surveys. By teaching it, I learned the material far better than I would have otherwise, because it’s not just about understanding it, but about how to explain it – you need to explain it to people who tend to have a variety of backgrounds. One needs to understand something very well in order to teach it well. When I can explain something well, especially when I come up with multiple ways to explain it intuitively, then I feel that I own the material – it is mine and I have really learned it.
Probably an interesting coincidence is that the key formula I used in my talk at RSS 2016 is one I actually learnt during my first time teaching from that book thirty years ago. When I got interested in big data, I needed to find a way to quantify the trade-off between data quality and data quantity. Then I recalled that when I was teaching that survey course, I used a formula I learned there to derive an inequality, which led to a publication. When I re-read what I had published then, the formula I was seeking for the trade-off just lifted off the page, to the point that I said out loud, “Wow, it’s there!” So this is a direct example of how well teaching and research integrate.
I have also done some consulting, working with a variety of scientists: epidemiologists, psychiatrists, engineers, and statistical geneticists. I’ve worked with so many different people for two very simple reasons: because statistics is applicable almost anywhere, and because a part of my research is on missing data. Everybody faces the problem of not having enough data, or of the data being irregular, so I got involved. I have learnt the importance of communication, which is one of the points I emphasize a lot to my students, particularly as a dean. But I have also learnt how seemingly unrelated fields are all connected through statistics – which makes me happier. I chose statistics at the time not because I knew that this was going to be such a hot field, but because I was attracted by this idea of mathematics describing random phenomena. I feel very fortunate now that I made this choice. But a downside is that no matter how little sleep I get, I still don’t have enough time to work on all of these interesting problems!
3. You were named Dean of the Graduate School of Arts and Sciences in 2012 and have remained at Harvard for many years, after teaching for several years at the University of Chicago, where you remain a Faculty Research Associate. What is it that has kept you there? What do you love about Harvard?
Both places are terrific. I stayed at Chicago for ten years. It’s a very scholarly place. I’ve been at Harvard for sixteen years now. I enjoyed the environment in Chicago, where my colleagues got together every day for lunch; we talked about statistics and gossiped about the field. I have colleagues there whom I admire – Peter McCullagh, Steve Stigler, Mike Stein, just to name a few. They think things through very deeply, and so whatever I write about, I still have this habit of needing to check with them. When I was a junior faculty member there, whenever I was writing articles, I was not thinking about the journal audience; I was thinking about whether it would pass muster with my colleagues. I’m trying in some ways to mimic that environment here at Harvard.
I have also worked quite a bit with psychiatrists and psychologists, which is one of the reasons that I’m still a part of the Center for Health Statistics at Chicago. Incidentally, I also recently went to the Argonne Fermi Lab in Chicago, because there’s a physicist there who wanted me to sit on one of their panels about the uncertainties in dealing with their physics data.
As a former Harvard student, I feel loyal to it. It’s an absolutely terrific place in terms of attracting people – not just students, but visitors and speakers from all over the world. And of course, given my current role there, I have a lot to do.
4. Your research has focused on many aspects of statistical inference, multi-resolution modelling and Bayesian inference. What are you working on currently?
Quite a few people worry about the future of statistics, fearing that we may be supplanted by data science. For me, these thirty years have taught me that we statisticians have a long tradition of thinking things through at a very foundational level. We try to solve problems, but we also try to think about why solving them is even possible.
Because we have this long tradition, we have developed principles that may seem very simple but can actually generate lots of insight. Like everyone else, I’m currently working in areas motivated by big data, and I’ve found that a little bit of deep statistical thinking can help in fundamental ways.
As an example, I am currently working on what I call the three kinds of “multiples”: multi-resolution, multi-source, and multi-phase. We now try to make statistical inferences at different levels of resolution, just as we zoom into a picture when we want to look at more refined levels, such as individuals in individualized medicine. When a doctor asks, “I want to know whether the treatment works for my patient,” that is a more refined question than the one addressed by a typical clinical trial: “Does the treatment work better than the placebo for the trial population?” That’s a different resolution. And how do we accumulate statistical evidence when our data just don’t have the required resolution?
For individualized treatments, we will never have the resolution we want, because we can never really test the drug on you before we give it to you. So we try to learn from another group of individuals who, we hope, are similar to you. But how similar is similar? Each of us is unique, so the more similar we want the group to be, the smaller it gets, which increases the statistical uncertainty, and hence the answer is not robust. If we use a larger group, our answer will be more stable, but then it may not be that relevant to you, since the group is less similar to you. So there is this fundamental robustness-relevance trade-off, which is at the core of multi-resolution inference. Statisticians will recognize that this is just another version of the well-known and well-studied bias-variance trade-off in statistics.
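To make the trade-off concrete, here is the standard mean-squared-error decomposition – a textbook identity offered only as an illustration, not Dr Meng’s own formulation – where the estimate is built from a reference group G of similar individuals and the target is the treatment effect for the individual patient:

```latex
\[
\underbrace{\mathbb{E}\!\left[\big(\hat{\theta}_G - \theta_{\mathrm{you}}\big)^{2}\right]}_{\text{total error}}
\;=\;
\underbrace{\big(\mathbb{E}[\hat{\theta}_G] - \theta_{\mathrm{you}}\big)^{2}}_{\text{bias}^{2}\ \text{(relevance)}}
\;+\;
\underbrace{\operatorname{Var}\!\big(\hat{\theta}_G\big)}_{\text{variance (robustness)}}
\]
```

Shrinking the group to patients more like you drives the bias term down but inflates the variance term; enlarging the group does the opposite.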
To explain multi-source inference: any time we solve a big problem, the data are not just one single data set; they come from all over the place. The example I often use is one from the US Census Bureau, which wanted to build a database about where people work, where they live, and how they commute. The data used include national household surveys and unemployment insurance data, amongst others. There is a variety of different qualities and quantities: some sources only cover 0.05% of the population, others cover 90%. So I was asked by the Census Bureau: ‘How do we think about two data sets, where one is, say, 0.5% good-quality data, like a random sample, and the other is data somebody recorded, like online data, covering 80% of the population? Which one should we trust more?’ I posed that question to several audiences: which one do you trust? Most people who are not trained in statistics chose the one that covers 80% of the population, while those who have been trained in statistics tend to think quality is more important. But there is a trade-off. Using the formula I mentioned before, I did a simple calculation to show that a little bit of deterioration in quality can destroy much of the gain in quantity. I can easily make the results from 160 million people, representing half of the US population, statistically equivalent to results from 400 people in a well-controlled simple random sample. It’s a very striking comparison. That’s part of the research I am working on regarding multi-source inference.
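As a rough sketch of how such a calculation can go, the snippet below uses the identity from Dr Meng’s quality-versus-quantity work, in which the error of a self-selected sample mean factors into a “data defect correlation” rho, a term in the covered fraction f of the population, and the population standard deviation; matching mean-squared errors against a simple random sample then gives an effective sample size of f / ((1 - f) * rho**2). The value rho = 0.05 is an assumed illustrative number, chosen only because it reproduces the 160-million-versus-400 comparison quoted above.

```python
def effective_sample_size(f: float, rho: float) -> float:
    """Effective simple-random-sample size of a biased 'big data' sample.

    Based on the identity
        error = rho * sqrt((1 - f) / f) * sigma_Y,
    where f is the fraction of the population covered and rho is the
    data defect correlation between being recorded and the quantity
    measured. Equating mean-squared errors with a simple random sample
    of size m (error variance sigma_Y**2 / m) yields
        m = f / ((1 - f) * rho**2).
    """
    return f / ((1 - f) * rho ** 2)

# Half the US population responding (f = 0.5), with a seemingly tiny
# defect correlation (rho = 0.05, an assumed illustrative value):
print(effective_sample_size(f=0.5, rho=0.05))  # -> 400.0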
The last one I call, plainly, multi-phase, because the data we analyze typically are not data we collected ourselves but data that came from somebody else. When the Census Bureau provides data, they may collect the data themselves or they may hire survey organizations, and then someone needs to carry out imputations to fill in the missing data. So there is a variety of processes involved, including pre-processing. When data have been pre-processed by multiple teams, one typically doesn’t know in the end what assumptions were made or what cleaning processes were used, but whatever previous people have done will have an impact. Traditional statistical analysis is almost always about setting up a model that tries to mimic nature, and then saying, OK, here is the answer. But now the real complication is that by the time we see the data, there are several “second natures” already in them, so to speak. These are the types of problems that, frankly, have not been taken seriously by statisticians. Many others, including some computer scientists and data miners, have focused on producing results without really worrying about what those results tell us. Are we making inferences about real nature, or about the “second natures” created by the pre-processors, or a bit of both? This is an area where currently many things tend to be swept under the rug, but it is an area where we statisticians can make some fundamental contributions. So that’s the multi-phase problem on which I’m working.
At the moment, I am also teaching a workshop course on astrostatistics. I work with a group of astronomers and astrophysicists with whom I have collaborated for more than ten years. My job is to bring students from the statistical side; their job is to bring problems from the field. Then we work together. They educate my students (and me) about the physics and astrophysics background, and my job is to help them think more statistically and to offer guidance, as well as to advise my students.
5. You have authored and edited many publications (Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, edited by Andrew Gelman and Xiao-Li Meng, Wiley, July 2004, for example). What are the articles or books that you are most proud of?
I haven’t really published a book of my own. I’ve helped edit a few books, and among those, the one I actually like most is Strength in Numbers: The Rising of Academic Statistics Departments in the U.S., edited with Alan Agresti. I wish that more people paid attention to the history of how statistics departments got established. That was really mostly Alan’s work, because I was way too busy; Alan was very kind to take on most of it. But it was a fascinating experience for me, trying to learn about all these departments. Certainly it was a lot harder than we thought. History is made by people and written by people, but they’re different groups of people. So often there would be different views about which parts of the history should be presented and represented, and that created quite a bit of politics among some of the departments.
On the research side, the articles of which I’m most proud are the discussion papers. These papers usually take longer to write, and in total I think I have written about ten. Discussion papers tend to be the ones that you need to write with a big message, because people are going to discuss them. So I sort of forced myself to do that, because I like to push myself to think more deeply and more broadly, and I need to have something intriguing to say – not just “here’s the message” but “what’s the implication of this message?” and “why should anyone care about it?”
I was also asked by the Committee of Presidents of Statistical Societies (COPSS) to contribute to a volume called Past, Present, and Future of Statistical Science, written by former winners of the award. The idea was to give future students and statisticians a sense of what we have been doing and will be doing, so about 50 people wrote on all different kinds of topics. What I decided to write, and I really spent a lot of time on it, was a paper that lists important unsolved problems, which I formulated using the three themes I just explained: multi-resolution, multi-source, and multi-phase. For each of them, I listed three major problems, then provided the background and how I had thought about them. So I enjoyed writing that article, because it’s not a usual one. It’s more about why I was motivated and what problems I wanted to solve, and I provided the initial framework and listed many open questions. I was particularly pleased when I attended the International Indian Statistical Association Conference last year and one of the attendees told me that he had used my article for a course, and the reaction he got from his students was, “Oh, very few people write this type of article.” I don’t know whether that’s good or bad, but for me that’s good, because that was what I intended. The article is not about “here’s one more method for you to learn;” it’s there to help our students think.
And of course the article that I’m most proud of must be the next one.
6. Your lecture at RSS 2016 was entitled ‘Statistical Paradises and Paradoxes in Big Data.’ If there is one thing you wanted the audience to take away from the lecture, what would it be?
If there’s one message for people to take home, it is that data quality is more important than data quantity. I don’t like the notion of ‘Big Data’ because “big” refers to the quantity, not the quality. Of course, if we have big, high-quality data, then we will be in paradise. But unfortunately, the word “big” often comes with low quality – for example, internet data, of which we have tons but over whose quality we have little control. I found a way of illustrating this: as I mentioned, 160 million seemingly only slightly low-quality answers can be statistically equivalent to 400 high-quality answers. So that is my take-home message.
7. What do you see as the greatest challenges facing the profession of statistics in the coming years?
I think by now most people would agree that the greatest challenge is to maintain and enhance statisticians’ roles in data science. Data science is here to stay and will be a very strong field, attracting a lot of people, which is a good thing. But inside data science there are two major players – computer scientists and statisticians. So the question is what statisticians can contribute, not only to sustain our position, but really to help move data science forward. Computer science inevitably emphasizes speed; if an algorithm is too slow, it can’t solve real-life problems. So computer scientists have traditionally worried more about computational efficiency, and statisticians, naturally, more about statistical efficiency – trying to get the most information out of our data. Both concerns are appropriate, but data science should be able to get the best of both worlds.
You know, I often use this analogy, which the computer scientists may not like: it’s like fast food versus gourmet cooking. There are things that have to be slow-cooked, and they hopefully taste better. But the world needs fast food too. Ideally, we want to combine the two, so that the result tastes great, is easy to produce, and is nutritious.
8. You have received recognition for your statistical contributions including the COPSS (Committee of Presidents of Statistical Societies) Award for ‘outstanding statistician under the age of forty’ in 2001. Is this the award you’re most proud of?
The COPSS award is certainly a great honour, and I am sure it would be on the proud list of any of its recipients. I am also very proud of a teaching award at the University of Chicago, because I was nominated by the students. So it was recognition from students that I had done something which impacted their lives. Whenever I travel, I give many talks on research, but I also give talks on pedagogy. To me, being a professor ultimately means the integration of researcher and teacher.
9. What is the best or most influential book on statistics that you have ever read?
That’s a good question. For me, certainly the volumes that have offered many deep thoughts are the five volumes of Fisher’s publications. I think very few people in the world can claim that they have read all of them, and I’m certainly not one of them. But I have surprised myself with the number of times that, needing to find an answer, I have found it, or at least an inspiration, there. I once co-taught a course on reading Fisher, but we only managed to scratch the surface; we selected about ten out of his many papers. These are really great volumes, and I certainly recommend that they continue to be read.
10. Who are the people that have been influential in your career?
I would say there were really two groups – one at Harvard and the other at Chicago. At Harvard, the first is my advisor, Donald Rubin. He’s the one who really taught me how to think intuitively, developing this sort of intuition without doing the algebra. He’s definitely the most influential person in my professional life in terms of developing the statistical me.
Then there is Andrew Gelman, who is my contemporary and my friend. Sometimes we call each other twin brothers, as he and I were in the same year with the same advisor. I have learned a lot from working with him, too. But that does not mean we always agree with each other – he actually just disagreed with something I wrote. That kind of exchange is all the more helpful to me as a scholar.
On the University of Chicago side, one is Steve Stigler. He has had a great influence on me. He hired me when he was the department chair, and I benefit greatly to this day from his very broad historical perspective. He is a tremendous resource in terms of knowledge and sound professional advice.
And the other person is George Tiao. He’s retired from Chicago’s business school now, but he was the one who, from my first day in Chicago as an assistant professor, immediately recruited me to be one of the three screening associate editors for the then-new journal that he founded, Statistica Sinica. So he trusted me very early on and brought me into the world of editorial work and professional societies. I joined the International Chinese Statistical Association, which he also founded, and I served on a committee there, so he’s very responsible for my professional development, and for my being an engaged citizen of the statistical profession.
11. If you had not got involved in the field of statistics, what do you think you would have done? (Is there another field that you could have seen yourself making an impact on?)
I have a perfect answer for you, though probably very few people could guess what it would be. People always tell me that I have a dream job, and they are right, I do, but I have to confess that if it were entirely up to me, I probably wouldn’t be a statistician. Not because I don’t love it – I absolutely love it. But I enjoy travel, food and wine, and talking to people. I enjoy having an audience, giving talks, and writing. So one day I thought about what would be a perfect job with all of these components. The answer: a food talk show host! I could then travel often, meet all kinds of people, enjoy food and wine (for work!), write about them, and have a regular audience. So that’s my dream job. Speaking of food, I gather it’s time for us to have lunch…