“I want to help scientists understand things better”: An interview with Robert Tibshirani

Professor Robert Tibshirani is Professor of Statistics and Professor of Biomedical Data Science at Stanford University and a renowned statistician. In his work, he develops statistical tools for the analysis of complex datasets, most recently in genomics and proteomics. He is perhaps best known for the Lasso method, which proposed the use of L1 penalization in regression and related problems, and for Significance Analysis of Microarrays.

His main research interests are in applied statistics, biostatistics, and data mining. He is co-author of the books Generalized Additive Models with Trevor Hastie, An Introduction to the Bootstrap with Bradley Efron, and Elements of Statistical Learning with Trevor Hastie and Jerome Friedman. His current research focuses on problems in biology and genomics, medicine, and industry. With collaborator Balasubramanian Narasimhan, he also develops software packages for genomics and proteomics.

Professor Tibshirani obtained his B. Math. in Statistics and Computer Science from the University of Waterloo in 1979, followed by a Master’s degree in Statistics from the University of Toronto in 1980. He joined the doctoral program at Stanford University in 1981 and received his Ph.D. in 1984 under the supervision of Bradley Efron. He taught at Toronto from 1985 to 1998 before returning to Stanford.

Alison Oliver talks to Professor Tibshirani about his career in statistics, from what inspired him to join the discipline, to the challenges statisticians face today and his recent important contributions to reducing platelet waste.

1. What was it that first introduced you to statistics and what then inspired you to pursue a B. Math. in Statistics and Computer Science at the University of Waterloo before moving on to study for your Master’s at Toronto and PhD at Stanford?

I was an undergraduate at the University of Waterloo in Canada in mathematics and computer science, and I loved math and computers. But I found that math at some point became too abstract, and computer science was useful but it was kind of removed from science. I discovered statistics during my fourth year at Waterloo because some friends were taking it; I didn’t know what it was. It seemed like a nice compromise, a nice mix of mathematics and computer science, but also with an eye towards science. So, it was a great combination that attracted me.

2. You then taught at the University of Toronto for thirteen years. What was it that you loved about working at Toronto that kept you there?

It’s a great university, and another big draw was our family. I’m from Toronto, which I love, and we raised our young kids there with the grandparents around. When they got older, I really felt the pull back to Stanford because it was where I did my PhD, and it’s such a great place, not only because of the statistics department, but all the other great science departments. The medical school was very attractive, because I like medical applications.

3. You are currently Professor of Biomedical Data Science and Statistics at Stanford University where you have been since 1998. What led to this move from Toronto?

It was the quality of the department and it’s such a terrific university scientifically. It’s a wonderful place to work, not only because of the statistics department and biomedical data sciences, which is a new department at Stanford which is trying to combine all the quantitative people in medicine in one place, but I also have so many wonderful collaborators in medicine at Stanford. Stanford’s a very small place, and since it’s not a huge medical enterprise like Harvard or Johns Hopkins, everybody knows each other. Probably every week I have someone wanting to come to see me with some exciting new technology that they’ve invented or started using, and I can help them make sense of the data. It’s just a wonderful, enriching and stimulating environment.

The students are also wonderful. Not only are the students in statistics wonderful, but in the medical school there are so many great students in the biological sciences who are just so fearless. They’re very good at what they do in biology, but they’re also quantitatively very savvy. They soak up all the new tools that a lot of us have invented or worked on, they want to know how to use them and they then push them to their boundaries. The students are more fearless than the professors, who were trained 20-30 years ago. The students and postdocs are younger and sort of more fungible; they soak up new methods very quickly and they want to learn.

4. Your research interests include the bootstrap method, modeling and inference, and your work in statistical learning has been particularly prolific, including collaborations with Trevor Hastie and Brad Efron. What are you working on currently?

Well, the last few years have been fantastic because I’ve focussed on post-selection inference: this idea that you do some kind of model fit, say a regression by the lasso or stepwise regression, and then you want to obtain p-values and confidence intervals for the coefficients of the variables. The tools we have now, in R and other languages, basically ignore the selection. So the fact you’ve adaptively chosen the model from the data is ignored, and we sort of pretend like the model was fixed a priori. So the p-values and confidence intervals that are produced by current packages are all wrong, and they’re way too optimistic; the p-values should be much larger, but they ignore the fact that we’ve cherry-picked the best variables.

Working with Jon Taylor and other people, we’ve come up with some new theory and tools for post-selection inference, and software that, having fit a model by the lasso or, say, stepwise regression, allows you to then do a follow-up analysis that accounts for the fact that you’ve adaptively chosen the model and gives you p-values and confidence intervals that take your selection into account. So, they’re more honest. There are a number of groups working on this; we’re one of the leaders in it.
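The optimism of naive p-values after selection can be seen in a small simulation. The sketch below is illustrative only and is not the selective-inference method described in the answer: it assumes pure-noise data, picks the single predictor most correlated with the response (a crude stand-in for lasso or stepwise selection), and then tests it as if it had been chosen in advance. The function names, sample sizes, and the Fisher z approximation are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def naive_p_value(n, num_predictors, rng):
    """Select the predictor most correlated with a pure-noise response,
    then compute the usual two-sided p-value as if it were fixed a priori."""
    y = [rng.gauss(0, 1) for _ in range(n)]
    best_r = 0.0
    for _ in range(num_predictors):
        x = [rng.gauss(0, 1) for _ in range(n)]
        r = correlation(x, y)
        if abs(r) > abs(best_r):
            best_r = r
    # Fisher z-transform: roughly N(0, 1) under the null for one fixed predictor.
    z = math.atanh(best_r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(num_predictors, reps=400, n=50, seed=0):
    """Share of simulated datasets where the naive test rejects at the 5% level."""
    rng = random.Random(seed)
    hits = sum(naive_p_value(n, num_predictors, rng) < 0.05 for _ in range(reps))
    return hits / reps


rate_fixed = rejection_rate(num_predictors=1)      # no selection step
rate_selected = rejection_rate(num_predictors=20)  # best of 20 noise predictors
print(rate_fixed, rate_selected)
```

With a single predictor the rejection rate should sit near the nominal 5%. With the best of 20 independent noise predictors, theory suggests roughly 1 − 0.95^20 ≈ 0.64, even though no predictor is related to the response.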

I continue to focus on supervised learning and large-scale data analysis with a lot of medical applications. My real love is applying these tools with collaborators in medicine because it’s nice to feel like you might improve public health, so I do a lot of that as well.

5. From your teaching experience, how has the teaching of statistics evolved over the years and met the changing needs of the students?

There are new modes of teaching. Trevor and I have an online course, a MOOC, that has been very popular, and we have a Stanford version of that course which is called a flipped classroom. The lectures are all pre-filmed, and they’re the same whether you’re taking the MOOC or the Stanford course. So, students watch the lectures on their own time, and then we meet once a week. The “flipped classroom” is a class where we examine case studies and examples in class rather than lecturing. So that’s sort of a new mode of teaching which has been rather successful.

Of course, the topics have evolved. When I was a graduate student, there was more of an emphasis on mathematical topics, whereas now there are new courses in statistical learning and other applied areas and computational statistics. As for the teaching, there’s very little blackboard work anymore; now it’s all done by computer and more modern ways of projecting and presenting.

6. What have been the most popular courses that the students respond to that you would recommend to others teaching?

One thing I find very useful in class is an idea I got from Steve Stigler—when he teaches about the normal distribution, the mean and the variance are independent – X bar and s are independent. Stigler says when most people teach that, they say, “Here’s a normal distribution, and by the way, here’s X bar, here’s s, and they’re independent.” But he said the way he teaches it, he says, “Here’s X bar and here’s s.” And then he spends like forty minutes telling the students there’s no way they could be independent. And then at the very end, he says, “Look what happens in the normal distribution, how special this is.” And now there’s a special appreciation for the fact that they’re independent. So instead of just telling them the results of many years of collaborative thought, give them the context and what other things might be true. So it gives them more of an appreciation.

So I try to use that trick, as well. In the section of the course where I want to introduce a method, I might ask the students, “Well how would you approach this problem? Here’s the kind of data we have. Here’s what we want to do. How would you approach it?” And then we have a ten-minute conversation, the students back and forth. Some students have good ideas, one student will have an idea and another student will critique it, and it goes back and forth, and finally after ten minutes, I say, “Well those were all good ideas, and now here’s actually how we do it.” And they now have an appreciation for things that might have worked or didn’t work, and it gives them context. Because if you just teach the results of 10 or 20 years of polished thought by the thought leaders, they don’t get an appreciation for the idea and what went into it. So I try to get as much back and forth as possible, with the students asking questions and me asking them questions, so that we can sort of together constructively understand the idea rather than me just telling them.
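Stigler’s example of X bar and s can be checked numerically. The sketch below is a rough illustration, not a proof: it draws many small samples and measures the correlation between the sample mean and the sample standard deviation, once for normal data and once for skewed (exponential) data. A correlation near zero is only a symptom of independence, not a demonstration of it, and the sample size, repetition count, and function names are assumptions chosen for the example.

```python
import math
import random
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_sd_correlation(draw, sample_size=10, reps=3000, seed=1):
    """Correlation between X bar and s across many independent samples."""
    rng = random.Random(seed)
    sample_means, sample_sds = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(sample_size)]
        sample_means.append(mean(sample))
        sample_sds.append(stdev(sample))
    return correlation(sample_means, sample_sds)


# Normal data: X bar and s are independent, so the correlation should be near 0.
normal_corr = mean_sd_correlation(lambda rng: rng.gauss(0, 1))
# Skewed data: a sample with a large mean tends to have a large spread too.
skewed_corr = mean_sd_correlation(lambda rng: rng.expovariate(1.0))
print(normal_corr, skewed_corr)
```

The contrast between the two numbers is the point of the classroom exercise: for most distributions the mean and spread of a sample move together, and the normal case is the special one.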

7. What do you think have been the most important recent developments in the field and will these influence your teaching in future years?

Machine learning, deep learning, and deep neural networks have made a big comeback and they’re very interesting and important. I think applications of statistical modelling are another, which is one topic I’m working on now. There are lots of people in the personalized medicine area. We have a project with the people in the clinics that we call “patients like me.” The patient comes to the doctor’s office. He’s 55 years old with certain medical conditions, and you determine he has this disease, and the doctor wants to know what treatment to give him, drug A or drug B, but there’s never been a randomized clinical trial comparing the two. So he has to go into the electronic health records, say at Stanford, find patients that are “like” his patient, whatever that means, and then look at the experience of those patients: did they do better under A or B? Of course, there are problems with that approach. First of all, what does it mean to be “like” that patient? Presumably his hair colour’s not important to match, but other characteristics are. Furthermore, in the electronic health record, those treatments weren’t randomly assigned. For example, the doctors at Stanford might have given the sicker patients treatment B because they believed it was a more powerful drug, maybe more toxic. Therefore we can’t just compare A to B because patients under B will be sicker, so we have to model the propensity of treatment. We’re working on a system now with people in the medical school to try to do this at Stanford.

So causal inference, trying to infer causality or near-causality from observational data, is really important and hard. And as you go to other kinds of data sources like smartphones, it’s very complicated because the sampling is very biased.
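The drug A versus drug B problem described above can be sketched with invented data. The example below is a toy illustration, not the Stanford system: it assumes a single binary confounder (whether the patient is sicker), makes sicker patients more likely to receive drug B, and builds in a true benefit of +1 for B. It then compares a naive difference in means with an inverse-propensity-weighted difference, where the propensity is estimated from group frequencies. All numbers and names are hypothetical.

```python
import random


def simulate_records(num_patients, rng):
    """Invented records (sicker, got_b, outcome); drug B truly adds +1 to the outcome."""
    records = []
    for _ in range(num_patients):
        sicker = rng.random() < 0.5
        got_b = rng.random() < (0.8 if sicker else 0.2)  # sicker patients tend to get B
        outcome = 5.0 - 3.0 * sicker + 1.0 * got_b + rng.gauss(0, 1)
        records.append((sicker, got_b, outcome))
    return records


def naive_difference(records):
    """Mean outcome under B minus mean outcome under A, ignoring who got which drug."""
    under_b = [y for _, got_b, y in records if got_b]
    under_a = [y for _, got_b, y in records if not got_b]
    return sum(under_b) / len(under_b) - sum(under_a) / len(under_a)


def propensity_weighted_difference(records):
    """Inverse-propensity-weighted difference, with P(B | sicker) estimated by frequencies."""
    propensity = {}
    for group in (True, False):
        members = [got_b for sicker, got_b, _ in records if sicker == group]
        propensity[group] = sum(members) / len(members)
    # arm -> [weighted outcome sum, weight sum]
    totals = {True: [0.0, 0.0], False: [0.0, 0.0]}
    for sicker, got_b, outcome in records:
        p_b = propensity[sicker]
        weight = 1.0 / (p_b if got_b else 1.0 - p_b)
        totals[got_b][0] += weight * outcome
        totals[got_b][1] += weight
    mean_b = totals[True][0] / totals[True][1]
    mean_a = totals[False][0] / totals[False][1]
    return mean_b - mean_a


records = simulate_records(20000, random.Random(2))
naive = naive_difference(records)
adjusted = propensity_weighted_difference(records)
print(naive, adjusted)
```

In this setup the naive comparison makes B look worse than A, because its patients were sicker to begin with, while the weighted comparison lands close to the built-in benefit of +1. Real records involve many confounders, some unmeasured, which is what makes the problem hard.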

8. Your lecture at JSM 2017 was an Introductory Overview Lecture on Data Science: A Collaboration between Statistics and Computer Science with Lise Getoor of University of California Santa Cruz. Please could you tell us more about this topic? What was the one thing that you wanted your audience to take away from your lecture?

I gave an overview of what statistical learning is and how it relates to machine learning and data science. I described briefly five very popular and useful methods for statistical learning. Consumer Reports is an American thing – in America, if you want to buy a car or a camera, you look at Consumer Reports and it has like a chart of all the advantages and disadvantages. That’s what I did here. So, I had a nice chart that shows all the advantages and disadvantages of all the different methods. Then I basically gave some real examples.

The one example which I’m really happy about is working with people at Stanford hospital. Stanford hospital gets their blood from Stanford blood center; they’re their sole supplier, and every day they order platelets and red blood cells. Platelets are somewhat special in that they only have a five-day shelf life. They can’t be refrigerated, and after five days they have to be discarded. Furthermore, they’re tested for the first two days for safety, so they’ve only got three days of use. So what happens now is every day at Stanford hospital a call goes through to the blood center and says, “Hey, we need 45 units of platelets tomorrow.” A unit’s a bag, about a pint size. They base that on what they see is going to happen in the hospital tomorrow, surgeries that are planned, how many patients are around, how much inventory they currently have, but what tends to happen is they tend to over-order because they’re afraid to run short. If they run short, it’s not a complete disaster, but they have to find blood from some other blood center very quickly and it’s expensive. So they tend not to want to run short, they tend to over-order. And as I say, if they don’t use the blood for three days, then they throw it away. Currently the system wastes about 1,500 bags a year, which is 8% of their blood. It’s a very valuable resource. Getting platelets from donors isn’t as quick as donating blood – you spend an hour and a half sitting in a chair to donate platelets.

It’s a very valuable resource, and they waste a lot, so the blood center, through the pathology department, contacted me and said, “We have all this data in the hospital and we have a database that describes all the patients in the hospital and their current bloodwork. Could you help us build a statistical model to predict better how much we’re going to need each day?” So they gave us three years of data, and it was very messy data. You’d think hospital databases would be well organized, but it was a complete mess: there were three different databases and none of the patient names matched. But with some grad students’ help, we organized the data into a usable form and then we fit a lasso-style model, which basically said: let’s take the features, like the day of the week, how many patients are in all of the different wards, the CBC count, the blood counts of all the patients, and use those as predictors to predict how much is going to be used the next day.

It’s not completely predictable because things might happen, like a large car accident which involves a number of people. We built this statistical model based on the lasso and we tested it historically by predicting forward in time, so we trained it on six months of data and then we started running it forward in time. And the result of this historical testing was that our model had no shortage over the whole three-year period, and the waste was reduced by two-thirds, so from 1,500 bags to about 400 bags. So it’s quite substantial, and it was never even close to a shortage; there never was a time when the number of bags was actually less than 10.

So our collaborators are very excited. We’ve published a paper on the method, it’s being implemented as an R Shiny app in the hospital, and they’re going to use it on a daily basis. Furthermore, we’re going to distribute the R Shiny app, the algorithm, around the nation and people will hopefully use it: they’ll take the algorithm, they’ll train it using their own data, and my collaborators say it could save $150 million a year in the US. It’s a nice example of statistical learning where there’s low-hanging fruit; there’s lots of signal if you can just organize it and put it in a usable fashion. And the predictors that come out are very sensible, too, like the day of the week. There are many more operations Monday to Friday than there are on the weekends, and the algorithm figures that out. It also figures out that if you have a lot of patients in the hematology ward, where the treatment is very blood intensive, you need platelets. It’s intelligent, and gives the data that’s required.
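The general shape of this kind of analysis, a lasso fit on an early window followed by prediction forward in time, can be sketched on synthetic data. The example below is not the published model or the hospital app: the features (a weekday indicator, a hypothetical ward census, two irrelevant noise columns), every number in the data generator, and the penalty value are invented, and the lasso is a plain coordinate-descent implementation written for illustration. It also fits once and predicts the remaining days, which is a simplification of the forward-in-time testing described above.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(rows, targets, penalty, sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + penalty*||b||_1.
    Assumes centered targets and standardized columns (no intercept)."""
    n, p = len(rows), len(rows[0])
    columns = [[row[j] for row in rows] for j in range(p)]
    coefs = [0.0] * p
    residual = list(targets)  # equals y - Xb while b is all zeros
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(v * v for v in col) / n
            rho = sum(v * (r + v * coefs[j]) for v, r in zip(col, residual)) / n
            updated = soft_threshold(rho, penalty) / scale
            step = updated - coefs[j]
            if step != 0.0:
                residual = [r - v * step for v, r in zip(col, residual)]
                coefs[j] = updated
    return coefs


def make_days(num_days, rng):
    """Synthetic daily records: features and units used. All numbers are invented."""
    rows, usage = [], []
    for day in range(num_days):
        weekday = 1.0 if day % 7 < 5 else 0.0
        census = rng.gauss(30, 5)  # hypothetical ward census
        rows.append([weekday, census, rng.gauss(0, 1), rng.gauss(0, 1)])
        usage.append(20 + 10 * weekday + 0.5 * census + rng.gauss(0, 2))
    return rows, usage


def standardize(train_rows, all_rows):
    """Scale every column using the mean and SD of the training window only."""
    n, p = len(train_rows), len(train_rows[0])
    means = [sum(r[j] for r in train_rows) / n for j in range(p)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in train_rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in all_rows]


rows, usage = make_days(360, random.Random(0))
split = 180  # fit on the first 180 days, predict the remaining days
scaled = standardize(rows[:split], rows)
train_mean = sum(usage[:split]) / split
coefs = fit_lasso(scaled[:split], [u - train_mean for u in usage[:split]], penalty=0.5)

predictions = [train_mean + sum(c * v for c, v in zip(coefs, row)) for row in scaled[split:]]
actual = usage[split:]
model_mae = sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)
baseline_mae = sum(abs(train_mean - a) for a in actual) / len(actual)
print(coefs, model_mae, baseline_mae)
```

In this toy setup the weekday and census coefficients should come out clearly positive while the noise columns are shrunk to zero or close to it, and the forward predictions should beat a baseline that always orders the training-period average.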

9. Your research has been published in journals and books: is there a particular article or book that you are most proud of?

The things I’m most proud of are the Lasso method and paper, and the book with Trevor and Jerry Friedman, The Elements of Statistical Learning, which probably had the most impact. The Lasso method and sparse modelling are probably the most exciting developments I have worked on so far in my career.

10. What is the best book in statistics that you have ever read?

It tends to be skewed towards things that you were reading as a student. The book that had an especially large influence on me is Generalized Linear Models by McCullagh and Nelder, which really influenced my thinking about statistics.

It’s funny: Trevor and I came to Stanford and we both came from the British system, sort of; he’s from South Africa and spent time in London at MRC. I came from the University of Toronto, which had sort of a British influence. Then we came to Stanford at almost the same time, and Stanford has got much more of the American, Berkeley kind of mathematical statistics. Therefore, we brought our British view of statistics to America, and that book, at the time, was the prime example of it. That helped to shape both of our thinking and at the end of our studentship, we wrote the paper and then the book on generalized additive models, which clearly came directly from generalized linear models.

11. What do you think are the greatest challenges facing the profession of statistics in the years to come?

Probably maintaining ourselves as a scientific discipline. I mean, the developments in data science are exciting; the danger is that statisticians, or statistics departments, might become marginalized. I’m not really so worried about the major departments, but if I were in a department of statistics in the Midwest, I’d be worried. A data science department might pop up at that university, and then if you’re a donor, where are you going to give the money: to a statistics department, which sounds like a dry old topic, or data science, which sounds like a modern exciting thing?

So I think statisticians have to rebrand themselves and reemphasize: there are certain aspects of statistics that are always going to be important, like design of experiments. Design of experiments, Google now calls that “A-B testing.” But that’s something which Fisher invented in 1920! But now computer scientists think they’re reinventing a fair amount of design. Confounding! All these things which are important statistical concepts—no matter how big your data is, you have to worry about and understand basic statistical concepts. So we have to rebrand ourselves to emphasize those concepts.

One thing we’re not good at is PR, compared to, like, computer science. So we have to get better at advertising our skills and emphasizing the importance of our subject, while at the same time embracing new ideas. If not, statistics will kind of get maybe pushed to the side or absorbed into some other area. That’s my worry.

12. What would you recommend to young people who want to start a career in statistics?

You need mathematical skills, and it’s hard to learn math when you’re older. Mathematics as a first degree is a wonderful way of learning how to think. Mathematics is a way to reason. So people who start off in mathematics almost always succeed no matter what they end up in. So I would say learn a good amount of math early.

One of the most successful people in statistics, Jerry Friedman, who was a physicist, still says that he doesn’t know much statistics. He sometimes comes to ask Trevor and me the most seemingly basic question about statistical theory. He never took it! But it’s been an advantage, because he thinks outside the box. He’s a problem-solver. He’s not interested in someone’s deep theory of statistics, he wants to solve the problem. And that might not involve mathematical theory. So I think you need a mix of mathematics and practical scientific experience.

And I think, to me, the real key is to always talk to scientists and work on real problems for some percentage of the time, because it really keeps you grounded. A lot of my ideas have come from questions asked by a biologist who said, “We have this data, could you help me?” And I look at it and say, “Okay, I’ll come up with a rough solution that might be fine for now,” but then I realize that’s a really interesting problem. I can see how that’s probably a general problem, and there’s no good solution, so then my graduate students and I will go back and work on that and maybe come up with some new statistical approach and turn it into a paper, but I never would have thought of even the problem without talking to someone. And I think Brad Efron told me a long time ago that he used to sit at his desk and dream up problems to work on. And then maybe later, in the 80s, he said, “I don’t know why I ever did that!” Now he’s more like me: he talks to people, he talks to scientists, and the ideas come from outside. There’s a role for both, but more and more, I think talking to scientists is really important. After all, to me, that’s what our field is about: we’re scientists. We’re trying to help scientists.

There are people who do statistics for its own sake, but if I don’t see any scientific application potential, I don’t really get excited about it, because I want to be a scientist. I want to help scientists understand things better.

13. Who are the people who have been influential in your career?

Starting off in Toronto, David Andrews was my Master’s advisor and mentor, and Paul Corey in Biostatistics, and then at Stanford Brad Efron and Trevor of course. Jerry Friedman. I have more junior collaborators now: Jonathan Taylor has been the main force in selective inference. Andreas Buja was actually a big influence; he came through Stanford my first year here on his way to the University of Washington. He taught a course about modern statistics to our first-year class, which included Art Owen and a number of people who have gone on to good careers in statistics. He learned it from Peter Huber, and it was a wonderful course. And actually, the set of notes he gave us loosely formed the basis of the book with Trevor. He set the foundation for our thinking about the subject so Andreas gets a huge amount of credit and inspiration.

14. If you had not got involved in the field of statistics, what do you think you would have done? (Is there another field that you could have seen yourself making an impact on?)

I’d say a doctor would have been a definite possibility.


Copyright: Image appears courtesy of Professor Tibshirani