Trevor Hastie is one of the world’s leading statisticians, best known for his contributions to applied statistics, including machine learning, data mining, and bioinformatics.
He was born in South Africa in 1953 and graduated from Rhodes University in 1976 with a degree in statistics. He later gained his MS from the University of Cape Town (UCT) and his PhD from Stanford. His first employment was with the South African Medical Research Council in 1977, during which time he earned his MS from UCT. In 1979 he spent a year interning at the London School of Hygiene and Tropical Medicine, the Johnson Space Center in Houston, Texas, and the Biomath department at Oxford University. Professor Hastie joined the PhD program at Stanford University in 1980. After graduating from Stanford in 1984, he returned to South Africa for a year with his earlier employer, the South African Medical Research Council. In 1986, Hastie joined the statistics and data analysis research group at what was then AT&T Bell Laboratories in Murray Hill, New Jersey, where he remained for eight years before returning to Stanford in 1994 as Professor in Statistics and Biostatistics. In 2013, he became the John A. Overdeck Professor of Mathematical Sciences.
His main research concerns statistical learning and data mining, statistical computing and bioinformatics.
1. What advice would you give to students considering a university degree in statistics or to graduates looking to develop a career in the field?
Do it! It’s a very good time to do it. Once you have decided to do it, I would strongly encourage you to go the applied route. Statistics is a field that exists amongst the other sciences as a service tool. It really makes sense to do consulting, get real data and work on projects as they arise in the context of some analysis that you are doing. I have done consulting my whole career and every second consulting project will have aspects for which there is no traditional method. You have to come up with some tweaked approach. It’s a fertile way of coming up with research problems. Computing skills are really important, especially these days as datasets are becoming larger and more unwieldy. The field is hot right now. Someone who can claim to be a data scientist gets a job in a flash.
2. Is data science a sexy word for statistics?
I do think that data science is a sexy word for statistics, for applied statistics certainly. Some statistics departments have a reputation for being very theoretical. I think applied statisticians have been essentially doing data science their whole careers. The computing part has become ever more important as the datasets have become bigger and more complex.
3. Data collection has grown along with computing. How do you see the two working together in processing statistics?
In the past you were presented with a data set where someone had taken the trouble to work on the data and encode it, whereas these days it often arrives via machines and automatic recording devices, and humans have not had much of a role in entering the data. It can come in messy forms that need to be processed before use in some kind of analysis, which all requires new skills that you need to learn in order to become more effective.
4. How have you seen computing evolve over the years?
I hate to admit it, but I started learning computing by using punch cards in South Africa in 1973 when I was an undergraduate at Rhodes University. Rick Becker, a colleague of mine from the Bell Labs days, mentioned at useR! 2016 that there were tons of data in those early days when they worked on the records of phone companies, and he was able to dig in, extract the data and create Big Data pipelines for analysis. I think it has been going on all the time. At Bell Labs, they were better than many other places, but it’s clear these days that many young applied statisticians are being trained to tackle Big Data from the start.
5. Is there a place like Bell Labs now?
There are, but they are scaled down. AT&T research is much smaller now, but they did win the Netflix recommender-system competition. Microsoft has a great research lab. Others, such as IBM, have labs on a smaller scale, but in my mind Microsoft is the present-day version of Bell Labs.
6. Over the years, how has your research, teaching and consulting motivated and influenced each other? Do you continue to get ideas from statistics and incorporate them into your teaching?
I’ve done consulting throughout my career and when I teach, I like the students to have hands-on experience with data and computing. Often I teach first-year PhD students, and if they have not learned computing on entry, they certainly have upon completion of the class. I find consulting very important. Rob Tibshirani and I started the statistical consulting service in this department as graduate students in 1983. This was a free service and we had drop-in hours so people could get help, and that still happens today. We have our students do consulting once a year so that they get used to working with researchers and their data and solving real problems. I get called upon to work on projects from the medical school and I typically involve my own PhD students in these projects. It’s a really good learning process and a chance to be creative, rather than using textbooks.
7. Over the years, how do you feel the teaching of statistics has evolved and adapted to meet the changing needs of students?
Teachers tend to change slowly but definitely the whole playing field has changed. Students will give up on us if we don’t adapt, so we have to learn new things that are interesting for the students. We change but not as fast as I think we’d like.
8. What do you see as some of the greatest challenges facing the profession of statistics in coming years?
A lot of statisticians feel that data science will sideline them. It is a hot field and many different groups want a piece of the action, computer science for example. There is a threat that statistics will be pushed aside, but I don’t believe this will happen. We have had threats like this before, such as from machine learning. However, applied statistics was not pushed aside; it turned out that statisticians had a lot to offer in understanding and refining machine learning algorithms in terms of statistical models. Data science is intrinsically applied statistics with a heavy load of computing. We will move along with the tide.
9. What is the best book on statistics that you have ever read?
That is a tough question. I’ve been thinking about this for a long time and I’ve decided to settle on Brad Efron’s monograph on the bootstrap, The Jackknife, the Bootstrap, and Other Resampling Plans. That book came out in the early 80s when I was a graduate student, and I took his class and read the book. He has a beautiful style of writing. It was very innovative at the time and it has always been a standout book for me. But of course, there are many other books that are fantastic.
10. Who are the people who have been influential in your career?
There are a lot! I have to mention Rob Tibshirani first. We have been colleagues for years and have worked together on so many projects. My advisor, Werner Stuetzle, was very influential. Jerry Friedman was a mentor, as were Brad Efron, Leo Breiman, Andreas Buja, Daryl Pregibon and John Chambers. And earlier there were John Nelder, Michael Healy, David Cox and Peter Armitage. I hope I haven’t left anyone out!
11. What do you think you would have done if you had not got involved in statistics? Is there another area you could see yourself as having an impact on?
If you had not asked that question about having an impact, I might have said ‘a plumber’ because over the course of the years, I have worked on a lot of home renovations and I absolutely love plumbing! It is the most satisfying thing! You start a job, you finish it and you’ve got the results in front of you, whereas research may not have so nice an ending.
I think, like many statisticians, I started off in this field because I was good at math but I wanted to do something applied. So I think electrical engineering could have been another outlet, specifically a branch of electrical engineering that involves applied math. I have colleagues who work in convex optimization. I think it’s a field I could have flourished in as well. It’s just serendipity that I ended up in statistics.
Copyright: Image appears courtesy of Professor Hastie