“It’s important for Statistics to maintain its identity”: An interview with Alan Gelfand

Alan Gelfand is the James B. Duke Emeritus Professor of Statistics and Decision Sciences at Duke University and is best known for his significant contributions to the fields of Bayesian statistics, spatial statistics and hierarchical modeling.

Born in the Bronx, New York, he attended City University of New York (formerly City College of New York), followed by Stanford. His first teaching post was at the University of Connecticut, where he remained for 33 years until joining Duke. His growing interest in Bayesian statistics led to the publication of Gelfand and Smith [J. Amer. Statist. Assoc. 85 (1990) 398–409], the paper that introduced the Gibbs sampler to many and revolutionized Bayesian computing. During the 1990s, his research interests turned to spatial statistics, modelling and model determination.

Alison Oliver talks to Professor Gelfand about his career in statistics, his collaboration with Sir Adrian Smith, his secondary appointment as a Professor of Environmental Health and Policy at the Nicholas School of the Environment, and his thoughts on Big Data and data science.


1. What was it that first introduced you to the discipline of statistics and inspired you to pursue it as a career?

You know, it’s a really interesting question, because I stumbled onto this without ever expecting that it would be a path for me. I was supposed to be a doctor, or at least that’s what my mother told me, but I couldn’t ever become a doctor. I remember doing a dissection in a biology class and I couldn’t tell the difference between a spleen and a piece of bubble gum. I tend to have more of a conceptual mentality, and I’m not so fact-based, whereas medicine is very fact-based.

In any event, what happened to me is that I wound up studying mathematics as an undergraduate, and in my junior year I took a statistics course and found it really interesting. It was essentially a first course in mathematical statistics, which, at least back then, was typically the way you’d get introduced to statistics. Now statistics is so much broader that it wouldn’t necessarily happen that way, but back then I got excited about that course and applied to several graduate programs in statistics.

Also, you’ve got to remember, I’m an old geezer, and at that point I was trying to avoid the Vietnam War. It was the late 60s and I wanted to go off to graduate school instead of being drafted. Even then I was rather naïve: a lot of my fellow students in graduate school were really passionate about statistics, and I was sort of there trying to figure out, “Well, what am I going to do?” While I was there, the process became, “Oh hey, I could really fall in love with this,” and “Hey, a life in academia is a really attractive path for me.” So it happened in a somewhat odd way for me.

2. Once you had established a career, what kept you going? What passion motivated you to continue?

So there again is an unusual story. I was trained to be a mathematical statistician which meant a lot of proving theorems. I could live in that world, but it was not really my strength, and what happened eventually was that I discovered Bayes, probabilistic modelling, uncertainty, and hierarchical multilevel specification. And, I realized this was where I was going to be able to contribute. I’ve often said I had a wasted youth, because I spent twenty years in the community trying to be more of a mathematical statistician when, in fact, I was really meant to be a stochastic modeller.

So, it’s actually wonderful that our discipline is broad enough that you can find a life as a theoretician, you can find a life as a modeller, as a pure data analyst, as a computing specialist, as a visualization person—there are lots of different ways you can contribute in our community. For me, fortunately, modelling, building complex models for complex processes, is where I’ve found myself. It’s been a wonderful ride, an absolutely fantastic path. But it took me a while to get there!

3. You have been a visiting professor/researcher at many prestigious universities around the world. Have you noticed any interesting commonalities or differences in statistics research and education globally? Are there any valuable takeaways that you wouldn’t have discovered by staying in one place?

A really good question, and I think what’s happened—and it’s obviously critical for Statistics—is that there’s been a substantial change in perspective. That is, Statistics went from sort of methodology in search of application to now doing work that is purely driven by application. It used to be in the old days you could close the door and sit and work in your office. Now you actually do much more in terms of teamwork, it’s more collaborative and interdisciplinary.

I think that’s the primary change, and I think it’s really wonderful for Statistics because it gives Statistics a relevance that, for example, some mathematicians just wouldn’t enjoy (and they may be happier to live without!). But for me, there’s no field that’s more naturally interdisciplinary than Statistics. Everybody gathers data of some sort these days. Everybody needs to analyse data and they are all studying complex systems. Everybody can benefit from relationships with statisticians.

I was an early adopter in this regard; I started doing this in the early 90s, and then you really saw a sea change, maybe by the turn of the millennium. It’s really been wonderful to watch, and just think about the diversity of applications. My area is environmental science, but there are people in computational biology, social networks, all sorts of medical work, the neurosciences; I mean, you can go anywhere. The point is that I think you get to be a richer statistician, a richer scientist, in this fashion, because you combine your statistical skills with exposure to an area where you don’t necessarily become an expert, but you become reasonably adept with at least some parts of the literature and gain some understanding of the processes. For me, I can feel good about being green. Altogether, the bottom line for your question is that I think there’s been a general change in the field that has been really healthy and has given Statistics vitality in the twenty-first century.

4. You also have a secondary appointment as a Professor of Environmental Health and Policy at the Nicholas School of the Environment. Tell us about your interest in this area: when did it start and how can statisticians make a difference to environmental progress?

There are roughly two areas here: one is more ecological, and one is more environmental, and I’ve actually enjoyed building bridges in both areas. The ecological bridges started with John Silander at the University of Connecticut, who is a wonderful thinker, full of ideas, who looks at situations and imagines all sorts of attractive ways of trying to understand ecological processes, particularly through species distribution models. When I went to Duke I met Jim Clark, who is one of the finest ecologists in the world, and again, it was a bridge that we built as soon as I got there and it continues to flourish today. Those two guys really introduced me to species distribution modelling and to species demography.

The environmental bridge grew out of some connections I started making at the Environmental Protection Agency. It turns out that in Durham, where Duke is, there’s a very large office of the U.S. Environmental Protection Agency, and I made a connection with David Holland. He’s one of these guys who has a PhD in statistics but is more of an idea person in terms of what statistical modelling could do to help the EPA better understand environmental exposures, and I thought, “This is pretty attractive for me.” A lot of the earlier spatial and spatio-temporal work was not particularly model-driven, but the idea of actually building models to capture particulate matter (PM) exposure, ozone exposure and exposure to components of PM, or even to try to link these to adverse outcomes, seemed like something you could feel good about doing. So that was an equally strong drive for me.

5. Some of your areas of interest and immense impact have been decision theory, empirical Bayes, spatial statistics and, more recently, environmental science. What is currently your strongest interest in the field of statistics?

July 31st, 2018 was the official end of my career. I’d been doing this for 49 years. I could have gone fifty, I suppose, but 49 seemed like the right number. Retirement is not the issue, because I’m not quite dead yet. I still have ideas and collaborations, so retirement has not changed things dramatically. I live a transatlantic lifestyle: my wife is in Spain and I’m in North Carolina, so we do a lot of transatlantic travel and work hard to spend a lot of time together.

The things that I think are really driving me now are questions about what models should really look like for the complex processes we study these days. I am concerned that too many people nowadays are using off-the-shelf tools, and this is particularly a criticism of data science and big data. In particular, I actually think that a lot more thinking is needed about multilevel modelling, rather than throwing whatever bag of tools you have into the mix in order to help understand things. Let’s say that, in order to predict the weather today, I use yesterday’s weather and tomorrow’s weather. Well, sure, you could do that. I guarantee you’ll predict better if you have tomorrow’s weather and yesterday’s weather than if you just use yesterday’s weather . . . but is that generative? No, that’s just kind of playing a game, isn’t it? You can do that retrospectively: sure, give me a whole sequence of data and I could do that. I’m not saying that some of the work is as blatant as that, but there’s a lot of naivete about what an explanatory model might look like. Somebody is often saying, “But look how much better I predict with this model,” or “Look how much better I explain with this model.” But, again, I wind up scratching my head thinking, “Yeah, but that’s not the way the data came, and so I don’t see how you can argue that’s the way you can predict with this model.”

6. In your opinion, where does Statistics currently have the greatest chance to make an impact? Are there any neglected areas that would benefit from attention by statisticians?

Statistics (and I’m sure you’ve heard this many times) is somewhat threatened by data science, because data science wants this big tent, and they want to put Statistics under the tent along with Informatics, Applied Mathematics and Computer Science, amongst others. But in order for Statistics to retain its identity, it needs to fall back on what it was built on, which is inference. In particular, where statisticians have always made a contribution is through hypothesis-driven research, as opposed to the big data perspective, which is often just ransacking a database trying to extract structure. I’m not saying that there aren’t problems where that is useful, but I am saying that the future of Statistics is going to have to be in hypothesis-driven research. Specifically, that means research where people think about processes, hypothesize behaviours, interactions, structures or other features of those processes, and use that to build models to capture and understand them.

So, for me, that’s hopefully where the future of Statistics is, and if Statistics goes too all-in for data science, I think it’s going to lose something. It’s important for Statistics to maintain its identity. Not every dataset is terabyte size. Not every dataset is petabyte size. Not every dataset is enormous. There are lots of datasets that you can do interesting things with that are order of ten to the third, ten to the fourth, ten to the fifth—still big enough to be interesting. What I like is to work with datasets on that scale and I still think that you can extract good stories from them.

7. Your lecture at ISBA 2018 focussed on Spatial Statistics and Environmental Challenges. What was the one thing that you would like your audience to take away from your lecture?

Statisticians are becoming more and more applied, so we build teams and we use applications to motivate the work that we do. Based on that, my takeaway message is: there’s excitement in environmental sciences, and there’s funding for work in environmental sciences. There are job opportunities, but it’s not necessarily working for Google or Microsoft; instead it’s a different kind of community.

For me it certainly generates passion. You continue to find wonderful little nuggets. In my talk I described joining a project on how whales are reacting to sonar and other noise from navy ships. How is it affecting whale behaviour? Is it affecting them in harmful ways? It’s quite an interesting challenge to take signals that originate in different places and work out how they reach the whales, how strong these signals are, and from what direction. Then you take a complicated input like that and try to convert it to a behavioural response, which could be anything from turning or diving to feeding or not feeding, sleeping or not sleeping. Altogether it’s a very challenging problem, but I think it’s important, in particular if the U.S. government wants to invest money to understand what the impacts of the ships are and where they’re locating them. In the old days, things were easy: you’d get a plot of seeds, and you’d say, “This one was treated with this, whereas this one was treated with that; how much did this one grow?” and so on. All these experimental designs and a lot of simple regressions. I’m not saying there’s no place for them, but in my landscape, more complicated, more ugly, is actually better, as it’s more challenging.


8. You have authored more than 280 papers and received awards for your research contribution to statistics. Is there a particular contribution that you are most proud of? Or perhaps one that took your work in a new/different direction?

I was very proud of the paper I wrote with Adrian Smith, ‘Sampling-Based Approaches to Calculating Marginal Densities’, which was published in JASA in 1990. There are a lot of really smart people who just don’t happen to get lucky, but we got lucky, found a foundational, seminal idea and were able to publish it, and it has obviously changed Bayesian statistics dramatically.

I know that Dennis Lindley, who was Adrian’s advisor, predicted the 21st century would be a Bayesian century, but he predicted it because he thought people would simply realize the paradigm was scientifically the right one. What’s happened instead is that, because of Gibbs sampling and Monte Carlo methods, it became possible to fit models that were previously inaccessible, and that’s why people have gravitated much more toward Bayesian modelling. We now have tools that outstrip what you can do with classical modelling, and that’s why it’s becoming more of a Bayesian 21st century. People are realizing that you want to build these complex models, and the Bayesian framework is really the only appropriate way to fit and infer with these models.

I also feel good about contributing towards Bayesian spatial statistics. I think I was sort of there at the beginning, because I looked at what was out there and realized most of it was GIS-based, which meant very pretty maps, schematics, overlays, lots of nice eye candy you could call it, but none of it was formal model building. Some of it had probabilistic structure, but none of it had inference associated with it, and so from my point of view I saw a wonderful opportunity to take the kinds of problems that GIS was exploring and put a full inference engine on top of them. That’s what I’ve been doing for more than twenty years. And hey, it’s gone pretty well! I’ve produced a lot of students along the way and a lot of papers.

11. You have earned many honours and distinctions during your career. Is there one that you are particularly proud of?

You know, it’s interesting: you do get to different points in life where you receive this recognition or that recognition. But if you asked me how I measure things, I actually feel pretty good about having almost 35,000 citations. That’s actually not too bad. It means people have looked at my work and said, “Okay, I’ll cite that.” I’ve got an h-index of 75, which is pretty high in our field. I’m very honoured to have received awards and invited lectures, but I think recognition of the body of work through citation is what I feel most pleased about.

12. What is the best book in statistics that you have ever read?

I confess that I’m actually more of a reader of journal articles. However, in the formative years, the books that I was interested in were books that helped me build my probability side and my inference side. I’m not a great book reader, but at various stages in life, books become important to help you get a better handle on research areas.

There’s a beautiful probability book, An Introduction to Probability Theory and Its Applications by William Feller, published by Wiley. He has two volumes on probability, and the first volume is just so elegant. I remember going through it and thinking, “Wow, what a work of clarity and thinking he had when he wrote that book.”

Linear Statistical Inference and its Applications by C. R. Rao, also a Wiley book, was a very nice presentation of the state of the art of inference in the late 60s. Then I started moving into decision theory for a little bit, so Jim Berger’s book Statistical Decision Theory, published by Springer, was a landmark book and certainly influential. With the beginnings of spatial statistics for me, I read a lot of Noel Cressie’s Statistics for Spatial Data, because in the early 1990s (it came out in ’93) it was sort of a bible; there wasn’t anything like it.

13. What would you recommend to young people who want to start a career in Statistics?

I think this is a really critical question, because, like I said, when I started out it was a simpler world, a smaller world, a mathematical statistics world. Your path was sort of driven by the research area you wrote your thesis on, and there were traditional areas: sequential analysis, biometric statistics, multivariate analysis, time series, these sorts of classical areas in statistics. If you wrote your thesis in one, that sort of became your research area, at least for a while. If you were flexible maybe you moved on; if not, you might have done it for fifty years, who knows?

But now the field is so much broader that I think it’s hard to find a place in the field now. It’s hard to find an identity in the field now, because there’s so much interdisciplinary work, so many forums out there to publish things in, and so many different paths to follow. I don’t think it’s so easy anymore to give simple advice. So the only advice I ever give is passion.

You’ve got to follow what you get excited about. Most of us in academia do what we do not as a nine-to-five job, not just for a pay check, we do it because we find something that really inspires us, it gives us fire, it pushes us.

Find what you can do really well. That’s another way to say it. In the sense that, as I mentioned earlier, there are all these different ways you can contribute in the field, and the worst thing to do is to try to make contributions where you’re not that good, as opposed to making contributions in a dimension that you’re better suited to. That’s what my story was before, in terms of mathematical statistics versus stochastic modelling. If I had spent my career doing mathematical statistics, you wouldn’t be talking to me today! But because I found stochastic modelling, I was able to turn that into a much more successful career.

14. Who are the people who have been influential in your career?

Herbert Solomon, who was my thesis advisor, was an absolutely wonderful guy and very much under-appreciated. He was at Stanford, and he took me under his wing and shepherded me through the early stages of my academic life. He became a bit like a second father to me. He was one of the very first in Statistics to appreciate the value of pursuing external funding. So, he built relationships with a number of federal agencies and was the most successful in terms of obtaining external funding of any statistician in the world for his time.

The second name I’m going to mention is Adrian Smith (Sir Adrian, of course, now). He was at Nottingham at the time I connected with him, which was a long time ago. I wrote to him and said I’d like to do a sabbatical and I’d like to play with your software. I got there and I started to play a little bit with it, but then we found this Gibbs sampler paper by Stuart Geman and Donald Geman, and it just opened up the world for me. And, as I said at the outset, it’s been close to a thirty-year ride now with this, and again, it all stemmed from that sabbatical in Nottingham with Adrian, which I will always be very grateful for.



Copyright: Image appears courtesy of Professor Gelfand