“Statistics is an evolving field, rather than a fixed toolbox”: An interview with Alan Agresti on the book he is proudest of as Wiley publishes its third edition

Wiley is proud to announce this month’s release of the third edition of Categorical Data Analysis.

The use of statistical methods for analysing categorical data has increased dramatically, particularly in the biomedical sciences, the social sciences, and the financial industry. Responding to new developments, this book offers a comprehensive treatment of the most important methods for categorical data analysis.

Categorical Data Analysis, Third Edition summarizes the latest methods for univariate and correlated multivariate categorical responses. Readers will find a unified generalized linear models approach that connects logistic regression and Poisson and negative binomial loglinear models for discrete data with normal regression for continuous data.

Here Statistics Views interviews the author, Professor Alan Agresti, Distinguished Professor Emeritus in the Department of Statistics at the University of Florida, about his career, the book’s success, and what readers can expect in this new edition.

1. When and how did you first become aware of statistics as a discipline and how did your educational background reflect this?

Like many in my generation who became statisticians, I majored in mathematics as an undergraduate at the University of Rochester but had no idea what I would do with it. I took courses in probability and mathematical statistics in my last year as an undergraduate. I enjoyed them, finding out that statisticians use math skills in real-world applications, so I went to the University of Wisconsin for graduate school. The young department there had a great mix — well-known statisticians such as George Box and Norman Draper, but also young hot-shot recent PhD graduates of Berkeley and Stanford.

2. You continue to teach as Distinguished Professor Emeritus at the Department of Statistics at the University of Florida. As a university professor, what do you think the future of teaching statistics will be? What do you think will be the upcoming challenges in engaging students?

Since I retired from Florida, I’ve been teaching each fall for the Statistics Department at Harvard University, most recently a graduate-level course on linear and generalized linear models (perhaps my next book?), which has been a fun new challenge. In the future, challenges for instructors of statistics will be the same as I’ve seen throughout my career: getting students interested in a subject they usually take because it’s required for their major but which they think will be boring and difficult. We who teach them need to illustrate John Tukey’s remark that learning statistical methods enables one to play in everyone’s backyard. It’s imperative to show them how relevant statistical thinking is to their everyday lives. Over time, there’s been a welcome change in such courses toward focusing on concepts rather than formulas and recipes. Modern technology helps: software relieves computational drudgery, and applets and online sources help to show difficult concepts such as sampling distributions.

3. What is your current research focussing on? What are your main objectives and what do you hope to achieve through the results?

Not surprisingly, as usual I’m working on new methods for categorical data. Recently, though, I’ve been spending more time on writing books. I’m on the lookout for new challenges, and I recently completed one by editing a book with Xiao-Li Meng of Harvard on the history of the long-established Statistics and Biostatistics programs in the U.S. (Strength in Numbers: The Rising of Academic Statistics Departments in the U.S.).

4. Congratulations on the third edition of Categorical Data Analysis, which has been referred to in reviews by professional statisticians as ‘the bible of categorical data’. How did you first become aware of categorical data analysis and what led you to make use of it within your work?

After I completed my PhD, I was hired at the University of Florida to develop introductory statistics courses for social science students. The graduate students I advised had lots of categorical data, but I had no idea how to analyse them; my thesis had been in stochastic modelling, and I’m embarrassed to say that I barely knew what Karl Pearson’s chi-squared test was. As I learned more about the methods, I started doing my own research in that area. It worked out well for me, as the area was in the prime of its development.

5. Did you anticipate the success the book would have? Had you seen that there was a gap in the market for this book?

I was hopeful, as indeed there did seem to be a need for such a book. But I’ve been absolutely delighted by its success. Perhaps the best by-product has been the invitations I’ve received to visit departments and attend conferences around the world and teach short courses from the book. I’ve done that in more than 30 countries and met many wonderful people because of it.

6. With the release of the third edition, what can the reader expect in this new version?

I re-wrote every chapter, while adding material on the Bayesian approach to many chapters, as that was largely lacking from the earlier editions. I also added discussion of some non-model-based methods such as discriminant analysis, classification trees, and clustering methods. The book was already long and I did not want it to be much longer, so I’ve moved material about software to the website for the text (www.stat.ufl.edu/~aa/cda/cda.html), where I’ve been able to expand on how to use R and SAS for the text examples.

7. You have authored many publications, including three titles for Wiley on categorical data. Is there a particular article or book that you are most proud of?

I’m most proud of Categorical Data Analysis, which is special to me because of the connections it has enabled me to make. I’m also proud of the books I’ve written for more general audiences (Statistics: The Art and Science of Learning from Data and Statistical Methods for the Social Sciences), because of the importance of helping others appreciate Statistics as a field, and because I met my wife when she took a course taught from one of these books! Of my research articles, by far the most commonly cited is one I wrote in 1998 in The American Statistician with Brent Coull (my then PhD student, now a professor at Harvard) on a simple way of improving the commonly taught confidence interval for proportions. I should add that I’m also exceptionally proud of the many excellent PhD students I’ve been lucky to advise and of the articles that came out of our joint research.
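
For readers curious about the interval improvement mentioned in this answer, the following is a minimal sketch and is not taken from the interview. It assumes the adjustment as it is commonly described in textbooks: add z²/2 pseudo-successes and z²/2 pseudo-failures (roughly two of each at the 95% level), then apply the usual Wald formula. The function names `wald_interval` and `adjusted_wald_interval` are hypothetical, chosen only for illustration.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Commonly taught interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def adjusted_wald_interval(successes, n, z=1.96):
    """Adjusted interval (assumed form): add z^2/2 successes and z^2/2
    failures, then apply the Wald formula to the adjusted counts."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1], since a proportion cannot lie outside that range.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


if __name__ == "__main__":
    # With 0 successes in 10 trials the Wald interval collapses to a point,
    # while the adjusted interval still has positive width.
    print(wald_interval(0, 10))
    print(adjusted_wald_interval(0, 10))
```

The small-sample case in the usage lines shows why an adjustment of this kind can matter: with no observed successes, the unadjusted formula reports no uncertainty at all.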

8. Statistical software has developed considerably over the past few years and is now used worldwide on a daily basis. What do you think it is about software such as R that has been so appealing to statisticians, and where do you see this software developing in the future?

It is now so easy to do things that were inconceivable when I was a student. R will continue to expand its influence as functions are developed for new methods, but there will always be a place for simpler software that’s easier for non-statisticians to use. It’s also exciting to see new software that makes it easier to visualize data in a non-static and interactive manner (Hans Rosling’s on-line talks, such as at TED, being a good example).

9. Do you have any advice for students considering a university degree in statistics?

Do it! You will probably end up working in areas quite far removed from what you study now, but this degree will give you the background to analyse quantitative information in ways that will make your work appreciated and potentially influential.

10. Do you continue to get research ideas from statistics and incorporate your ideas into your teaching? Where do you get inspiration for your research projects and books?

Yes, even in introductory courses, it’s important to explain how Statistics is an evolving field rather than a fixed toolbox that’s been around for centuries. I find that students are interested to hear how questions that I’ve encountered as a statistician have led to new research projects. One benefit I had from writing my categorical books is that, in developing my own overview of the field, I often learned about particular areas that needed further research work.

11. What has been the most exciting development that you have worked on in statistics during your career?

That’s a tough question. Overall, I’m just pleased to have been a contributor to the development of methods for categorical data analysis, achieving more than I could possibly have imagined at the start of my career when I was quite nervous about whether I had the capability for an academic career.

12. What do you think the most important recent developments in the field have been and will be in the future?

It’s wonderful to see the impact that statistical modelling is having in many areas, from genomics to the financial industry to its key role in medical clinical trials. In the future I’m sure it will be important in areas we cannot currently visualize. For example, the entire field of survival analysis has largely developed since I was a student. Many current challenges deal with the massive data sets that are common in many areas of application.

13. Here in the UK, the Royal Statistical Society and the Office for National Statistics are endeavouring to increase the public’s awareness of the use of statistics in everyday life. Do you agree that this is the main challenge currently facing statisticians?

Absolutely, and this is one reason it is imperative that we do a good job teaching the introductory statistics course, as that is our best chance to illustrate the importance of our field to a diverse audience. It would help to have more highly visible successes, such as Nate Silver’s recent election predictions in the U.S.

14. Are there people or events that have been influential in your career?

I’ll mention one person and one event. My PhD advisor at Wisconsin was Stephen Stigler, the “historian of statistics.” He was incredibly generous with his time and very encouraging during a period when I often questioned what I was doing; the Vietnam War and other societal changes in the late 1960s made it difficult to focus on a future career. As for an event, I was so lucky that for my first sabbatical I spent a year at Imperial College, London. This is when I first heard about generalized linear models and started serious research in categorical data modelling. IC was a stimulating place, largely because Sir David Cox was there and so there was a continual stream of visitors. That year London became my favourite city (I even bought a flat there a few years later), one that my wife Jacki and I return to every year. That year I also made many friends, including statisticians such as Phil Brown and Bianca de Stavola, whom I’m delighted to see whenever I return. Finally, I should mention that the UK has always been special to me, as my mother is from Gloucestershire and I’ve spent time in the Forest of Dean nearly every year of my life.

Copyright: Photograph appears courtesy of Professor Agresti