The single most cited biostatistics book in print: Co-author David W. Hosmer on the success of Applied Logistic Regression

David W. Hosmer is Professor Emeritus at the Department of Public Health at the University of Massachusetts as well as Professor of Statistics at the University of Vermont. He is the co-author of the best-selling Applied Logistic Regression with Stanley Lemeshow and Rodney X. Sturdivant, which is currently in its third edition.

Dr. Hosmer remains active from his home base in Stowe, Vermont, teaching short courses on logistic regression and survival analysis and collaborating on biostatistical and applied research with University of Massachusetts colleagues and others around the world.

Statistics Views talks to Dr Hosmer about the book’s success and his own career in statistics.

1. When and how did you first become aware of statistics as a discipline and how did your educational background reflect this?

I was a mathematics major at the University of Vermont and in my junior year, I took a statistics course from Professor David Sylwester that included applications as well as the mathematics behind the methods. I was hooked.

2. What was it that inspired you to pursue a career in statistics?

I had no clue about what I really wanted to do when I graduated, so I stayed at Vermont for a Masters degree in mathematics. During this period I figured out that I really liked the combination of mathematics and applications that studying statistics offered. Professor Sylwester encouraged me to go on for a doctorate and pointed me to a program at the University of Washington in Seattle. On a number of levels, the two years in the Masters program at Vermont and the three years in the Ph.D. program at Washington set the tone for my career as well as much of my life outside of statistics.

3. You are Professor Emeritus at the Department of Public Health at the University of Massachusetts as well as Professor of Statistics at the University of Vermont. As a university professor, what do you think the future of teaching statistics will be? What do you think will be the upcoming challenges in engaging students?

I have Emeritus status at the University of Massachusetts but have not taught there since I retired in 2002 and moved to Stowe, Vermont. I teach a combined logistic regression and survival analysis course at the University of Vermont every other spring semester. Most of my teaching is in short courses lasting from one to five days. I try to limit these to a few a year that, hopefully, are in places I like to visit, for example, Norway.

My wife and I are avid cross-country skiers and still actively compete at the masters level. My wife was on the first US Women’s Cross Country Ski Team and competed in the 1972 Olympics in Sapporo, Japan. We met in the Masters program in 1966 at Vermont. In 1988 I arranged for a January to April visit to the University of Oslo as part of a sabbatical leave. I wrote a good portion of the first draft of the first edition of Applied Logistic Regression in Oslo that winter and my wife, our two children and I did a lot of skiing. During the visit I met Petter Laake and he has become a great friend and colleague. He has arranged, through the Department of Biostatistics at the University of Oslo, a short course on logistic regression nearly every year since 1988. This past January was the 25th visit my wife and I have made to Norway, combining teaching and a lot of cross-country skiing.

I left active classroom teaching just as all the current electronic gizmos were making their way onto campuses and into classrooms. I’m not sure how I would react to or handle the fact that students are, by and large, constantly distracted. Engaging them, it would seem to me, is the challenge.

Big data seems to be the current rage and it presents many important and real challenges to statistical methods that were by and large developed for modest sized problems. In my own work with large data sets, the issue is ferreting out the clinically meaningful from amongst the myriad of effects that are significant due to large numbers of observations. But these data sets are not “Google” big.

4. What is your current research focussing on? What are your main objectives and what do you hope to achieve through the results?

My personal research is winding down at this point. When I get ideas for new statistical projects I usually suggest them to two of my former doctoral students and now co-authors, Susanne May and Rod Sturdivant. Morten Fagerland, Petter Laake’s son-in-law, is a colleague and we have done and continue to do joint work on extending the Hosmer-Lemeshow test to the multinomial logistic and three ordinal logistic models.

The Menzies Research Institute at the University of Tasmania is another place that my wife and I have visited, four times now, including one visit of nine months. The nature and hiking in Tasmania are spectacular. During these visits, Professor Leigh Blizzard and I have become good friends and colleagues. We have worked on the issues in fitting the log-link, or log-binomial, model, as it is the one whose estimated coefficients provide direct estimates of relative risk. We continue to pursue extensions of this work, most recently log-link multinomial and ordinal models.

I also consult with Turner Osler, a trauma surgeon and Research Professor of Surgery in the University of Vermont’s Medical School. The research typically involves using logistic regression to model vital status at discharge of trauma patients in a variety of different settings and databases. Our current project involves looking to see if having insurance is associated with outcome.

5. Congratulations on the success of Applied Logistic Regression, now in its third edition, which has been referred to in reviews by professional statisticians as ‘a classic’ that ‘remains an extremely valuable text for everyone working or teaching in fields like epidemiology.’ How did you first become aware of applied logistic regression and what led you to make use of it within your work?

I first became aware of logistic regression in about 1975. When Stan Lemeshow joined the faculty at UMASS in 1976, we began to work jointly on the question of goodness of fit of fitted logistic regression models. The result was our 1980 paper that proposed the decile of risk test, now referred to as the Hosmer-Lemeshow test.

We began to include a bit about logistic regression in our classes, and that is when we became aware that there was really no text on the subject. We thought a book in the style of Sanford Weisberg’s Applied Linear Regression might be useful. We sent a proposal and sample chapter to the longtime Wiley editor, Bea Schube. Stan and I met with Bea at the 1987 Joint Statistical Meetings in San Francisco and a contract for a book followed.

6. Did you anticipate the success the book would have? Had you seen that there was a gap in the market for this book?

What followed the publication of the first edition of Applied Logistic Regression was more or less pure serendipity. It was a short book that readers found useful, and for many years it was the only book on the subject. Its release also coincided with the addition of easily used logistic regression routines in the software packages of the day. A guiding principle for us was, and still is, that anything we wrote about had to be able to be done in currently available software. As a result, much of the content dealt with interpretation and presentation of results of fitted models for a subject matter audience.

The release of the book also coincided with an explosion of the use of logistic regression into every field of inquiry imaginable. Again, something we had nothing to do with but all those users must have found the book helpful.

In a word, we simply lucked out with the right product at the right time. In 1987 we had no inkling that Applied Logistic Regression would, now with over 30,000 citations, become the single most cited statistics book in print.

7. The third edition was released last year. For those who have not yet been introduced to the book who will read this, what can the reader expect in this version?

Like past editions, the book provides a focused discussion of what the logistic regression model is, how to build and evaluate sensible models that provide useful inferences from study data, and how to explain those results to a subject matter audience. The first edition was 240 large-print, camera-ready pages. The third edition of the book is 450 typeset pages. The increase in length is almost completely due to extensions of the basic logistic regression model into different forms and data settings, as well as the inclusion in statistical software packages of routines that easily fit these models.

8. Who should read the book and why?

We wrote the book to be read at different levels: by individuals who are consumers, that is, those using logistic regression to analyse their data; by masters and doctoral students in biostatistics who are expected to provide expertise on the methods to consumers; and by individuals who do research in the field and want to find out what has been done and what some of the open questions are.

9. Why is the book still of particular interest now?

Interest remains high because logistic regression is still widely used. It is hard to pick up a journal in any field where a binary outcome of interest is not analysed using logistic regression.

10. You have also authored another publication for Wiley, Applied Survival Analysis: Regression Modeling of Time to Event Data, now in its second edition. Is there a particular article or book that you are most proud of?

I just feel quite fortunate to have been able to work on both these books.

11. What will be your next book-length undertaking?

I retired from UMASS in 2002 and spent from 2004 to 2008 working on the second edition of Applied Survival Analysis and from 2009 to 2013 working on the third edition of Applied Logistic Regression. This past year it has been quite nice to get up in the morning and not have a book revision staring me in the face!

By design, Stan and I took on co-authors when we revised the two titles. We expect that Susanne May will take the lead on a third edition of Applied Survival Analysis and that Rod Sturdivant will do the same for a fourth edition of Applied Logistic Regression.

12. Do you have any advice for students considering a university degree in statistics?

Go for it; there are lots of really interesting and well-paying jobs out there. Along the way, learn how to express your statistical work clearly in both the spoken and written word.

13. Do you continue to get research ideas from statistics and incorporate your ideas into your teaching? Where do you get inspiration for your research projects and books?

Most of the ideas come from some unmet statistical need arising from a data analysis project. For example, in analysing hospital discharge data of trauma patients to see if having insurance was associated with outcome, Turner and I found out that when a patient is admitted to a trauma centre without health insurance, there is a team of individuals that scurries about trying to obtain insurance. Some patients transition from no insurance to insurance during their stay while others do not. Unfortunately, insurance status is recorded at discharge. Thus we do not have the actual time-varying covariate. This, combined with the fact that mortality is highest in the first two days, leads to an incorrect conclusion that having insurance improves the odds of survival. The approach we used, one that is ripe for research, was to “impute” insurance status for all patients with a length of stay that was less than 1 day. The implausible protective effect of insurance went away.

14. What has been the most exciting development that you have worked on in statistics during your career?

Working with Stan and now Susanne and Rod on the books has been immensely satisfying to me.

15. What do you think the most important recent developments in the field have been and will be in the future?

Good question, and I really cannot give you a good answer, as I just do not pay as close attention to the field as I once did. Instead of perusing the journals I am spending that time on community service activities. Also, staying fit enough to keep up with my former Olympian wife is a continual challenge!

16. Are there people or events that have been influential in your career?

I was the beneficiary of an excellent mathematics education at the University of Vermont and in biostatistics at the University of Washington. I am deeply indebted to the faculty I had in those two Departments. Also, Stan Lemeshow has been a great friend and colleague. We’ve been a good team and the proof is in the pudding.


Copyright: Photograph appears courtesy of Dr Hosmer