“One of the most enjoyable aspects about being a statistician is that data comes from different fields and each one poses a novel scenario”: An interview with Ingram Olkin

Ingram Olkin is a professor emeritus and chair of statistics and education at Stanford University and the Stanford Graduate School of Education. He is best known for developing statistical analysis for evaluating policies, particularly in education, and for his contributions to meta-analysis, statistics education, multivariate analysis, and majorization theory.

Olkin received a B.S. in mathematics at the City College of New York, an M.A. from Columbia University, and his Ph.D. from the University of North Carolina. Olkin also studied with Harold Hotelling. Olkin’s advisor was S. N. Roy and his Ph.D. thesis was “On distribution problems in multivariate analysis”. Dr. Olkin is a Guggenheim, Fulbright, and Lady Davis Fellow, with an honorary doctorate from De Montfort University.

Olkin was awarded the first Elizabeth Scott Award from the American Statistical Association for his achievements in supporting women in statistics, is a former President of the Institute of Mathematical Statistics, and has written numerous articles and books.

Jon Gurstelle, Executive Editor of Statistics at Wiley, talks to Professor Olkin about his life and prestigious career.

 

1. Your research interests include the development of powerful new statistical methods for combining results from independent studies that have analysed the same topic. Other research interests include analysis of social and behavioural models, and correlational and regression models in educational processes. What are you working on currently?

I’m working on several different projects at the moment. One is called interlaboratory comparisons. It turns out that even if you send similar samples to several laboratories, the measurements that come back are often different. So measuring the amount of bacteria in some plant sent to different laboratories might give heterogeneous results. This point was noticed 50-60 years ago when some scientists at the National Bureau of Standards sent out samples in order to set standards, but the results they received were variable. There are many papers on this problem, which is now called interlaboratory differences. I have written two papers on this topic but I now have some new thoughts and am in the process of bringing this all together. This is an interesting problem to work on as it also involves meta-analysis, in the sense of combining the results of independent laboratories.

I’m also working with Betsy Becker, who is at Florida State University. She is my academic grandchild, and we have collaborated on correlational models. Correlations are really quite amazing: the idea of two related variables arose in the late 1800s with Sir Francis Galton, with Karl Pearson coming on later, so you would think that we know everything we need to know about correlations, but that isn’t quite true. New questions keep arising, partly because we have more data.

Finally, I’m working with a group of radiologists at Mount Sinai Hospital on lung cancer screening; they have a large repository of data. One question is how to predict how quickly a nodule in stage one lung cancer will grow because if it grows quickly, you may wish to increase the dose. The predicted growth of the nodule by the end of the year is one of the characteristics that physicians note.

2. You have received numerous awards, from the first Elizabeth Scott Award from the ASA for your achievements in supporting women in statistics to the Marvin Zelen Leadership Award. Is there a particular award that you were most proud of receiving?

I was very pleased to receive the Wilks Medal in part because Sam Wilks was such a leader in the profession. He, together with John Tukey at Princeton, left a legacy of accomplished students so it was very pleasing for me to be recognised as someone following his beliefs and leadership.

3. You have authored many publications. What are the articles or books that you are most proud of?

It’s hard for me to distinguish one from another, but whenever I worked on a book-writing project it was a full-time enterprise and that’s all I thought about. The books on meta-analysis, inequalities, and life distributions and, more recently, the edited book on leadership of women in statistics consumed all of my time, but the result was very pleasing.

There was one paper that had a different trajectory. In 1967 I wrote a paper with Albert Marshall, a colleague with whom I collaborated for many years, on a bivariate exponential distribution. The exponential distribution is very fundamental in engineering reliability and survival analysis and it has very singular properties. One might say that for positive measurements, the exponential distribution plays a role similar to the normal distribution, which takes on both negative and positive values.

In any case, we wrote this paper, which we liked but did not think was particularly exceptional. It turned out, historically, that it became the starting point for many other problems. This distribution is now referred to as the Marshall-Olkin distribution and has many citations, so it is great that it captured the imagination of a lot of researchers.

4. You were Editor of the Annals of Mathematical Statistics and served as the first editor of the Annals of Statistics, both published by the IMS. You also played a major role in launching the Journal of Educational Statistics, which is published with the ASA, and served as a Distinguished Editor for Linear Algebra and its Applications. What do you see as the role of journals in statistics?

I think journals are fundamental for every field so I am very positive about them. However, there are always issues with publications. In the early days, journals were published by societies, such as the American Statistical Association (ASA) publishing the Journal of the American Statistical Association (JASA). By and large, these were general journals in that they covered all research areas in statistics. However, at a certain point in history, the commercial publishers began to publish special-topics journals such as the Journal of Multivariate Analysis or the Journal of Time Series Analysis. What was striking was the difference in pricing between society and commercially published journals. The IEEE solved this problem by publishing many subject journals, such as the IEEE Transactions on Reliability. I suggested to the ASA Board that they follow a similar road and launch journals in applied statistics and other areas. But this was not accepted so they never splintered their journals and JASA remains the main journal.

In 1970, I was Editor of the Annals of Mathematical Statistics and we published over 2000 pages. The result was that the Editors did not really have time to play a role in the editorial process. It became clear that different specialized areas were beginning to blossom. The increase in submissions from 1930 to 1970 indicated that this growth would continue. I suggested that the journal split into two publications, one in statistics and one in probability. This proposal was met with understandable scepticism. Some readers suggested that we keep one journal representing a unity of areas in statistics. Others agreed that it was time to separate the journals into these two entities, and that each would probably continue to grow. In 1972, we did split the journal into The Annals of Statistics and The Annals of Probability. I transitioned to Editor of Ann. Statist. and Ron Pyke was named Editor of Ann. Probab.

In retrospect, I did not fully realize the exceptional growth that would take place. We now have the Annals of Applied Probability and the Annals of Applied Statistics. More young people have been coming into these fields so journals are diversifying. These publications are really important. I am personally sorry that everything is online as I myself used to enjoy just browsing through journals, but now I find it difficult to browse online. I found it easier to browse through tables of contents in a library!

Morrie DeGroot and I proposed publishing a very general journal on statistical science and this too was met with scepticism because most of the researchers in academic statistics departments are really very theoretical, so a general-interest journal was rather alien to them. The statistical community is very good at writing for the academic community but not so good at writing for the general public. However, that journal, named Statistical Science, was published and is now one of the big successes in the statistical community. One of its popular features is a series of interviews with statistics luminaries.

There was no journal that combined the fields of statistics and education, so Mel Novick and I suggested that there be a liaison between the ASA and the American Educational Research Association to publish a journal covering the marriage between statistics and education. This journal was first called the Journal of Educational Statistics and then later the Journal of Educational and Behavioral Statistics.

Finally, matrix theory was an area that was more or less ignored in mathematics and those working on it were experiencing a lot of difficulty getting their papers published. A group of scientists from Wisconsin including Hans Schneider and Ralph Brualdi started a journal in linear algebra. I became the connection between statistics and linear algebra. Later on, we brought in more statisticians, and this journal, Linear Algebra and its Applications, now has a very long publication record.

5. What have been the most important books published in statistics that you have ever read?

A number of books do come to mind. I think Will Feller’s An Introduction to Probability Theory and its Applications is a classic and a gem and really had an effect on the publication of books. When I was a student in the late 1940s, there were practically no books at all. Harald Cramér’s book on mathematical statistics, despite being written in 1945, is still a gem. It is not too elementary and not too advanced but it covers so much in the field that is needed for basic theoretical work. The book by Ted Anderson on multivariate analysis also provides the foundations of the field and is very well written. It has had a profound effect on many researchers. Paul R. Halmos wrote a fun-to-read book called Finite-Dimensional Vector Spaces, which was required reading if you wished to learn linear algebra; he was a great author and raconteur and it became a great book to just browse through.

6. What has been the most exciting development that you have worked on in statistics during your career?

One of the topics that has really been gratifying is the extension of univariate characteristics or distributions to the bivariate case. It’s now clear that there are a lot of measurements that are correlated, and the development of bivariate distributions took a long time. This brings up the correlation between two measurements, which doesn’t exist in the univariate case.

As I mentioned, one such paper in 1967 is on the extension from one to two variables, each having an exponential distribution. What we found is that the model includes the case in which both components fail at the same time. It’s not just that the husband dies and the wife dies; they die simultaneously. That used to be thought of as a rare event. So when we wrote this paper, it caused some discomfort in the readership because they didn’t like the idea of a simultaneous failure. Then there was an airplane crash in Iowa in which both engines failed simultaneously. Once this had happened, our distribution started to have wide applicability. So it now has many citations and is the beginning of many theoretical advances.
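
Editorial illustration (not part of the interview): the simultaneous-failure property can be seen in a small simulation. The sketch below assumes the usual "shock model" construction of the Marshall-Olkin bivariate exponential, in which three independent exponential shocks arrive: one that kills only the first component, one that kills only the second, and one that kills both. The function names and the rate values are hypothetical choices made for this example. Under that assumption, the probability that both components fail at exactly the same instant is rate_both / (rate_1 + rate_2 + rate_both), which the simulation approximates.

```python
import random


def marshall_olkin_pair(rate_1, rate_2, rate_both, rng):
    """Draw one (x, y) pair of lifetimes from the assumed shock model."""
    shock_1 = rng.expovariate(rate_1)        # shock that kills only component 1
    shock_2 = rng.expovariate(rate_2)        # shock that kills only component 2
    shock_both = rng.expovariate(rate_both)  # shock that kills both components
    # Each component fails at the first shock that affects it.
    return min(shock_1, shock_both), min(shock_2, shock_both)


def simultaneous_failure_fraction(rate_1, rate_2, rate_both, n_draws, seed=0):
    """Estimate the probability that both components fail at the same time."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(n_draws):
        x, y = marshall_olkin_pair(rate_1, rate_2, rate_both, rng)
        if x == y:  # equal lifetimes arise when the common shock comes first
            ties += 1
    return ties / n_draws


if __name__ == "__main__":
    rates = (1.0, 2.0, 0.5)  # illustrative rates, not taken from the interview
    estimate = simultaneous_failure_fraction(*rates, n_draws=100_000)
    theory = rates[2] / sum(rates)
    print(f"simulated: {estimate:.3f}, theoretical: {theory:.3f}")
```

Two independent exponential lifetimes would be equal with probability zero, so a simulated fraction of ties that is clearly positive is exactly the feature that made early readers uncomfortable.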

7. What is it that you unfailingly love about working in the field of statistics? What drives you?

One of the most enjoyable aspects about being a statistician is that data comes from different fields and each one poses a novel scenario, so there is always something unique that you have to look at. A study on jury choices is very different from the analysis of a randomized clinical trial. That, in turn, is also very different from tests and measurement. One learns from collaborators in different fields. The problems posed have a unique component and this unique component requires new research and suggests new problems to solve. I think this whole experience is a really exciting kind of work.

8. What do you think the most important recent developments in the field have been? What do you think will be the most exciting and productive areas of research in statistics during the next few years?

Historically, statistics was developed with small sample sizes in mind. However, we have now entered a new era sometimes called Big Data, Data Science, Analytics, and so on. Essentially, these are all statistical analyses of large data sets. An analysis of such data requires computing competencies as well as statistics. Statistics starts with trying to create models that underlie the data. Computer scientists don’t think this way so there is still a gap between how computer science and statistics pursue an analysis of data. This gap needs to disappear. When it does, we will probably see many new developments. For me, I can only hope that the next generation will develop new theoretical foundations for data science, which I don’t believe exist at present.

9. Do you think over the years too much research has focussed on less important areas of statistics? Should the gap between research and applications get reduced? How so and by whom?

I do think that a lot of the research that has been published has focussed on detailed theoretical analyses. I don’t want to call this research less important. Every field needs to have a basic encyclopaedia of facts and the field will be fine as long as there is a group of leaders to move it forward. I do hope that more statisticians become involved with applications. Unfortunately, substantive fields are wary of hiring statisticians. A substantive department of size 20 should include at least one statistician, and one of size 40 should include four or five. These statisticians can help improve the analysis of data arising in the applied department. In particular, the statistician can help graduate students use good methods of data analysis.

10. What do you see as the greatest challenges facing the profession of statistics in the coming years?

My view is that statistics can contribute in two ways. One is the development of the core of statistical methods. This was the case during the period of 1930-1980 when many of the statistical foundations were developed. This development is continuing but seems to be more fragmented. The second contribution lies in helping develop the foundations in applied fields. We now see an influx of researchers in what is called big data. Data science is attracting computer scientists and in most cases they are not trained in statistics. So there is a fundamentally different approach as to how computer scientists and statisticians work on the analysis of data. In many instances, there is no model or attempt to understand the data but rather simulations are used to arrive at conclusions. Simulations are very important but they don’t get to the core of the model. In particular, they do not provide causal explanations.

11. Are there people or events that have been influential in your career? Also, given that you are one of the most well-respected statisticians of your generation and many statisticians look up to you, whose work do you admire (it can be someone working now, or someone whose work you admired greatly earlier in your career)?

Harold Hotelling was a giant in both statistics and economics and he was very influential in my career. In particular, as a student, I gave a lecture to both students and faculty and Hotelling suggested after my talk that the material be published. I was a brand new graduate student and unaware of what publication meant. The paper extended some work that Professor Pao-Lu Hsu had developed. Hsu was on the faculty of the University of North Carolina but was in China at that time. Hotelling not only said that we should publish it, but was also instrumental in helping us. As Hsu was away and the work related to his, Hotelling managed to contact him in China. This was not easy in those days because the US and China did not have diplomatic relations. Hotelling wrote to a colleague in England asking him to forward a letter to Hsu, who replied by the same path in reverse. That paper was published when I was just graduating and it had an important effect on my career. It also sent me in the right direction. Later on, the book and other works of Ted Anderson were particularly influential. Ted was also a close colleague and a collaborator.

12. If you had not got involved in the field of statistics, what do you think you would have done? (Is there another field that you could have seen yourself making an impact on?)

This is a hard question. I think that law is an area I might have enjoyed. Part of the reason is that every law case brings with it a unique aspect which is similar to what applied statisticians face. One law case might relate to medicine, another might relate to an airplane crash, and still another to gender discrimination. All of these are challenging in the same way that statisticians look at applied fields. On the other hand, I might rebel at the pressure that every attorney faces with every court case!

 

Copyright: Image of Professor Olkin appears courtesy of Stanford University