“You need to promote a culture in which statisticians and subject matter scientists are working collaboratively hand-in-hand”: An interview with Royal Statistical Society President Peter Diggle

Professor Peter Diggle is currently the President of the Royal Statistical Society. He began his term early in 2014 upon the resignation of John Pullinger, who had been appointed National Statistician. Professor Diggle is Distinguished University Professor at Lancaster Medical School and also holds a part-time post at the University of Liverpool Department of Epidemiology and Population Health and adjunct appointments at the Johns Hopkins University School of Public Health, Columbia University International Research Institute for Climate and Society, and Yale University School of Public Health.

He is a trustee for the Biometrika Trust, a member of the Advisory Board for the journal Biostatistics and chair of the Medical Research Council’s Strategic Skills Fellowship Panel. In 1997 he was awarded the Guy Medal in Silver by the Royal Statistical Society, and he was an EPSRC Senior Research Fellow for four years.

Previously, he held positions at Newcastle University and at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia.

Statistics Views talks to Professor Diggle about his career, his research interests, his Presidency of the Royal Statistical Society and an exciting new project he is working on with the African Institute of Mathematical Sciences.

1. You graduated from the University of Liverpool in 1972 with a First Class Honours degree in Computational and Statistical Science, and gained your PhD from the University of Newcastle-upon-Tyne in 1977. When and how did you first become aware of statistics as a discipline and what was it that inspired you to pursue a career in the field?

My very first discovery of statistics came in finding a paperback on my father’s bookshelves called “Facts from Figures” by M.J. Moroney, which was published around 1950 and is still available. Although it would be seen now as a very old-fashioned, pre-computer-age discussion of the subject, it was very readable and I would still recommend it. At that time, I was considering what to study at university. I started an undergraduate degree in mathematics and it was really only after I switched from Edinburgh to Liverpool midway through my degree that I was completely converted to statistics because I encountered an inspirational tutor, the late Julian Besag. He taught me statistics and stochastic processes and encouraged me to go on and do a PhD. He was briefly my PhD supervisor and we collaborated at various times over the years, so he was really the one who got me on the road to being an academic statistician.

2. Your research concerns the development and application of statistical methods relevant to the biomedical and health sciences. What are your main objectives and what do you hope to achieve through your research?

It’s been a kind of interesting and almost happenstance journey for me really. When I first started my statistical research, it was really in the area of stochastic processes, which, looking back, now seems very esoteric to me. It was very theoretical, and gradually over the course of my career, I became more interested first in statistical methodology and then in a range of applications. Another of many happy accidents in my career was when my boss at Newcastle-upon-Tyne, Robin Plackett, promised me a sabbatical. I considered where to go and at the time I was aware of work in spatial statistics by a Swedish statistician named Bertil Matern, who I discovered worked in the Swedish College of Forestry. So I went to Sweden to work with Bertil during my sabbatical. He got me very involved in some applied forestry projects which I enjoyed, so the next stage in my career was to go to CSIRO in Australia where, at the time, statisticians worked hand in hand with subject matter scientists. I eventually became interested in public health through a visit to Johns Hopkins. That wasn’t the reason I went to Johns Hopkins. I had met Scott Zeger by chance when we were both visiting London, we had started collaborating and he invited me to visit him. He worked in biostatistics in the Hopkins School of Public Health, which I thought was a great environment. I tried to emulate that when I came back to the UK and set up medical statistics at Lancaster.

Ever since then, I have gravitated more towards working in health generally and if I think what my objectives have been in the past two decades, I would say that there is an overriding objective – what I try to do in my own research and in my teaching is to promote the view that you can actually think of statistics, and in particular, statistical methods as an integral part of scientific method. You need to promote a culture in which statisticians and subject matter scientists are working collaboratively hand in hand rather than statistics being a kind of add-on where, at one extreme, it is just number crunching. Instead the idea is to strengthen the collaborative interface and try to add value to subject matter research by statistical thinking and insight with the development of new methods following from the needs of the science.

If I was trying to characterise what the best statistical research is, it is where you develop methodology that starts off by being motivated by a real problem and then gets fed back into that real problem so that it’s a positive cycle all around. The science makes you think about new statistical tools that are fed back in to improve the science and it becomes a virtuous circle of collaboration. I think that’s really where the interesting activity is in modern statistics – the interface between theory and methods on the one hand, and applications on the other. If you’re purely application driven, you can lose your cutting edge skills and if you are purely theoretical, you are kind of missing the point!

For me one of the most positive changes in the world of statistical research that I’ve lived through, particularly in the UK, is that there is now a much less clear-cut distinction between so called applied statisticians and so called theoretical statisticians. There have always been many honourable exceptions but when I started my career, you would tend to find that the people who were advancing our discipline would be in maths departments and the people in medical schools were primarily responding to medics’ needs. Now there is much more activity at the interface, because there is a recognition that advances in medical and health sciences on the one hand, and in statistical and mathematical methods on the other, can feed off each other to mutual benefit.

3. Is there a particular area where you feel that the application of statistical methods has made a significant impact or contribution to progress in medical research?

I work in health and medicine because I have a passion for it and it drives me. There is a real sense for me that I would be working in the health service if I wasn’t a statistician. There are certain areas of application I am happy for others to work on as those areas bore me rigid – I couldn’t get interested in finance, for example, although a lot of the statistical models used in finance are very similar to those used in health! It’s also the case, though, that statistics has perhaps been recognised more in medicine than in most other fields as being absolutely vital in obtaining good robust findings and evaluating policy changes. Going from the first clinical trials in the 1940s right through to modern day debates about policies in the NHS that do and don’t work – statistics is just fundamental. But there are increasingly many areas where the importance of statistical method is recognised. Applying statistical method in the social sciences is just as challenging as in the natural sciences – arguably more so, because it’s harder to run controlled experiments. The embrace of the quantitative method as a powerful, universally applicable methodology has driven people like me to do what we want to do.

4. You are currently Distinguished University Professor at Lancaster Medical School, with adjunct appointments at the Johns Hopkins University School of Public Health, Columbia University International Research Institute for Climate and Society, and Yale University School of Public Health. You have developed and taught a wide range of courses in a university context. From your extensive experience of academic teaching and research, what advice would you give to students considering a university degree in statistics, or to graduates looking to develop a career in this field?

If they are undergraduates, I would advise them not to pursue a specialist undergraduate degree in statistics because I think that if you want to become a really first rate statistician, you need to do three things – 1) get a really good grounding in mathematics; 2) choose a substantive area that you really care about and understand how statistics can have an impact on that particular area; 3) acquire good computational skills.

Statistics is best taught to relatively mature students. Masters level is a great time to teach statistics. What I would like to see in undergraduate mathematics courses is more probability and less statistics. Probability is the foundation of both statistical modelling and statistical inference, and is also mathematically interesting. This is even truer of secondary education, at least in the UK but quite possibly elsewhere, where the kind of statistics that is currently taught within the maths curriculum all too often reduces to statistical arithmetic with no insight. If you are dealing with a science that you care about and where data and the interpretation of that data are important to the science, then the motivation to learn statistics comes more naturally. I am on record in my President’s Address as saying this but not everyone agrees with me! Things do change, and in earlier days there were fewer opportunities to study statistics. I did come to statistics early, and I think my mathematical ability has suffered as a result. There are many areas in mathematics where I am weak. I appreciate that you have to be pragmatic about this so if someone aged 18 wants to study statistics, it is of course not a good idea to say that they cannot. But I do think that if that person could be inspired to get to statistics through studying probabilistic mathematics and a science that has data in it, they would be better prepared to become a statistician. You can’t and shouldn’t force anyone to go that way but to my mind, that is the best route. I’ve also talked about how we teach statistics to biologists because we expect biologists to learn some statistics, but we rarely expect biostatisticians to know any biology!

5. Over the years, how has your teaching, consulting, and research motivated and influenced each other? Do you continue to get research ideas from mathematics and statistics and incorporate your ideas into your teaching?

In recent years, most of my teaching has actually been for non-statisticians. What I certainly enjoy doing is teaching statistics to people who are scientifically mature but statistically naïve. I try to get them away from the way most scientists used to be taught statistics as in ‘Here’s a set of tests and estimators and you can apply them to different situations. You’ll get an answer and it might not be what you were expecting but you’d better learn how that answer came about.’ Instead, what I try to teach is statistics as if you’re having a discussion with a scientist – starting with design and then going through the analysis and finishing off with the interpretation, bearing in mind that any statistical analysis is almost always answering someone else’s question. In that sense, I can use my current research projects as motivating examples and describe them in a way that makes it unnecessary to go into the details of the original statistics and the underlying mathematical theory. If you are addressing a question, you should be able to answer it in ways that people understand even if they don’t understand the technology that was used to obtain the answer. I sometimes draw an analogy with a doctor explaining their choice of a patient’s treatment without needing to go into all the biochemistry of exactly how the drugs work.

I don’t think with modern science you can teach people to be really expert at statistical theory and methods and applied statistics and science – it’s just too much. But you can build teams of people who have strengths in each of those areas and as long as they have enough knowledge of the intersections to be able to talk together, understand each other and respect what each other’s contribution is, then they can do good science. I try to teach my non-statistics students to remain non-statisticians but appreciate the value of statistics, and conversely I try to teach my PhD students that what they are doing has to have a scientific purpose – otherwise it’s not statistics, it’s statistical mathematics. You have to understand what is really driving you. We need cutting edge mathematics and cutting edge science and we need statistics to bridge the two but we do also need to advance the theoretical research that lies underneath the applied methodology, otherwise we will be out of date in 10-20 years’ time.

6. Over the years, how has the teaching of statistics evolved and adapted to meet the changing needs of students?

The obvious change is the computer, but I don’t think we’ve embraced that fully yet. I don’t think many statisticians are taught to be competent programmers. I think they largely lean towards packages. I think that needs to change, but certainly the computer revolution has allowed us to teach statistics in a way that does more directly intersect with other disciplines. Also, of course, it is a two way street and all of the computational developments have then opened up the possibility of producing more efficient algorithms because when you have a computer, you suddenly realise that the problems that were impossible are now possible and in turn, you then very quickly think of some new problems that are still impossible. The answer isn’t always a bigger and faster computer, it is a smarter algorithm, and some of those algorithms develop really elegant probability theory that allows you to understand how reliably (or not!) they converge. You do get this wonderful mutually reinforcing bounce between theory and practice. The practical example stimulates new theory, the theory is fed back into the application, and it moves on and then you need new theory. It’s very much this back and forth cooperation which shows why we need both statistical mathematicians and statistical scientists. If we lose either of those, we’ll become obsolete, but as long as we have them both, we should continue to flourish.

7. You have authored many publications. Is there a particular article or book that you are most proud of?

I wrote a paper that was published in 1986 in a journal called The Journal of Neuroscience Methods. I really like that paper because it started from a very basic question in developmental biology that was extraordinarily easily expressed, and it was this – in a mammalian retina, there is a particular sort of cell that sits in two layers. The cells in the two layers are physiologically similar but they have different functions – one of them basically tells the brain when a light goes on and the other when a light goes off. The developmental question was ‘Are these cells initially formed in a single layer and later differentiate to perform their separate functions, or are they formed in initially separate layers that then fuse?’ It turned out that I could answer that question really convincingly by using, on the face of it, something relatively esoteric about spatial point processes. I couldn’t look up a solution in any textbook, but I had studied spatial point processes, initially for their own sake rather than to understand anything about developmental biology, and that theory provided a very neat answer to the question. It was a very simple paper, but for me it was extraordinarily satisfying as it answered the question in a way that could be understood by anybody and yet at the same time, a classically trained statistician circa 1960 wouldn’t have had the tools to solve it.

RSS Ordinary Meeting papers (“read papers”) are feathers in any statistician’s cap, so I’m proud of all my RSS read papers. If I had to pick a favourite, I think it would be the model-based geostatistics paper written with Rana Moyeed and Jon Tawn, both Lancaster colleagues at the time. That paper brought geostatistics, which was originally invented and developed as a more or less self-contained methodology for solving prediction problems in mining, into the statistical mainstream. It has in turn contributed to geostatistics being adopted and applied in other areas, particularly in the health sciences in developing country settings where, as in its original context, the data available to understand spatially varying phenomena, e.g. disease prevalence, tend to be rather sparse and you need to build statistical models to exploit the data to best effect.

8. You have been a fellow of the Royal Statistical Society since 1974, and have made an active contribution to the Society in many ways, culminating in your current role as President. What are your thoughts on the role so far? At the time of your nomination, you said that you hoped to contribute to the Society’s support for early-career statisticians and statisticians working in developing countries; what progress has been made on this during your tenure so far?

On the early career side, that turned out to be a very easy job. We have a wonderful section of the Society called the Young Statisticians Section. Traditionally, sections had substantive themes – medical statistics, computational statistics, etc. The Young Statisticians Section has a different model – they are interested in everything and they are just wonderfully vigorous. I take every opportunity to support them whenever I can. They are dedicated and full of ideas.

On the developmental side, I am particularly pleased that we now have a formal partnership with the African Institute for Mathematical Sciences (AIMS), which originally was inspired by Neil Turok, formerly at Cambridge but now Director of the Perimeter Institute for Theoretical Physics in Waterloo, Ontario. In the early days, AIMS focussed on mathematics. It runs an MSc programme on an interesting model where the students come in from across Africa but the lecturers come from anywhere in the world to deliver a three-week lecture course and then return home. This is a much more efficient way of building capacity in relatively low resourced countries than the traditional model of finding your best African student, getting them a Commonwealth scholarship and sending them to the UK. By the time I discovered the existence of this model, two important things had happened. AIMS had opened campuses in several other African countries (Cameroon, Ghana, Tanzania and Sierra Leone) and, more crucially, the Tanzania campus was given a brief to focus, not exclusively but to a substantial extent, on statistics, particularly at the interface with health and agriculture, which are the primary sources of industry in Tanzania. I wanted to get involved and talked to statisticians who had already been out there, such as David Spiegelhalter and Jane Hutton.

In a couple of weeks’ time, I am going out to Tanzania with two colleagues, Emanuele Giorgi and Michelle Stanton, to teach spatial statistics for epidemiology and public health to about thirty students. We can do that for the cost of the airfares, whereas to send any one of those thirty students to Lancaster would cost thousands of pounds. So I think it’s a wonderful model, and I’ll have a better idea when I’ve been out there and experienced it first-hand how well it works.

With lecturers coming and going, the curriculum has to be built pragmatically. I thought the RSS could come in here and be a broker in getting young statisticians interested in contributing to this programme. So we now have a formal partnership in which the RSS undertakes to fund three people a year to go out and teach. AIMS will tell us what they want and we’ll use our networks to identify suitable lecturers – my own view is that we should probably focus on teaching probability and likelihood-based inference as a core on which later statistics modules, such as the one I’ll be teaching, can build. Jane Hutton is coordinating the operation of the partnership and she is well aware that her best ally will be the Young Statisticians section.

The first year in which the RSS sends out lecturers will be the next academic year, starting September 2016. It might take a year or two to get going but it should really make a difference and it’s not entirely a one way street. In other trips to Africa, I have met students who have not been very well trained but who are just as able as our own students. I now have four African PhD students in my own group at Lancaster who have seized on opportunities to get the PhD training that they can’t get in Africa. But what I have tried to do is to get them working on projects that are relevant in Africa so that there is a reasonable chance they will return to their countries and develop their careers there. AIMS at the moment specialises in MSc level and does not currently have the capacity for PhD supervision, but it’s an ambition to develop that capacity and if we can, then I’d like to see African students co-supervised by an AIMS and a non-AIMS supervisor using Skype to keep in touch. I can see this working because two of my four African students spend half their time in Malawi. When they are there, I talk to them via Skype and they do field work and find out what real data is like, then when they’re in Lancaster, their epidemiology supervisor in Malawi talks to them via Skype and they do theory, methods and algorithms. I think it is a model that could work and I am committed to it.

9. What have been the highlights of being President of the Royal Statistical Society so far?

It’s just a constant delight, because the Society has been ever present in my career. When I was in my early years as an academic, we used to go to the RSS meetings the way my parents used to go to the cinema. You didn’t bother to find out what was showing, you just went! I used to get on the morning train from Newcastle, go to the meeting, then to a pub in Tottenham Court Road, which was where I first met Bernard Silverman, Frank Kelly, Peter Green and many others, then back to Newcastle on the overnight sleeper. You don’t need to do that anymore to make friends and keep in touch, which I think is a bit sad. But it’s why I find the Young Statisticians such a tonic, because they’ve got their own ways of networking that add value…but they’re also not averse to meeting in the pub round the corner from RSS HQ.

Altogether, it’s a delight to have the honour of being President, but if there was one particular highlight so far, it would be getting the partnership with AIMS, which should last well beyond my presidency.

10. What do you see as the greatest challenges facing the profession of statistics in the coming years?

Data science. An unsurprising answer, but what we need to do is to see data science as a fantastic opportunity to make statistics more popular, relevant and widely understood than ever before. But we have to do it in a positive way. If we just moan at computer scientists who think they can do it all, they will do it all and they won’t want to work with us, whereas if we can actually embrace the challenge of data science, then we can be integral to it. My model for data science is a triangle – statistics, computer science and science – and if you can keep that triangle going, then all three partners have got a lot to offer. If it is purely computer science driven, then science takes a back seat and it becomes data technology. Equally if it is purely science driven, it doesn’t really take advantage of all the opportunities there are for modern ways of capturing and analysing data. We need to teach our statisticians the difference between being able to bodge a piece of code together in order to get an answer for yourself and being able to write professional quality code that others can use.

11. What has been the best book on statistics that you have ever read?

My nomination is Bertil Matern’s PhD thesis, published in Sweden in 1960 as an internal publication of the Royal Swedish College of Forestry. When I was a research student in the early 1970s, my supervisor was giving me things to read and I was reading papers from the RSS, Biometrika and so on. I found this long paper that was being circulated privately, called Spatial Variation by Matern. I didn’t know how it had begun to be circulated, but it was a treasure trove. It actually foreshadowed all sorts of things that became mainstream in spatial statistics in the 70s and 80s – many of the point process models that Frank Kelly and Brian Ripley introduced were born from the elements in Matern’s thesis; the elements of random set theory that David Kendall later developed were also in Matern’s thesis; the variogram from geostatistics is there. It is an extraordinary work. Each topic was treated briefly, but the essential ideas were there. I found out later that someone who met Bertil in Sweden invited him to give some lectures at Imperial College and when people asked him where they could read about his themes, he replied that they would have to get a copy of his thesis because he didn’t really publish any journal articles. I still have my copy. It’s very old-fashioned in style but absolutely extraordinary in the number of things it foreshadowed.

12. Which people or events have been influential in your career?

First and foremost, Julian Besag. Difficult to deal with on a daily basis but inspirational and extraordinarily kind beneath his sometimes irascible exterior. After being taught by him, I asked where I could learn more and he advised that I go to Oxford and work with Maurice Bartlett. So I did. But Bartlett then announced during my first year that he was going to Australia, so I wrote to Julian and asked if I could carry on my PhD with him. Julian said “yes, but I’ve been asked to apply for a senior job at Newcastle who are also looking for a junior, so why don’t we both apply?” So I did apply, and then as I was about to start my second year with Julian I asked him about Newcastle and he hadn’t applied! Several weeks later, I got an interview and got the junior job, so I went to Newcastle and Julian stayed in Liverpool. Robin Plackett was the head of department at Newcastle and I couldn’t have wished for a better one; he protected me from all the rubbish and encouraged me in what I wanted to do. He said I needed to finish my PhD, but that as soon as I had, I could have a sabbatical. So although I no longer had any formal supervision, Robin was a wonderful mentor. I would check new topics with him and he would advise on what to study and what to ignore. More recently, Scott Zeger has been a great influence, because it was through meeting Scott and our becoming close friends that I got into medical statistics, which is what I’ve been doing for my whole career since.

13. If you had not got involved in the field of statistics, what do you think you would have done? (Is there another field that you could have seen yourself making an impact on?)

I doubt if I could have made an impact but if I had been educated in England I would have studied history at university. I was educated in Scotland where there is a broader curriculum than in the English A level system. In Scotland I studied maths, physics, chemistry, English, history and French for university entrance level. If I had needed to specialise at school, I would have chosen English and history. No idea for a career after that!

Copyright: Image of Professor Diggle appears courtesy of the Royal Statistical Society