Does evidence matter?


  • Author: Dr Jennifer Rogers
  • Date: 02 Jun 2014
  • Copyright: Image appears courtesy of iStock Photo

Each year the Royal Statistical Society appoints a lecturer, known as the Guy Lecturer, to prepare a lecture aimed at sixth form and GCSE students that draws out the importance and widespread applicability of statistics in a serious but accessible and entertaining way.  This year the RSS appointed Dr Jennifer Rogers, a lecturer at the London School of Hygiene and Tropical Medicine, who has been visiting schools across the country talking about evidence based medicine in modern healthcare.

At the start of January I was extremely honoured to be appointed the Royal Statistical Society Guy Lecturer. I was asked to write a presentation suitable for GCSE and sixth form students that would talk about statistics in an accessible and entertaining way. No easy task! I decided to talk about what I know best, clinical trials, and came up with the title “Making life saving decisions in clinical trials: how much evidence do we need?” I wanted to give the students an insight into the world of clinical trials and hopefully leave them with an appreciation of all the questions that need to be answered as part of the treatment development process. But what questions do need to be answered when considering a new therapy? There is a natural tendency to think that new means better, and equally that because something has been around for a long time it must be safe.[1] Neither of these things is necessarily true, and so evidence based decision making is critical to the continuing development of modern medicine. Three of the big questions we need to answer when considering a new therapy are: does this treatment work? How well does it work? And are there any side effects? It is these three questions that we shall focus on in this article.


We are lucky to live in an age where therapies for a whole range of diseases are well established and we do a very good job of treating them.  What this means, though, is that when we test a new experimental therapy against an existing one, we should not expect to see very dramatic differences in their effects, as these would usually have been recognised very quickly.  We should instead expect quite small differences, which are not so obvious to observe.

Uncertainty is also an intrinsic and normal part of scientific research, which is not to say that our findings are unreliable.  Consider the question: what is the average height of Londoners? It would be infeasible to measure the height of every single person in London, so we must take a sample of people and use their average height to infer what the average height of all Londoners might be.  Of course, had we taken a different sample of people, we would have observed a different estimate of average height, and another sample would be different again.  This is the uncertainty in our estimate obtained from the sample that we chose.  The size of the sample and how representative it is both affect how much our estimate varies from sample to sample, so one way to reduce the amount of uncertainty in our estimates is to increase the sample size.

To illustrate this problem of accurately estimating treatment effects further, let us consider rolling a dice.  A dice has six sides, so a fair dice should have a probability of 1/6 of rolling each of the numbers 1, 2, 3, 4, 5 and 6, and if we were to roll it repeatedly, we would expect to see about the same number of each outcome.  But if our dice were biased, we would expect to see deviations from these equal probabilities, with some numbers appearing more or less often than others.
Running simulations under three different dice scenarios, Table 1 shows the spread of outcomes for each dice after 20 rolls, and Table 2 shows the distributions of outcomes after 100 rolls.

Outcome   1    2    3    4    5    6
Dice 1    4    6    2    3    2    3
Dice 2    4    2    4    5    3    2
Dice 3    2    2    1    2    1   12

Table 1: After 20 rolls

At this point I will leave you to consider whether you think each of the three simulated dice is fair or biased (before reading on).

Outcome   1    2    3    4    5    6
Dice 1   19   15   10   16   16   24
Dice 2   11   18   20   20   15   16
Dice 3   18    8   13   11    5   45

Table 2: After 100 rolls

I can tell you that, in fact, dice 1 is biased with distribution (1/7, 1/7, 1/7, 1/7, 1/7, 2/7); dice 2 is fair; and dice 3 is of course biased, but its distribution may surprise you: (2/11, 1/11, 2/11, 1/11, 1/11, 4/11).  This example illustrates quite nicely that big differences can be seen very quickly, after only a few observations, but small differences are much more subtle and take longer to observe.  In Tables 1 and 2 we see pretty much straight away that, for dice 3, outcome 6 seems to have a higher probability than the other outcomes, but what is not so straightforward to spot is that rolling a 1 or a 3 is twice as likely as rolling a 2, 4 or 5.  At 100 rolls, dice 1 and dice 2 perhaps seem to show slightly uneven distributions, but once uncertainty is taken into account, one might conclude that nothing of interest is going on.  Table 3 gives the spread of outcomes after a further 1,000 rolls (1,100 in total), and now we can see these differences in probabilities quite clearly.

Outcome    1     2     3     4     5     6
Dice 1   176   169   140   133   166   316
Dice 2   188   187   180   190   165   190
Dice 3   220    89   175    99    90   427

Table 3: After 1100 rolls
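The three simulated dice above are easy to reproduce yourself. Here is a minimal sketch (the function name and random seed are my own, not from the lecture) using the distributions the article gives; varying the number of rolls shows how much longer a small bias takes to reveal itself than a large one:

```python
import random
from collections import Counter

def roll_dice(n_rolls, weights, seed=0):
    """Simulate n_rolls of a six-sided dice with the given outcome weights
    and return a tally of how often each face came up."""
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]
    return Counter(rng.choices(faces, weights=weights, k=n_rolls))

# Distributions from the article:
dice_1 = [1, 1, 1, 1, 1, 2]   # biased: (1/7, 1/7, 1/7, 1/7, 1/7, 2/7)
dice_2 = [1, 1, 1, 1, 1, 1]   # fair:   (1/6, ..., 1/6)
dice_3 = [2, 1, 2, 1, 1, 4]   # biased: (2/11, 1/11, 2/11, 1/11, 1/11, 4/11)

for name, w in [("dice 1", dice_1), ("dice 2", dice_2), ("dice 3", dice_3)]:
    for n in (20, 100, 1100):
        counts = roll_dice(n, w, seed=42)
        print(name, n, [counts.get(face, 0) for face in range(1, 7)])
```

With your own seed you will get different counts from Tables 1-3, but the same pattern: dice 3's heavy six shows up quickly, while the subtler biases need many more rolls.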

In deciding whether a treatment effect is evident, we can make two kinds of error: a false positive (concluding that a dice is biased when it isn’t), known as a Type I error, and a false negative (concluding that a dice is fair when it isn’t), known as a Type II error.  When choosing how many people to include in a trial of a new treatment, we must think about how big a treatment effect we might expect to see and also what we would like the probabilities of our Type I and Type II errors to be.  Statistical power is one minus the probability of a Type II error: the probability that we observe a treatment effect if there actually is one present.  When designing a clinical trial, investigators must weigh up how much statistical power they would like their study to have against the costs (both financial and in increased workload) of a larger sample size, and the decision to conclude that a new treatment is better than an existing one must be approached delicately and sensibly.
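The trade-off between effect size, error probabilities and sample size can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a sketch with illustrative numbers of my own choosing, not values from the article:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a difference between
    response rates p1 and p2 (two-sided test, normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the Type I error rate
    z_beta = z(power)            # power = 1 - Type II error probability
    p_bar = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Detecting an improvement in response rate from 50% to 60%
# with 5% Type I error and 80% power:
print(sample_size_two_proportions(0.5, 0.6))  # 388 patients per arm
```

Note how quickly the numbers grow as the expected difference shrinks: halving the effect roughly quadruples the required sample size, which is exactly why small treatment differences are so expensive to detect.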

So now we have decided that a new treatment is better than an existing one, we need to quantify just how much better it is.  Let us examine a headline reported by the Daily Mail in June 2012[2]: ‘Child’s risk of brain cancer triples after just two CT scans’.  A shocking statistic that may make you think twice about whether to give your child a CT scan should the scenario ever arise.  But let’s take a closer look at the numbers.  In the general population, 0.6 children in every 10,000 aged 0-9 develop leukaemia and 0.4 children in every 10,000 aged 0-9 develop brain cancer.[3]  So what does a tripling of these risks actually mean? With CT scans, the risk of developing leukaemia would go from 0.6 to 1.8 children in every 10,000 and the risk of brain cancer from 0.4 to 1.2 in every 10,000.  For both outcomes this is a difference of about 1 child in every 10,000 aged 0-9, which might make you reconsider the risks associated with CT scans, especially when weighed against the associated benefits.  So what is going on here? The Daily Mail wasn’t wrong to report a tripling of the risks; it was reporting relative risks, which only tell you how the risk in one group relates to the risk in the other, without telling you anything about the underlying absolute risks.  The important thing to note here is that the underlying risk of a child aged 0-9 developing leukaemia or brain cancer is very tiny.  In contrast, the lifetime risk of breast cancer for all women is 1 in 8, or 1,250 in every 10,000,[4] and it has been shown that women who have one alcoholic drink a day could increase this risk by 5%.  So what does a 5% relative increase in this risk mean in absolute terms? The risk of breast cancer in those women who have an alcoholic drink every day would go from approximately 1,250 to around 1,310 in every 10,000, a difference of about 60 women.
So even though the relative increase here is only 5%, compared with a relative increase of 200% for CT scans and the chances of developing leukaemia or brain cancer, the absolute difference is 60 in every 10,000 women compared with 1 in every 10,000 children aged 0-9.  The important thing to note is that a tripling of something very tiny is still something very tiny in absolute terms, whereas even a small relative increase in something very big is a very big absolute increase.  So a large relative treatment effect doesn’t automatically mean that a difference is clinically important.
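The arithmetic behind this comparison is simple enough to check for yourself. Here is a small sketch using the figures quoted above (the helper name is my own):

```python
def absolute_change(baseline_per_10000, relative_risk):
    """Convert a relative risk on a baseline rate (per 10,000 people)
    into an absolute change in cases per 10,000 people."""
    return baseline_per_10000 * relative_risk - baseline_per_10000

# CT scans: a tripling (relative risk 3) of very small baseline risks
print(absolute_change(0.6, 3))     # leukaemia: about 1.2 extra cases per 10,000
print(absolute_change(0.4, 3))     # brain cancer: about 0.8 extra cases per 10,000

# Breast cancer: a 5% relative increase (relative risk 1.05) of a large baseline
print(absolute_change(1250, 1.05))  # about 62.5 extra cases per 10,000
```

The same pattern as in the article: the 200% relative increase adds roughly one case per 10,000, while the 5% relative increase adds roughly sixty.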

The final big question that remains to be addressed is that of safety.  Medicine regulation in the UK became a focus in the 1960s following the outcry over thalidomide.[5]  Thalidomide was prescribed to women during the late 1950s and early 1960s as an effective treatment for the relief of morning sickness in the first few months of pregnancy, but it caused unpredicted serious birth defects, and so in a bid to prevent similar occurrences in the future, the Committee on Safety of Drugs was set up in 1963.  Today the Medicines and Healthcare products Regulatory Agency (MHRA) is responsible for the regulation of medicines and medical devices and the investigation of harmful incidents.  Even the most beneficial of treatments can carry associated side effects, and these disadvantages are responsible for many promising drugs being dropped and for many licensed drugs being withdrawn.  Open a packet of paracetamol and you will find an accompanying patient information leaflet which, among other things, outlines all the possible side effects that you may experience whilst taking that medicine.  Most common side effects are likely to be mild and not harmful to health, but it is always important to weigh up side effects against the relative merits of a particular therapy.  A high level of side effects may be acceptable for a potential treatment for a life-threatening illness, but not so much for a treatment used for a more minor ailment.  Cancer drugs, for example, may be the difference between life and death, but they can also make patients feel extremely unwell and leave them at a higher risk of infections.  Whether to pursue a course of treatment under these conditions will not always be straightforward, and the decision must sometimes be left to the patient.  Ultimately, in deciding whether to license a new treatment, the MHRA must decide whether the benefits outweigh the associated risks and whether the side effects are acceptable.
And of course, rare side effects may not become evident until many years after a new therapy has been licensed, so it is important that monitoring continues even after treatments find their way onto the market.

The continuing development of new and better therapies is hugely important to the success of modern medicine, but it is crucial that evidence based medicine remains at the forefront of this evolution.  Fair tests of treatments that take into account uncertainty and the role of chance are necessary to identify treatment effects accurately.  These treatment effects must then be communicated in an appropriate way so that relative benefits aren’t misleadingly overplayed.  And finally, we must consider the safety of new treatments and assess associated side effects, which may only come to light many years after a therapy is licensed.  And then there’s the rest of it: here we have only touched on a small number of the questions that must be addressed in the development of new treatments, but I hope that I have given you (and the students who have attended my Guy Lecture) a flavour of the richness of evidence based medicine.  And I hope that I have convinced you that evidence really does matter!

[1] Evans, I., Thornton, H., Chalmers, I. and Glasziou, P. (2011). Testing Treatments: Better Research for Better Healthcare, 2nd edition. Pinter and Martin Ltd.
