# DNA profile evidence: getting the statistics right is as important as getting the technology right

## Features

• Author: David Balding
• Date: 28 Nov 2013
• Copyright: Image appears courtesy of iStock Photo

DNA profile evidence has revolutionised crime fighting and society. Its great effectiveness in identifying offenders has contributed to declining crime levels in many countries, both by getting offenders off the streets and by increasing deterrence. Concerns have been expressed right from the start, and there have been many controversies surrounding DNA evidence, based on fears about the effects of contamination, about the privacy of genetic information, and about its impact, through shared DNA, on the relatives of suspects.

New controversies have been unfolding in recent years, surrounding so-called low-template DNA (LTDNA) evidence. Low template means roughly "not much": in fact it is possible to generate imperfect but usable profiles from as little as 20 picograms of DNA, which is equivalent to the DNA content of 3 human cells. This means that profiles can be obtained from a few skin cells resulting from the slightest touch, or even from breath. Tell-tale DNA profiles can now be recovered from crevices in a firearm even after it has been "wiped clean", and from wire or rope used to bind a victim's hands.

The problem lies in the "imperfect" nature of LTDNA profiles: such small amounts of DNA mean that the resulting profile must be regarded as the outcome of a stochastic process, affected by various sources of "noise". The graphs generated by DNA profiling machines show peaks indicating the presence of a DNA allele (a DNA fragment of a specified type), but sometimes a peak can be missing because the DNA amplification reaction failed for that allele on every DNA strand available (this is called "allelic dropout"). Spurious peaks can also appear, due to tiny fragments of contaminating DNA that can affect the sample either at the crime scene or even in the best-run labs that take every precaution against contamination. These spurious peaks can be generated by airborne DNA fragments, hence the term "dropin".

If true DNA alleles can drop out and spurious alleles can drop in, how can such evidence ever be relied upon? It's all in the numbers: if a large majority of alleles from a person of interest are recorded, the number of dropin alleles is low, and the total number of contributors of DNA (which may include victims and bystanders) is not too large, then the profiling results overall can provide a strong indication of the presence of DNA from a suspect.
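This reasoning can be sketched numerically. The following toy "semi-continuous" model, with an invented dropout probability `d`, dropin probability `c`, allele frequencies, and a single contributor (all illustrative assumptions, not a calibrated forensic model), shows how a likelihood ratio can still favour a suspect's presence despite stochastic effects:

```python
from itertools import combinations_with_replacement

def locus_likelihood(observed, genotype, freqs, d=0.2, c=0.05):
    """P(observed peaks | contributor genotype) under a toy
    dropout/dropin model: each true allele amplifies with
    probability 1-d; unexplained peaks are dropins weighted
    by population frequency."""
    p = 1.0
    for allele in set(genotype):
        p *= (1 - d) if allele in observed else d
    extras = observed - set(genotype)
    for allele in extras:
        p *= c * freqs[allele]
    if not extras:
        p *= (1 - c)  # no dropin event occurred
    return p

def random_person_likelihood(observed, freqs, d=0.2, c=0.05):
    """Average the likelihood over an unrelated person's possible
    genotypes, using Hardy-Weinberg genotype probabilities."""
    total = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        gp = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        total += gp * locus_likelihood(observed, {a, b}, freqs, d, c)
    return total

# Invented allele frequencies and peaks for one locus:
freqs = {"10": 0.2, "11": 0.3, "12": 0.4, "13": 0.1}
observed = {"10", "12"}   # peaks recorded at this locus
suspect = {"10", "12"}    # suspect's genotype matches
lr = locus_likelihood(observed, suspect, freqs) / random_person_likelihood(observed, freqs)
```

Here `lr` comes out above 1, so the locus supports the suspect's presence; with heavy dropout or many unexplained peaks the same calculation correctly yields a much weaker ratio.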

Such results also provide plenty of opportunity for challenges by defence teams, and suspicion by commentators about the robustness of any resulting inferences. Recently we have witnessed great controversy over the 2009 convictions of Amanda Knox and Raffaele Sollecito for the murder of Meredith Kercher in Perugia, Italy. Those convictions were overturned in 2011, in large part due to trenchant criticisms from a pair of academic forensic DNA experts of the LTDNA profile evidence presented at the original trial. In 2007 there was great controversy in the UK surrounding the acquittal of a man being tried for the 1998 Omagh bombing. This time it was the judge who expressed grave concerns about the LTDNA evidence, while giving other reasons for acquitting the defendant. The judge's concerns generated headlines such as "Verdict raises DNA evidence doubt" (http://news.bbc.co.uk/1/hi/northern_ireland/7154189.stm) and led the UK Forensic Regulator to set up the Caddy review of LTDNA evidence.

## The analysis challenge

The reason that it is possible to make confident identifications from noisy profiling data is replication. It is common to split the DNA sample into two, three, or more parts and profile them independently, so that dropin alleles that appear in only one replicate are easily distinguished from well-replicated alleles. This form of replication is controversial, because splitting a low-template sample reduces the DNA available in each subsample, thus increasing the stochastic effects that are the very problem replication is intended to address. However, there is another form of replication intrinsic to DNA profiling: different locations on the genome ("loci") are tested, and the results at each locus are essentially independent (under appropriate conditioning). Very often individual loci do not support the prosecution allegation, but accumulating evidence over multiple loci (currently 10 in the UK, which will soon adopt a 16-locus system adhering to a pan-European standard) can generate strong evidence; even when the evidential strength is modest, it can contribute to a sound case when other evidence is available.
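Accumulating evidence over independent loci amounts to multiplying per-locus likelihood ratios. A small numerical sketch (the LR values below are invented for illustration, not taken from any real case) shows how loci that individually give weak, or even mildly exculpatory, support can combine into strong overall evidence:

```python
import math

# Invented per-locus likelihood ratios for a 10-locus profile;
# values below 1 are loci that do not support the prosecution.
locus_lrs = [3.2, 0.8, 5.1, 2.4, 1.9, 4.0, 0.9, 6.3, 2.2, 3.7]

# Independence across loci (after appropriate conditioning) means the
# overall likelihood ratio is just the product of the per-locus values.
overall_lr = math.prod(locus_lrs)

# Weight of evidence is often reported on the log10 ("ban") scale.
weight = math.log10(overall_lr)
```

No single locus above gives an LR beyond about 6, and two loci point weakly the other way, yet the product exceeds ten thousand, which corresponds to a weight of evidence of about 4 bans.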

There are additional complications that I haven't yet addressed. DNA is a robust molecule that can survive for thousands of years in ideal conditions, but degrades rapidly with increasing temperature and humidity. In current profiling techniques the DNA sequence is not read; instead, the peaks in the profile graphs reflect the lengths of fragments of DNA. Longer DNA fragments degrade more quickly than shorter ones, and this provides one source of non-independence across loci. Further, with the very sensitive profiling techniques employed for the smallest and most degraded samples, experimental artefacts can generate false peaks other than the dropin described above. The most important of these is stutter, which arises when imperfect copying of a DNA fragment generates two peaks, a larger peak at the correct fragment length and a smaller peak at a shorter length. This causes no confusion when there is only a single contributor of DNA, because the peak heights distinguish true from stutter peaks, but when there is a mixture of DNA from major and minor contributors, stutter peaks from the major may not be distinguishable from true minor-contributor peaks.
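The stutter problem can be made concrete with a toy filter. One common heuristic flags a peak one repeat unit shorter than a much taller peak as possible stutter; the 15% height ratio, 4 bp repeat length, and peak data here are illustrative assumptions only, and, as noted above, such a rule fails exactly when a minor contributor's true peak happens to be stutter-sized:

```python
STUTTER_RATIO = 0.15  # assumed maximum stutter height relative to its parent peak

def flag_possible_stutter(peaks, repeat_len=4):
    """peaks: {fragment length in bp: peak height in RFU}.
    Returns the set of lengths consistent with being stutter artefacts:
    one repeat unit shorter than a peak they are small relative to."""
    flagged = set()
    for length, height in peaks.items():
        parent = peaks.get(length + repeat_len)
        if parent is not None and height <= STUTTER_RATIO * parent:
            flagged.add(length)
    return flagged

# A small tall peak at 176 bp sits one repeat unit below a tall 180 bp peak:
peaks = {180: 1200, 176: 150, 160: 900}
flag_possible_stutter(peaks)
```

In a single-contributor profile the flagged 176 bp peak would safely be treated as an artefact; in a mixture, a thresholded rule like this risks discarding a genuine minor-contributor allele, which is why fuller statistical models of peak heights are needed.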

I won't go into further detail on these and other issues, but it should be clear to readers of Statistics Views that resolving these complex issues is a daunting problem requiring the powers of a statistical superhero: the ability to develop statistical models, deal with nuisance parameters, and assess model fit and robustness, as well as to work closely with experts in the profiling technology and underlying biology.

## A history of neglect

Over the decade or so that highly-sensitive profiling has been employed for low-template and/or degraded DNA samples, the very obvious need for intervention by statisticians to help draw conclusions from such complex data has not been recognised. Of course it has been recognised by some, but not by those who matter in the criminal justice system. While courts have become embroiled in controversies around LTDNA evidence that have occupied judges, lawyers and senior officials for many months and at great expense to the public purse, essentially nobody was willing to fund the research that would have cost only loose change in comparison, and which could have put LTDNA evidence on a solid foundation years ago.

Instead of investing in research to solve the problem, the forensic establishment has muddled through with half-baked efforts to evaluate LTDNA evidence, usually based on over-simplification and on ignoring data. While this has led to some unsatisfactory prosecutions, entangled courts in lengthy appeals, and fuelled media controversy, the greatest cost to society probably lies in the failure to bring to court many other cases in which there was potentially powerful evidence that might have fairly convicted the guilty had it been properly evaluated.

The failure to fund the necessary research on the necessary scale is not only due to lack of funds. Courts and media commentators have failed to recognise the need, focussing wrongly on the question "is the technology reliable?". This is a useless question, because the meaning of "reliable" is too vague. The right questions are "can we fairly evaluate the strength of evidence generated by this technology?" and "can we meaningfully convey the evidential weight to courts?". The question about evaluation of evidence trumps questions about reliability, because unreliable evidence will support only weak conclusions about evidential weight: any amount of "noise" can be correctly accommodated by an appropriate analysis, and will not then lead to unjustified conclusions. The expert panel of the UK Caddy review made many important recommendations. They noted that "There needs to be a national agreement on how LTDNA profiles are to be interpreted especially in relation to 'allele drop in and out', stochastic effects, inhibition, and mixtures", but they did not give due emphasis to the over-riding importance of statistical inference, without which none of the other issues they dealt with matter.

## A better future

Fortunately the situation has been improving in recent years, though ten years too late. Various agencies in Australia and New Zealand have pooled resources to fund the development of the STRmix software for the evaluation of LTDNA evidence, and the European Commission has funded the EUFORGEN network, which is associated with the Forensim R package that performs a number of simulation and inference tasks for LTDNA evidence. The present author has in recent years been developing the open-source likeLTD software, now also an R package, as is DNAmixtures, developed by researchers in the UK. The TrueAllele software has been available for many years, but because of commercial confidentiality it has lacked the level of scrutiny required for justice to be seen to be done. It can now be compared with rival software, some of which is open source.

Many of these developments are new, and extensive comparisons of the different software packages are not yet available, but there are a number of publications and technical documents reporting the results of validation studies. The main distinction between modelling approaches is that some take the view that the information in peak heights is too variable, and too sensitive to parameters, for the heights to be useful sources of information, and so peak heights are replaced with a binary presence/absence classification, perhaps supplemented by an "uncertain" category. Other approaches do try to extract information from peak heights. This is extremely valuable for some profiles, particularly when replication has not been attempted, but the additional information may come at the cost of additional sensitivity to modelling assumptions. There has not yet been any extensive investigation of these issues, which is a priority for future research. Meanwhile the new methods are finding their way into courts, in some cases after substantial challenge, and appear to provide a great improvement on preceding methods.
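The binary approach mentioned above can be illustrated by a simple discretisation: peak heights below a detection threshold are treated as absent, heights in a stochastic zone as uncertain, and only tall peaks as reliably present. The threshold values here are illustrative assumptions, not calibrated laboratory settings:

```python
DETECTION_RFU = 50    # assumed: peaks below this are indistinguishable from noise
STOCHASTIC_RFU = 200  # assumed: below this, dropout of a sister allele is plausible

def discretise(height_rfu):
    """Collapse a quantitative peak height into the categories used by
    binary-style models: 'absent', 'uncertain' or 'present'."""
    if height_rfu < DETECTION_RFU:
        return "absent"
    if height_rfu < STOCHASTIC_RFU:
        return "uncertain"
    return "present"

[discretise(h) for h in (30, 120, 850)]  # ['absent', 'uncertain', 'present']
```

The trade-off described in the text is visible here: discretisation discards the quantitative information in the 120 versus 850 RFU heights, but in exchange the resulting model has far fewer parameters to which its conclusions might be sensitive.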

## Sources

Steele, C.J. and Balding, D.J. (2014). 'Statistical Evaluation of Forensic DNA Profile Evidence', Annual Review of Statistics and Its Application, Vol. 1. A pre-publication PDF is available online at DOI: 10.1146/annurev-statistics-022513-115602.
