Can football become the next metrics-based sport?


  • Author: Carlos Alberto Gómez Grajales
  • Date: 04 Jul 2014
  • Copyright: Image appears courtesy of iStock Photo

Football is the most popular sport on earth, so the lack of serious statistical research into it is surprising. After statisticians dramatically changed baseball and basketball, can football become the next metrics-based sport?

The date is July 11th, 2010. In some parts of the world, the streets are almost deserted. Nothing is happening; the world has stopped. In Spain, for instance, history is being made while 15.6 million people are watching TV, with a record-shattering 86% share of the audience tuning in to the same program, the one that has become the highest rated TV broadcast in Spanish history (1). The rest of the world has also turned its attention to the same event: the World Cup final. After a dramatic game that went into extra time, Spain became the World Champions in front of 909 million in-home viewers who watched at least one minute of that match (2).

Football is the world's favourite sport. It encompasses the joy, the passion and the dreams of the whole planet. This is the sport that gathers the nations every four years to witness the greatest sporting competition devoted to a single discipline. And even though it may be the most important sport on earth, many think that it still lags behind in one crucial aspect: the use of statistics. With that kind of popularity, you'd expect football clubs and countries to invest fortunes in tools and techniques to improve and optimize their performance, yet football is famous for not using one of the most current and innovative ideas in the recent sporting scene: statistical analysis.

You may have heard of Moneyball, a book published by Michael Lewis in 2003 describing the rise of the Oakland Athletics baseball team and its general manager Billy Beane after they adopted a statistics-based approach to playing baseball. The author traces the use of statistical tools in baseball back to the 1980s and 1990s, but his focus is on this particular American team, which came to prominence by assembling a squad of undervalued players, chosen using metrics that were really unorthodox for the time.

Moneyball changed baseball for good. After the publication of the book, teams such as the New York Mets, the New York Yankees, the Boston Red Sox and the Cleveland Indians, amongst others, hired full-time analysts to improve their chances. Right now, statistical analysis no longer gives baseball teams an edge; it just levels the field, with about a dozen professional teams being aided by professional statisticians who specialize in what is now known as sabermetrics (3).

Michael Lewis, the author of the book, wrote in The New York Times in 2009 that “The virus that infected professional baseball in the 1990s, the use of statistics to find new and better ways to value players and strategies, has found its way into every major sport. Not just basketball and football, but also soccer and cricket and rugby and, for all I know, snooker and darts – each one now supports a subculture of smart people who view it not as a game but as a problem to be solved” (3).

What Michael Lewis said is true. The MIT Sloan Sports Analytics Conference has recently celebrated its 8th event. In a talk there, Daryl Morey, the general manager of the Houston Rockets, discussed how NBA teams are starting to shoot a lot more corner 3's and to score a lot more points from high-percentage layups and dunks inside the paint. This realization largely stems from analytical work done in basketball (4). And just as baseball and basketball have been dramatically changed by statistics, there's a recent tendency to invest in numerical analysis to find the new, optimized route to winning in football. The breakthroughs haven't been as evident in this sport, since football is described as more complex, harder to define and to analyze methodically. Yet this hasn't stopped some clever and creative analysts from developing mathematical models and rigorous research aimed at understanding what football is and what it can become.

The first and foremost data crunchers devoted to analyzing football are the same people who study every other major sport in the hope of predicting results: gamblers. I mean, you can bet on the odds of Luis Suárez biting another player, apparently a sure profit based on what we've seen (5). So, if someone calculated odds for such a weird event, there are certainly people working on the odds for football itself, a sport that attracted the eyes of half the earth's population during the 2010 World Cup. The betting industry employs some very well paid statisticians to crunch historical results in order to provide mathematical estimates of the probabilities of a match outcome (6). By considering factors such as home advantage, the historical strength of the team and some metrics like goal difference, betting companies can estimate the chances of each result. With these estimates, odds can be adjusted to maximize profits, similar to what is done with other sporting events.
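To make that last step a bit more concrete, here is a minimal sketch in Python of how estimated outcome probabilities might be turned into quoted odds. The probabilities and the 5% margin are made-up illustration values, and scaling every probability by the same factor is just one simple way of building in a margin; it is not a description of how any particular betting company works.

```python
def decimal_odds(probabilities, margin=0.05):
    """Turn outcome probabilities into decimal odds with a bookmaker margin.

    Each probability is inflated by the same factor so that the implied
    probabilities sum to 1 + margin; the quoted odds are their reciprocals.
    """
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return {outcome: 1.0 / (p * (1.0 + margin))
            for outcome, p in probabilities.items()}


# Hypothetical estimates for a single match (not real data).
estimated = {"home win": 0.50, "draw": 0.27, "away win": 0.23}
odds = decimal_odds(estimated, margin=0.05)

for outcome, price in odds.items():
    print(f"{outcome}: {price:.2f}")

# The implied probabilities (1 / odds) add up to more than 1;
# the excess is the built-in edge of whoever sets the odds.
overround = sum(1.0 / price for price in odds.values())
print(f"sum of implied probabilities: {overround:.3f}")
```

Without the margin, the quoted odds would be "fair": a bettor would break even in the long run. The margin is what tilts every bet slightly in the bookmaker's favour.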

Now, let's take a look at some very specific football-related models. Some football clubs have started to invest in research facilities designed to improve the performance of their teams. The Italian team AC Milan created MilanLAB, which it calls a High Tech Scientific Research Center, to investigate playing styles and techniques in an effort to avoid injuries and to increase players' physical performance (7). With the help of sophisticated non-linear models, in what happens to be a particular form of neural network analysis, MilanLAB transforms vast amounts of numeric medical statistics into meaningful predictions. Through predictive statistical models, they developed a system that predicts possible risks to the players' health and performance. The Lab claims that its model improves the team's results by optimizing the players' physical condition and by allowing them to achieve their optimum performance. Yet not all football-related studies are done by people eager to become rich or to fill the trophy cabinets of selected clubs. Some genuine statistical research is done and published in an effort to promote and improve a game that has become the life of millions around the world.

A paper authored by Nobuyoshi Hirotsu and Mike Wright, researchers from Lancaster University, proposed a statistical model of a football match to identify and analyze the characteristics of the playing teams. By means of maximum likelihood estimators, the authors aimed to characterize the different teams of the English Premier League based on factors such as home advantage, offensive and defensive strength and their interactions (8). Based on data from the 1999–2000 season, they were able to illustrate the characteristics of the teams in ways that allowed them to compare and classify them. What I love about this paper is their definition of a football match:

“A football match can be seen as progressing through a set of stochastic transitions due to a change of possession of the ball or the scoring of a goal”.

That is the world's passion explained with mathematical elegance. What this actually means is that a football match has four states: Team A has the ball, Team B has the ball, Team A scores and Team B scores. It fits naturally into a Markov process, in which what happens next depends only on the current state. And since you need to have possession of the ball to be able to score, there's no way to go directly from Team B has the ball to Team A scores.
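A toy version of that four-state description can be sketched in a few lines of Python. Note the assumptions: this is a simplified step-by-step chain, the transition probabilities are invented for illustration, and the rule that the conceding team restarts with the ball is my own simplification. None of it reproduces the model or the estimates from the paper.

```python
import random

# Hypothetical one-step transition probabilities, invented for illustration.
# Each row sums to 1. A goal can only follow possession by the scoring team,
# and after a goal the conceding team restarts with the ball.
TRANSITIONS = {
    "A has the ball": {"A has the ball": 0.58, "B has the ball": 0.41, "A scores": 0.01},
    "B has the ball": {"A has the ball": 0.45, "B has the ball": 0.54, "B scores": 0.01},
    "A scores": {"B has the ball": 1.0},
    "B scores": {"A has the ball": 1.0},
}


def simulate_match(steps, seed=0):
    """Simulate a match as a chain of states and return the visited states."""
    rng = random.Random(seed)
    state = "A has the ball"
    path = [state]
    for _ in range(steps):
        options = TRANSITIONS[state]
        # The next state depends only on the current one (Markov property).
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


path = simulate_match(steps=500, seed=2014)
print("Goals for A:", path.count("A scores"))
print("Goals for B:", path.count("B scores"))
```

In this sketch a team's style lives entirely in its row of the table: a side that keeps the ball well has a high probability of staying in its own possession state, while a dangerous attack has a higher probability of moving from possession to a goal.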

This statistical model made it possible to describe Premier League teams with some mathematical rigor, thus painting an interesting portrait of their playing styles. For instance, Manchester United played the 1999–2000 season as a very offensive team, very good at keeping possession of the ball, yet their defensive capacity was only average. Arsenal had a very similar style. Liverpool, on the other hand, were the best team in defense, yet their ability to score was below the league average, quite similar to how Chelsea played, though the latter didn't lose the ball so often.

It is not only the players who have been subjected to statistical analysis; the referees have been as well. In a really interesting study reported by John Goddard, researchers tested whether football referees tend to favour the home team during a match, a very common belief among football aficionados. Using Poisson regression, a type of model preferred for count data, the analysts measured the effect that home advantage might have on the number of yellow and red cards that referees awarded to each team, as a way to test the hypothesis that home teams are punished less often and less severely (8). With some quite interesting results, the authors conclude that there is indeed a tendency for away teams to receive more disciplinary warnings than home teams, a tendency that cannot be explained solely by the effects of team quality, nor by the incentive that the home team has to play more offensively. As such, the statistical evidence seems to point towards a real home team bias in the award of yellow and red cards. In defense of the referees, the effect seems to be somewhat smaller than some might expect. But still, whenever your team loses and you wish to blame the referee, you now have the support of reliable statistical research. Regarding the “researching the obvious” part of the study, the analysis found that not all referees are equally good. The data showed some clear inconsistencies between referees in the interpretation or application of the rules of the game, which is something that almost every football fan in the world has noticed by watching the recent World Cup.
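For readers who have never met a Poisson regression, the sketch below fits the simplest possible version in plain Python: the expected number of cards depends on a single yes/no variable, whether the team is playing away. The card counts are invented for illustration, and a real analysis like the one described above would include further covariates (team quality, for instance), so treat this as a demonstration of the technique and nothing more.

```python
import math


def fit_poisson_regression(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method (maximum likelihood)."""
    b0 = math.log(sum(y) / len(y))  # start from the overall mean rate
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical cards per match for home and away teams (not real data).
home_cards = [1, 2, 0, 1, 3, 1, 2, 1]
away_cards = [2, 3, 1, 2, 4, 1, 3, 2]

is_away = [0] * len(home_cards) + [1] * len(away_cards)
cards = home_cards + away_cards

b0, b1 = fit_poisson_regression(is_away, cards)
rate_ratio = math.exp(b1)
print(f"estimated home rate: {math.exp(b0):.3f} cards per match")
print(f"away/home rate ratio: {rate_ratio:.3f}")
```

With a single yes/no covariate, the fitted rate ratio is simply the average away count divided by the average home count. The regression earns its keep once other explanatory variables are added, because the away effect is then estimated after allowing for them.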

And guess what? Football is a worldwide passion, enjoyed by everyone: kids, workers, politicians and also, as you may imagine, statisticians. The number of 'spare time' analyses done by individuals and companies that wish to understand, decipher or uncover little details in the development of a game is remarkable. One analysis set out to detect the dirtiest teams in the Mexican league, those that commit the most fouls and receive the most cards (9). We also have studies using the Poisson distribution to predict the chances of seeing a high number of goals on any given day of football matches, thus predicting those happy days of remarkable, exciting football (10). The author noticed that, on 5th February 2011, 41 goals were scored in 8 matches, for an average of over 5 goals per match. A remarkable feat, even more so if you consider that the historical average for the Premier League is 2.6 goals per match, a number estimated from a data set with the results of every Premier League match from its inception in 1992 until the end of the 2009/2010 season. By means of the almighty Poisson distribution, we can get an estimate of the odds of such a rare Saturday: 18,000 to 1. Even taking into account the not so amazing results of the following Sunday, this was a weekend you'd expect about once every 20 years.
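You can check the flavour of that calculation yourself. The sketch below is my own simplification, not the author's exact method: it assumes that goals in each match are independent and Poisson distributed with a mean of 2.6, so that the total over 8 matches is Poisson with a mean of 20.8. Since 2.6 is a rounded figure, the result should only be expected to land in the same ballpark as the odds quoted above.

```python
import math


def poisson_tail(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


matches = 8
mean_goals_per_match = 2.6   # historical average quoted in the text
goals_seen = 41

# Assumption: a sum of independent Poisson counts is itself Poisson.
day_mean = matches * mean_goals_per_match
p = poisson_tail(goals_seen, day_mean)
print(f"P(at least {goals_seen} goals) = {p:.2e}, about 1 in {1 / p:,.0f}")
```

The function adds up the probabilities of 0 to 40 goals and subtracts the total from 1, which leaves the chance of a day at least as goal-laden as the one observed.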

And, of course, our little tour of football statistics wouldn't be complete without the hundreds of predictions that appear all over the place during important tournaments, you know, like Brazil's World Cup (11) (12). You can find a lot of those online without effort, yet I chose to reference some genuinely cool predictions with a solid theoretical grounding. My point is that such “amateur” models are usually built by some remarkably intelligent people who have neither the resources nor the time to produce research-quality models, yet have the passion and the love for a game that still has hundreds of secrets waiting to be found by statisticians.

So football is changing, becoming more and more statistical. And even though we haven't yet seen statistical analysis dramatically change how clubs understand the sport, it may be just a matter of time until the most important sport on earth finds its own “Moneyball” moment. There are already thousands of statisticians worldwide looking to get there first.


(1) 2010 FIFA World Cup Draws Record Ratings in USA, Europe, and Beyond. Bleacher Report. July 2010.

(2) Almost half the world tuned in at home to watch 2010 FIFA World Cup South Africa. (July, 2011)

(3) Kuper, Simon & Szymanski, Stefan. Soccernomics. Nation Books; Original edition (October 27, 2009)

(4) Sloan Sports Analytics Conference Overview: The State Of Analytics In Soccer. (March, 2014)

(5) There's Money To Be Made Gambling on Luis Suárez Biting Someone. The Wall Street Journal (June, 2014)

(6) Calculation of Odds: Probability and Deviation. Soccer Window

(7) Milan LAB. The Official AC Milan website

(8) Hirotsu, Nobuyoshi & Wright, Mike. An evaluation of characteristics of teams in association football by using a Markov process model. The Statistician (2003) 52, Part 4, pp. 591–602

(8) Goddard, John. Is the ref blind? Crime and punishment in English Premier League football. Significance Magazine. Volume 4 Issue 2 (June 2007) Doi: 10.1111/j.1740-9713.2007.00227.x

(9) Jugando Sucio. Asesoría Estadística Especializada.

(10) Wallace, Michael. 41 goals in a day?! How unusual was last Saturday's football? Significance Magazine Website

(11) Baio, Gianluca & Blangiardo, Marta. The World Cup forecast: how do things look 17 games in? Significance Magazine Website

(12) World Cup calculations. The Norwegian Computer Center

This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS)   to whom Wiley and express their gratitude. This panel are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.