  • Date: 16 Aug 2012

There is a lot of debate about what the precise definition of Big Data should be. Definitions range from the techniques and technologies that make capturing value from data at an extreme scale economical, through to a simpler general term for the voluminous unstructured and semi-structured data that a company or research project creates: data that would take too much time and cost too much money to load into a relational database for analysis. Regardless of the precise definition, the term Big Data is often used when speaking about petabytes (10^15 bytes) and exabytes (10^18 bytes) of data. (An exabyte of storage could contain 50,000 years' worth of DVD-quality video.) In the world of business, Walmart handles more than 1 million customer transactions every hour; its databases contain more than 2.5 petabytes of data, the equivalent of 167 times the information contained in all the books in that standard unit of what now counts as Small Data, the US Library of Congress.
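For readers who want to sanity-check these figures, the short Python sketch below back-calculates what they imply. It assumes decimal units (a petabyte as 10^15 bytes and an exabyte as 10^18 bytes, as quoted above) and a 365.25-day year; the function names are illustrative only and do not come from the magazine.

```python
# Back-of-the-envelope check of the storage comparisons quoted above.
# Assumptions (not from the article): decimal units, 365.25-day years.

PETABYTE = 10**15  # bytes
EXABYTE = 10**18   # bytes
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def implied_video_rate(storage_bytes, years):
    """Bytes per second of video implied by filling storage_bytes over the given years."""
    return storage_bytes / (years * SECONDS_PER_YEAR)


def implied_unit_size(total_bytes, multiples):
    """Size in bytes of one 'unit' if total_bytes equals that many multiples of it."""
    return total_bytes / multiples


# An exabyte holding 50,000 years of DVD-quality video
rate = implied_video_rate(EXABYTE, 50_000)

# 2.5 petabytes being 167 times the Library of Congress book collection
loc = implied_unit_size(2.5 * PETABYTE, 167)

print(f"Implied video bitrate: {rate * 8 / 1e6:.1f} Mbit/s")
print(f"Implied Library of Congress size: {loc / 1e12:.1f} TB")
```

Under these assumptions the claims work out to a video stream of roughly 5 megabits per second and a Library of Congress book collection of roughly 15 terabytes, so the two comparisons are at least internally plausible.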

The August issue of Significance, the magazine of the Royal Statistical Society and the American Statistical Association, is devoted to the hot topic of Big Data and considers just how plentiful data is, its impact on all of us and what we need to do to be able to utilize it effectively. As a gauge of the importance of Big Data, the theme of the recent Joint Statistical Meetings (JSM) in San Diego was "Statistics: Growing to Serve a Data Dependent Society". Around 6,000 statisticians attended hundreds of sessions at JSM to discuss various aspects of Big Data, from how to provide relevant education and job training to how the amount of data benefits and challenges statistics and other professions.

In his editorial in the special issue, editor Julian Champkin states that, "statisticians should lay claim, now, to the expertise that extracts truths from Big Data. They should stake out and prove the claim that analysis of Big Data is a statistical skill. They should make sure that those skills are part of statistical training; should grab the people who have or who want to acquire those skills, and grab them young and, in the process of giving them those skills, give them also the belief that what they are becoming is statisticians and that what they are doing is statistics. It is a brave new world out there, and it is a very large chunk of the future."

In the issue, Sallie Ann Keller, Steven E. Koonin and Stephanie Shipp discuss how Big Data is transforming our cities and look at the benefits and some of the challenges that Big Data can bring to society.

We know our brave new world is being transformed by data. But how much more of it is there than before? Martin Hilbert finds huge untapped capacity to process information in his article, ‘How much information is there in the “information society”?’

Ping Ma looks at how Big Data from seismographs and complex analysis can be used to map the Earth's core in his article 'A statistical journey to the centre of the Earth'.

Astronomy has been one of the first areas of science to embrace and learn from Big Data. Eric D. Feigelson and G. Jogesh Babu from Penn State tell of the efforts, the challenges, and some of the ways it is transforming our knowledge of the cosmos.

Language recognition programs use massive databases of words, and statistical correlations between those words, to translate or to recognise speech. But correlation is not causation. Do these statistical data-dredgings give any insight into how language works? Or are they a mere big-number trick, useful but adding nothing to understanding? One who holds the latter view is the theorist of language Noam Chomsky. Peter Norvig disagrees in his piece ‘Colorless green ideas learn furiously: Chomsky and the two cultures of statistical learning’.

Michael Salter-Townshend, an entrant in the Significance Young Statisticians Group writers' competition, analysed his own circles of friends within social networks. The huge databases created by Facebook, Twitter and LinkedIn prove ripe fields for statisticians.

Thomas Lansdall-Welfare and colleagues exploit a whole new tool for social scientists when they look at how the vast datastreams generated from these social networks can nowcast the mood of a nation.

Significance is a bi-monthly magazine for anyone interested in statistics and the analysis and interpretation of data. Its aim is to communicate and demonstrate, in an entertaining, thought-provoking and non-technical way, the practical use of statistics in all walks of life, and to show informatively and authoritatively how statistics benefit society. It is published on behalf of the Royal Statistical Society and the American Statistical Association.

