Embrace Big Data or ignore at your peril
- Author: Statistics Views
- Date: 07 Nov 2012
- Copyright: Photograph copyright of Statistics Views. From right, Paul Boyle, Professor Harvey Goldstein, Polly Toynbee, Paul Woobey and Dr Farida Vis
On the evening of Monday 5th November, the British Academy and Sage hosted a Big Data Debate as part of the Economic and Social Research Council's Festival of Science. The panel was chaired by Guardian journalist and former BBC Social Affairs Editor Polly Toynbee and included Paul Boyle, Chief Executive of the ESRC; Professor Harvey Goldstein, Professor of Social Statistics at the University of Bristol; Dr Farida Vis, Research Fellow in Social Sciences at the University of Sheffield; and Paul Woobey, Chief Information Officer (CIO), Director of the Strategy and Standards Directorate, Head of the IT Profession and Senior Information Risk Owner at the Office for National Statistics.
A recent article in Significance pointed out that ‘Big data is bringing a revolution in science and technology’ and that it will bring about a more profound revolution once it interacts with people’s decisions. Toynbee opened the panel by referring to the “deluge of data” that the world is dealing with every day – blogs, social media, mobile technology. The debate raised the issues of how society will deal with Big Data and what it can gain from it: what are the dangers and the advantages? Each panellist gave a presentation on their thoughts and ideas on Big Data.
Paul Boyle – ‘Big Data’
Paul Boyle began by looking at longitudinal data from world-leading studies such as the British Household Panel Study and Understanding Society, Life Study and the Birth Cohorts (1958, 1970 and the Millennium). He also referred to the “deluge of data” but emphasised the need for theoretical grounding. It was essential to “resist data for data’s sake”, meaning that it was important first to ensure that the data was accessible, interpretable and interoperable.
The trust of the public is very important, which means engaging and consulting with the public and involving them on the governing board, but this is difficult while access to private data remains patchy. There needs to be genuine interdisciplinary design for Big Data, but there are few examples of projects where this has been carried out; Boyle cited clinical trials comparing a placebo with pravastatin as an example.
Big Data can be useful in analysing large amounts of administrative data. In December 2011, the Administrative Data Taskforce (ADT) was established, which aims to propose new mechanisms and collaborative agreements to enable and promote the wider use of administrative data for research and policy purposes.
Boyle concluded that training people to deal with Big Data is vitally important: this means deeper training and greater recognition of the skills of those who do not necessarily come from an academic background but who could make a valuable contribution to Big Data.
Professor Harvey Goldstein – Big Data – social liberation or numerical investigation?
Goldstein started by comparing various famous big data sets, the biggest he could find being the Higgs boson data, which consists of 100 zettabytes. Whilst we are not at this stage in Big Data at the moment, we may well be in five years’ time.
Currently, Big Data can assist in the data mining of consumer spending patterns; the simple mapping of structures; the analysis of publications to produce journal and author citation rankings; and the analysis of Google searches and tweets. As Goldstein said in an interview held at the Debate, “the potential for big data is enormous”: it facilitates better decision-making, its huge size allows for the study of subtle interactions not traditionally possible in social research, and it makes citizens more informed and enhances democracy. For example, as we receive bigger data sets in clinical trials, it can become easier to decide whom to treat first. But this leads on to the downsides of Big Data.
Big Data could lead to (possibly deliberate) misleading inferences by the media and others and could also facilitate commercial or official institutions’ control over citizens. This introduced a hotly discussed topic at the Debate: the ethical issues raised by the kind of control that Big Data could dangerously confer.
In order to cope with Big Data, Goldstein concluded that it needed to be “embraced, not ignored.” There needs to be large-scale education of providers and users, with the British Academy’s Language and Quantitative Skills Programme and the Royal Statistical Society’s getstats campaign a useful start. There also need to be shared understandings about data interpretation, which exist within research communities but not more generally, and consideration of how these can be promoted. Social scientists themselves need to re-orientate their thinking and to develop existing methodologies to deal with the challenges that will inevitably be faced. Finally, Big Data is a way of encouraging informed debate, such as at this event.
Paul Woobey – Is Big Data new or just an illusion?
Woobey pointed out that Big Data is all about high volume, high velocity and high variety. It poses significant issues concerning data security, privacy, data storage constraints and unclear “ownership” of data. The question facing the Office for National Statistics is how it can demonstrate that Big Data is valid. Here Woobey stressed that we need to be looking at “taking data not at face value but what it can infer.” We use data to validate the information we already have; the meaning can often be hidden, and with Big Data everything can become more comprehensible given such amounts of backup data.
The Australian National Statistical Institute has the entirety of till data throughout the world (checkout data from major supermarkets – Tesco, Sainsbury’s, etc.). Such Big Data raises the question of how one begins to analyse such information.
Similarly to Boyle, Woobey noted that a highly skilled workforce needs to be built, adding that “Big data broadly encompasses data acquired from multiple channels that are combined in novel ways to reveal phenomena that otherwise could not be detected.”
The ONS are progressively moving towards dealing with more administrative data (e.g. tax, benefits), and Big Data can be useful in corroborating data that we already have. Woobey examined Google trends and found that various data sources and channels embody discrete sets of attributors. As Woobey explained to Statistics Views after the debate, Big Data can be used to underpin the surveys that the ONS carry out and to ensure that the data is turned into the right information for the public.
Dr Farida Vis – Big Adventures in Big Data: a cautionary approach
Dr Vis explained that her work has involved collecting social media data and examining the reporting of global crises, visual culture, knowledge practices and civic engagement within different (social) media environments. She has worked on projects examining how YouTube users responded to the anti-Islam film Fitna, released online by Dutch politician Geert Wilders in March 2008, and the use of controversial images in the aftermath of Hurricane Katrina. Through her work, she has found potential for cross-fertilization between data journalism and analytical insight.
Dr Vis added that there is now a tension between Small and Big Data and that, in this year of the Turing Centenary, the role of algorithms has been highlighted as a valuable tool for extracting information from little data. Qualitative methods interpret small data well, such as how the citation levels for journals can be influenced.
The danger with Big Data is that it risks reducing people to a number: where do humans come in? With the assistance of data visualization, we can become obsessed with wanting the whole picture that Big Data can offer us.
Dr Vis concluded by asking the questions: Who gets to do Big Data? What kind of funding will there be? What do we invest in?
Conclusions by panel
In the closing minutes of the debate, Toynbee invited the panellists to discuss and reflect on each other’s presentations. Given that Big Data can reveal all we want to know, Toynbee asked whether we are in danger of becoming a society whose members are distrustful and suspicious of each other. Professor Goldstein said he would be interested to see whether there was any research on this, and Paul Woobey added that although the response rate for the last census was very high, response rates for ONS surveys in general are decreasing. Perhaps society is not as willing as it used to be to take part in surveys; can we not be bothered during our busy days?
In summary, all four panellists agreed that Big Data can be a great opportunity for society and that the key to its success is the training of a highly skilled workforce who can deal with this data. However, there is the danger that, as human beings, we are reduced to being numbers, and Big Data calls into question the ethics of making such enormous data sets available to the general public, as a potential invasion of our private lives.