Wednesday, May 7, 2014

Review: Big Data: A Revolution That Will Transform How We Live, Work, and Think

Big Data: A Revolution That Will Transform How We Live, Work, and Think
Viktor Mayer-Schonberger and Kenneth Cukier
Eamon Dolan/Houghton Mifflin Harcourt; 1 edition (March 5, 2013)
ISBN-10: 0544002695
ISBN-13: 978-0544002692
ASIN: 0544002695

If we were previously unaware of the role big data plays in contemporary life, the revelation in June 2013 that the NSA vacuums up a billion domestic telephone records a day will have wakened us from our slumber. Something new and deeply unsettling is under way in the bowels of governments and corporations. And yet, as becomes very clear in Big Data: A Revolution That Will Transform How We Live, Work, and Think, the weakening of personal privacy is only one of the areas undergoing profound change as a result of the explosion of data.

It is difficult to grasp the amount of data being produced around us. Whatever statistics one chooses to quote (e.g. 90 percent of all existing data has been created in the last two years and the sum is doubling every two years), the amount of data is overwhelming. However, big data is about more than the volume of data. When that avalanche of data, stored in a host of databases, is sifted by smart computer algorithms, the effect is transformational. In terms of the attack on privacy, the threat goes far beyond simple surveillance, the mere tracking of individuals by means of their cell phone data, credit card use, Internet activity, or CCTV cameras that can recognize faces and gaits. Information about a person can be extracted from many sources, often quite far-flung, and the combination is vastly more intrusive, more revealing, and more troubling. What emerges about us is not only a picture of our lives more complete than any of our friends know but also a platform upon which frighteningly accurate predictions can be built. In fact, correlations between information about an individual and the probability of their behaving in certain ways are so compelling that we need to worry about becoming a society like the one in the movie Minority Report, where people are arrested based on predictions, before they commit a crime.

Suspicions about a future society in which privacy has disappeared are commonplace, but Big Data anchors those suspicions in fact and delineates their connections to other developments. While the book gives a good deal of space to outlining the more neutral changes and the great benefits that big data will bring, particularly in health care, it at the same time carefully points out the dark side. A full chapter is devoted to recommendations for ameliorating the harmful effects of big data. Yet, ironically, that chapter is cold comfort, for the authors, after showing how extremely difficult it is to control what can be done with personal data, conclude that the most effective approach is to leave regulation in the hands of the corporate and government owners of the data, overseen in rather vague ways by general principles and “data auditors.” May better ideas be found soon.

The most enlightening parts of the book, however, are not about personal privacy, since so much has been written on that subject elsewhere. Separate chapters cover three fundamental transformations in our ways of thinking that are under way, which the authors describe as N=all, messiness, and the decline of causality. By N=all they mean our increasing ability to access more information on a topic until we begin to approach all the possible data. This overturns many of our current approaches to finding truth, particularly in the social sciences. In the past we have carefully selected samples from an overall population, generated a likely hypothesis or two, then subjected our samples to analysis in order to produce statistical facts about the whole. That was the process in what the authors call “the era of small data.” When enormous datasets replace those small samples, things change. For a start, the traditional small data tools (hypotheses dreamed up by humans, surveys, questionnaires, analysis supervised by humans) become obsolete. And computers, rather than putting a small number of hypotheses to the test, can search through vast quantities of data and generate essentially all possible hypotheses. When, for example, Google’s data scientists attempted to track the spread of flu by analyzing the search terms of its users (3 billion queries per day), they did not investigate only plausible search terms, such as “medicine for cough and fever.” Their computers looked for correlations between search queries and the historical data on the spread of the flu and came up with 450 million mathematical models or hypotheses. Testing each of those finally yielded a combination of 45 search terms that produced results as good as, but faster than, those of the Centers for Disease Control and Prevention with its traditional reporting methods. Human-centred analyses cannot compete with that kind of thoroughness or the speed of those results. Another benefit of N=all datasets is that they provide accurate information about small subgroups in a way that the old sampling methods cannot.

By messiness the book refers to errors in the data. As our data expands toward N=all, a certain amount of messiness in the data becomes acceptable, since it represents such a small proportion of the whole that it has no statistical significance.

Perhaps the most radical change will be the devaluing of causality. What computers find are correlations, not causality. They might search through data and find, for example, that people who consumed ginseng, aspirin, and fried earwigs experienced a remission of their cancer. The discovery of that cancer-curing diet would be of paramount importance, and the precise, cause-and-effect mechanism would be of secondary interest. Psychologists have told us for some time that our need to see things in causal terms, while apparently hard-wired in our minds, often makes us see connections that are not there and fall prey to illusions. Perhaps, after all, causality doesn’t matter quite as much as we thought. As we discover truths more and more by correlation, a de-emphasis on the search for causality might be the most profound effect of big data.


In 1995 Nicholas Negroponte’s Being Digital taught us how to think less in terms of atoms and more in terms of bits. Big Data may serve as a similar guidebook to yet another fundamental change in our world.
