Big Data: A Revolution That Will Transform How We Live, Work, and Think
Viktor Mayer-Schonberger and Kenneth Cukier
Eamon Dolan/Houghton Mifflin Harcourt; 1st edition (March 5, 2013)
ISBN-10: 0544002695
ISBN-13: 978-0544002692
ASIN: 0544002695
If we were previously unaware of the role big data plays in
contemporary life, the revelation in June that the NSA vacuums up a billion domestic
telephone records a day will have wakened us from our slumber. Something new
and deeply unsettling is under way in the bowels of governments and
corporations. And yet, as becomes very clear in Big Data: A Revolution That Will Transform How We Live, Work, and Think,
the weakening of personal privacy is only one of the areas undergoing
profound change as a result of the explosion of data.
It is difficult to grasp the amount of data being produced
around us. Whatever statistics one chooses to quote (e.g. 90 percent of all
existing data has been created in the last two years and the sum is doubling
every two years) the amount of data is overwhelming. However, big data is about
more than the volume of data. When that avalanche of data, stored in a host of databases,
is sifted by smart computer algorithms, the effect is transformational. In
terms of the attack on privacy, the threat goes far beyond simple surveillance,
the mere tracking of individuals by means of their cell phone data, or credit
card use, or Internet activity, or CCTV cameras that can recognize faces and gaits, etc. Information
about a person can be extracted from many sources, often quite far-flung, and the
combination is vastly more intrusive, more revealing, and more troubling. What
emerges about us is not only a picture of our lives more complete than any of
our friends know but also a platform upon which can be built frighteningly
accurate predictions. In fact, correlations between information about an
individual and the probability of their behaving in certain ways are so
compelling that we need to worry about becoming a society like the one in the
movie Minority Report, where people
are arrested based on predictions, before they commit a crime.
Suspicions about a future society in which privacy has
disappeared are commonplace, but Big Data
anchors those suspicions in fact and delineates their connections to other
developments. While the book gives a good deal of space to outlining the more
neutral changes and the great benefits that big data will bring, particularly
in health care, it also carefully points out the dark side. A full chapter
is devoted to recommendations for ameliorating the harmful effects of big data.
Yet, ironically, that chapter is cold comfort, for the authors, after showing
how extremely difficult it is to control what can be done with personal data, conclude
that the most effective approach is to leave regulation in the hands of the
corporate and government owners of the data, overseen in rather vague ways by
general principles and “data auditors.” May better ideas be found soon.
The most enlightening parts of the book, however, are not
about personal privacy, since so much has been written on that subject elsewhere.
Separate chapters cover three fundamental transformations in our ways of
thinking that are under way, which the authors describe as N=all, messiness,
and the decline of causality. By N=all they mean our increasing ability to
access more information on a topic until we begin to approach all the possible
data. This overturns many of our current approaches to finding truth,
particularly in the social sciences. In the past we have carefully selected samples
from an overall population, generated a likely hypothesis or two, then subjected
our samples to analysis in order to produce
statistical facts about the whole. That was the process in what the authors
call “the era of small data.” But when enormous datasets replace those small
samples, things change. For a start, the traditional small data tools—
hypotheses dreamed up by humans, surveys,
questionnaires, analysis supervised by humans—become obsolete. And
computers, rather than putting a small number of hypotheses to the test, can
search through vast quantities of data and generate essentially all possible
hypotheses. When, for example, Google’s data scientists attempted to track
the spread of flu by analyzing the search terms of its users (3 billion per
day), they did not investigate only plausible search terms, such as “medicine for
cough and fever.” Their computers looked for correlations between search
queries and the historical data on the spread of the flu and came up with 450
million mathematical models or hypotheses. Testing each of those finally
yielded a combination of 45 search
terms that produced results as accurate as, and faster than, those of the Centers for Disease
Control and Prevention with its traditional reporting methods. Human-centred analyses cannot
compete with that kind of thoroughness and the speed of those results. Another
benefit of N=all datasets is that they provide accurate information about small
subgroups in a way that the old sampling methods cannot. By messiness the book
refers to errors in the data. As our data expands toward N=all, a certain
amount of messiness in the data becomes acceptable, since it represents such a
small proportion of the whole that it has no statistical significance.
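For readers who like to see the idea in miniature, the correlation hunt described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the weekly counts are invented, the function names (pearson, rank_terms_by_correlation) are hypothetical, and it is not a reconstruction of Google’s actual method, which the book describes only in outline. It simply scores every search term, plausible or not, by how closely its weekly volume tracks the flu figures and lets the ranking speak for itself.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_terms_by_correlation(term_counts, flu_cases):
    """Score every term against the flu series; strongest correlation first."""
    scored = [(term, pearson(counts, flu_cases))
              for term, counts in term_counts.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Invented weekly figures, for illustration only (not real data).
flu_cases = [10, 12, 30, 80, 150, 90, 40, 15]
term_counts = {
    "cough and fever medicine": [60, 55, 160, 390, 760, 440, 210, 70],
    "flu symptoms": [40, 30, 100, 230, 460, 260, 130, 50],
    "cheap flights": [150, 140, 160, 155, 145, 150, 165, 140],
    "football scores": [120, 180, 110, 170, 130, 175, 115, 160],
}

ranking = rank_terms_by_correlation(term_counts, flu_cases)
for term, r in ranking:
    print(f"{term}: r = {r:.2f}")
```

No hypothesis about which terms ought to matter is supplied in advance; the two flu-related terms rise to the top purely because their counts move with the flu figures, which is the correlation-first approach the authors describe.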
Perhaps the most radical change will be the devaluing of
causality. What computers find are correlations, not causality. They might
search through data and find, for example, that people who consumed ginseng, aspirin,
and fried earwigs experienced a remission of their cancer. The discovery of
that cancer-curing diet would be of paramount importance, and the precise,
cause-and-effect mechanism would be of
secondary interest. Psychologists have told us for some time that our need to
see things in causal terms, while apparently hard-wired in our minds, often
makes us see connections that are not there and fall prey to illusions. Perhaps,
after all, causality doesn’t matter quite as much as we thought. As we discover
truths more and more by correlation, a de-emphasis on the search for causality
might be the most profound effect of big data.
In 1995 Nicholas Negroponte’s Being Digital taught us how to think less in terms of atoms and
more in terms of bits. Big Data may
serve as a similar guidebook to yet another fundamental change in our world.