In his young career, Jeffrey Hammerbacher has been a scout on the frontiers of the data economy.
In 2005, Hammerbacher, then a freshly minted Harvard graduate, did what many math and computing whizzes did. He went to Wall Street as a “quant,” building math models for complex financial products.
Looking for a better use for his skills, Hammerbacher left for Silicon Valley less than a year later and joined Facebook. He started a team that began to mine the vast amounts of social network data Facebook was collecting for insights on how to tweak the service and target ads. He called himself and his co-workers “data scientists,” a term that has since become the hottest of job categories. More recently, he took a very different professional path: he joined the Mount Sinai School of Medicine in New York as an assistant professor, exploring genetic and other medical data.
The story is the same in one field after another, in science, politics, crime prevention, public health, sports and industries as varied as energy and advertising. All are being transformed by data-driven discovery and decision-making.
The pioneering consumer Internet companies, like Google, Facebook and Amazon, were just the start, experts say. Today, data tools and techniques are used for tasks as varied as predicting neighborhood blocks where crimes are most likely to occur and injecting intelligence into hulking industrial machines, like electrical power generators.
Big Data is the shorthand label for the phenomenon, which embraces technology, decision-making and public policy. Supplying the technology is a fast-growing market, increasing at more than 30 percent a year and likely to reach $24 billion by 2016, according to a forecast by IDC, a research firm. All the major technology companies, and a host of startups, are aggressively pursuing the business.
Demand is brisk for people with data skills. The McKinsey Global Institute, the research arm of the consulting firm, projects that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired, by 2020.
Yet the surveillance potential of Big Data, with every click stream, physical movement and commercial transaction monitored and analyzed, would strain the imagination of George Orwell. So what will be society’s ground rules for the collection and use of data? How do we weigh the trade-offs involving privacy, commerce and security? Those issues are just beginning to be addressed. The debate surrounding the recent disclosure that the National Security Agency has been secretly stockpiling telephone call logs of Americans and poring over email and other data from major Internet companies is merely an early round.
What is Big Data?
Big Data is a vague term, used loosely, if often, these days. But put simply, the catchall phrase means three things. First, it is a bundle of technologies. Second, it is a potential revolution in measurement. And third, it is a point of view, or philosophy, about how decisions will be — and perhaps should be — made in the future.
Part of the bundle of technologies is the data itself, from sources old and new: Web pages, browsing habits, sensor signals, social media, GPS location data from smartphones, genomic information and surveillance videos. The data surge just keeps rising, doubling in volume every two years. Just two days of current global data production from all sources, 5 quintillion bytes (a letter of text equals one byte), is about equal to the amount of information created by all the world’s conversations, ever, according to research at the University of California, Berkeley.
Yet the importance of the sheer volume of data — and its exponential growth path — can be overstated. There’s a lot of water in the ocean, too, but you can’t drink it. Beyond advances in computer processing and storage, the other essential technology is the clever software to make sense of all that data. These are largely tools taken from the steadily evolving world of artificial intelligence, like machine learning.
The increasing volume and variety of data, combined with smart software, may well open the door to what some people call a revolution in measurement. This technology, they say, is the digital equivalent of the telescope or the microscope. Both of those made it possible to see and measure things as never before — with the telescope, it was the heavens and new galaxies; with the microscope, it was the mysteries of life down to the cellular level.
Data-driven insights, experts say, will fuel a shift in the center of gravity in decision-making. Decisions of all kinds, they say, will increasingly be made on the basis of data and analysis rather than experience and intuition — more science and less gut feel. Data, for example, is an antidote to the human tendency to rely too much on a single piece of information or what is familiar — what psychologists call “anchoring bias.”
Big Data, its proponents insist, will be the next big trend in management. Erik Brynjolfsson, director of the MIT Center for Digital Business, cites the familiar business truism, “You can’t manage what you can’t measure.” And as it opens new horizons in measurement, the modern data era, Brynjolfsson said, will transform the practice of management. Big Data, he said, will “replace ideas, paradigms, organizations and ways of thinking about the world.”
Discrimination by statistical inference is a real risk in the Big Data world, because personal data trails can suggest correlations that are wrong. David Vladeck, a former senior Federal Trade Commission official and a professor of law at Georgetown University, offers this example: Imagine spending a few hours looking online for information on deep-fat fryers. You could be looking for a gift for a friend or researching a report for cooking school. But to a data miner tracking your online viewing, this hunt could be read as a telltale sign of an unhealthy habit, a data-based prediction that could make its way to a health insurer or potential employer.
And, again, the surveillance potential of Big Data technology, if it runs amok, is scary.
One glimpse of the potential payoff, however, can be seen at the Mount Sinai Medical Center, in the work being pursued by the group Hammerbacher has joined.
The 100-member team at the Icahn Institute for Genomics and Multiscale Biology is headed by Eric Schadt, a leading researcher in genomics and biomathematics. Schadt joined Mount Sinai less than two years ago, lured by ample financing and the promise that his group’s work would not be research in isolation but part of the medical center’s work in treating patients.
The genomics revolution is on the cusp of realizing its promise, according to Schadt, thanks to the advancing technology of genetic sequencing and analysis. The government-financed Human Genome Project, completed in 2003, cost $2.7 billion. Today, whole human genome sequencing, identifying all 3 billion chemical units in the human genetic instruction set, can be done for $3,000. In three years, Schadt predicts, the cost will be less than $1,000, and in 5 to 10 years, less than $100, almost like a blood test today.
The technology makes it possible not only to observe life at the molecular level as never before, but also to explore how the minute ingredients of biology and the environment influence each other in individual humans — and personalize treatment. People with similar genetic traits, Schadt notes, often have very different health outcomes. Chronic ailments like cancer, heart disease and Alzheimer’s are not caused by single genes, he said, but are “complex, networked disorders.”
The Mount Sinai researchers, Schadt said, intend to combine genetic information with patients’ medical histories (weight, age, gender, vital signs, tobacco use, toxic exposure and other data) to build more sophisticated models of biology and health outcomes. “We’re trying to move medicine in the direction of climatology and physics, disciplines that are far more advanced and mature quantitatively,” he said.
Schadt recruited Hammerbacher, an overture that coincided with Hammerbacher’s own search for where best to apply his skills next. He describes his career as a matter of “following the smartest people to find the best problem.” Health care, in his view, is “the best problem by far,” where his talents could do the most good. Hammerbacher said that at Mount Sinai he hoped to learn a lot and assemble a small group of computing and data experts to help accelerate the genomic and medical research there.
Hammerbacher remains the chief scientist of data startup Cloudera and splits his time between San Francisco and Manhattan.
Hammerbacher has qualms about the Big Data realm he has helped create, including the surveillance potential of the technology. “What does it mean,” Hammerbacher pondered at one point, “to live in an era where things and people are infinitely observed?” And he appreciates that there is a lot of truth beyond data. “Just because you can’t measure it easily doesn’t mean it’s not important,” he observed.
While he is perhaps a qualified enthusiast, Hammerbacher is a data believer. He calls data the “intermediate representation of science.” The genome, he said, is “the quantification of the core of what we are.”
He says he thinks that medicine, and nearly every other field, will increasingly fall under the sway of what he calls “the numerical imagination,” which can be distilled in a question: “What is the story the data tells us?”