Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Seth Stephens-Davidowitz
HarperCollins (2017)

Google Search has become the new confessional, where believers and infidels alike open their hearts and spill their guts to an omniscient silicon god. And Google Trends has become the new Revelation, where the truth about us stands exposed.

That’s the thesis of this timely and entertaining book by economist and data scientist Seth Stephens-Davidowitz, who has himself worked for Google. For Google, Big Data (which he refuses to define precisely) emerges from the billions of questions we ask, and we are just beginning to grasp how to question Big Data itself to get the answers we need.

“I am now convinced,” Stephens-Davidowitz writes, “that Google searches are the most important dataset ever collected on the human psyche.” And why? Because “people are so honest in them.” They are more honest than people ever are with pollsters and market researchers, not to mention friends and family.

The results of the U.S. 2016 presidential election bear him out. Polls showed Hillary Clinton well ahead of the impossible Donald Trump, right up until Nov. 8. But the polls missed something that emerged from a post-election study of Google Trends: “Areas that supported Trump in the largest numbers were those that made the most searches for ‘nigger.’” What’s more, Americans make seven million searches a year for that word. Searches spiked right after the inauguration of Barack Obama, along with searches for Stormfront, a white nationalist website. Such findings should have alerted the general population, years before 2016, that something ugly was afoot.

Most Americans actually tilt a little left, according to Google Trends, and the news media tilt accordingly: not because their owners are left-wing, but because the owners make more money telling readers what they want to hear.
At the same time, Google Trends was exposing the smallness of the white racist vote: with 257 million internet-using Americans making billions of searches every day, seven million searches a year are a minuscule fraction of the total. Yet however small a minority these searchers might be, and however scattered across the country, they made a critical difference in a few key states, giving Trump the presidency through the Electoral College while Clinton won millions more votes.

Intelligence can be defined as putting one thing you know together with another thing you know, and learning something you didn’t know. We can be confident politicians are now hiring every data scientist they can find to dig into the mood of the electorate as 2018 approaches (and 2019 here in Canada).

The four powers of Big Data

Political success, however, is only one outcome of Big Data. Stephens-Davidowitz argues that Big Data has four “powers” that make it a very useful tool (and weapon) for all kinds of purposes.

The first power, he says, is that it offers up new types of data, and not just from Google search results. Porn site traffic, for example, reveals a great deal about human sexuality.

The second power is the honesty of the data. People may avoid using racial slurs in normal conversation, but not in searches. They may be discreet about their sexual preferences in public, but not when asking “Am I gay?”

“Allowing us to zoom in on small subsets of people is the third power of Big Data,” says Stephens-Davidowitz. He cites Dr. John Snow, the 19th-century London physician who mapped the homes of cholera victims during an outbreak and traced their cases to a single contaminated water pump. Today, on a much larger scale, we can zoom in on “doppelgängers”: people very similar to one another in the books they buy or the diseases they contract.

The fourth power of Big Data is running “causal” experiments. Early on, websites worried about how to cause traffic: how to make readers land on a site, stay on it, and return often.
The solution was “A/B testing,” in which some viewers got one version of the website and others got a slightly different version: a different colour, a different font, and so on. The versions that performed better were then tested against further variations. Google has been doing A/B testing since 2000, Stephens-Davidowitz says, and is now running a thousand tests a day for many purposes. One promising area: offering different lesson plans to teachers to see which plan results in the most effective student learning.

Big Data is more than just counting searches. Words are data, and we use them in different ways. Men’s vocabulary is strikingly different from women’s, and Democrats’ vocabulary (“gays,” “needs of poor people”) is different from Republicans’ (“homosexuals,” “government spending”). The words may reflect bias, but we reveal ourselves when we use them.

Refuting personal ideology

Big Data is most convincing when it confirms our own view, but Stephens-Davidowitz warns us: “We tend to exaggerate the relevance of our own experience… we weight our data.” More likely, Big Data will contradict our personal experience, and with it our political ideology.

For example, the American Dream tells us that social mobility is high and the children of poor parents may grow up to be rich. Big Data says just 7.5 per cent of such kids reach the top fifth of the income distribution in the U.S., compared to 13.5 per cent in Canada. But those are national averages. In San Jose, California, a poor kid has a 12.9 per cent chance of making that climb.

As for achieving enough success to warrant a Wikipedia page, Big Data points the way: Grow up in a county with a college town, where you’ll be exposed to innovation, and a big city, where you can show your stuff.
What’s more, says Stephens-Davidowitz, “The greater the percentage of foreign-born residents in an area, the higher the proportion of children born there who go on to notable success.” Strikingly, Big Data shows that education spending does not correlate with the number of notable writers, artists, and business people an area produces. But it does correlate with the share of children who grow up to reach the middle class.

Skip the Ivy League

Big Data also shows that rich, anxious parents who want to send their kids to the best possible school are wasting their money: “There is growing evidence that, while going to a good school is important, there is little gained from going to the greatest possible school.” A bright, ambitious student will earn just as much money whether he or she graduates from Harvard or Simon Fraser.

Stephens-Davidowitz cautions that the power of Big Data may tempt some employers to reject applicants because they look like doppelgängers of failures. Pushed too far, Big Data can lead to support for carding (police stopping minorities for a check just because they’re minorities) or for companies that won’t hire women because some women get pregnant.

Still, for those of us with a sentimental attachment to evidence-based policy, Big Data can give us some useful insights into our world and ourselves. We won’t always like what we hear, but we can challenge those findings with better ones, based on asking smarter questions of new types of data. And we’d better start now: Google and many other big corporations are already years ahead of us, and they’ve already taken our measure.