Investigating the algorithms that govern our lives – Columbia Journalism Review
Just an old-school style investigative look into technology, data, algorithms and humanity.
As online users, we’ve become accustomed to the giant, invisible hands of Google, Facebook, and Amazon feeding our screens. We’re surrounded by proprietary code like Twitter Trends, Google’s autocomplete, Netflix recommendations, and OKCupid matches. It’s how the internet churns. So when Instagram or Twitter, or the Silicon Valley titan of the moment, chooses to mess with what we consider our personal lives, we’re reminded where the power actually lies. And it rankles.
While internet users may be resigned to these algorithmic overlords, journalists can’t be. Algorithms have everything journalists are hardwired to question: They’re powerful, secret, and governing essential parts of society. Algorithms decide how fast Uber gets to you, whether you’re approved for a loan, whether a prisoner gets parole, who the police should monitor, and who the TSA should frisk.
Algorithms are built to approximate the world in a way that accommodates the purposes of their architect, and “embed a series of assumptions about how the world works and how the world should work,” says Hansen.
It’s up to journalists to investigate those assumptions, and their consequences, especially where they intersect with policy. The first step is extending classic journalism skills into a nascent domain: questioning systems of power, and employing experts to unpack what we don’t know. But when it comes to algorithms that can compute what the human mind can’t, that won’t be enough. Journalists who want to report on algorithms must expand their literacy into the areas of computing and data, in order to be equipped to deal with the ever-more-complex algorithms governing our lives.
The reporting so far
Few newsrooms consider algorithms a beat of their own, but some have already begun this type of reporting.
Algorithms can generally be broken down into three parts: the data that goes in; the “black box,” or the actual algorithmic process; and the outcome, or the value that gets spit out, be it a prediction or score or price. Reporting on algorithms can be done at any of the three stages, by analyzing the data that goes in, evaluating the data that comes out, or reviewing the architecture of the algorithm itself to see how it reaches its judgements.
Currently, most reporting on algorithms starts from the outcomes and attempts to reverse-engineer the algorithm, using techniques borrowed from data journalism. The Wall Street Journal used this approach to find that Staples’ online prices were determined by the customer’s distance from a competitor’s store, leaving prices higher in rural areas. FiveThirtyEight used the method to skewer Fandango’s movie ratings, which skewed abnormally high and rarely dipped below 3 stars, while a ProPublica analysis suggested that Uber’s surge pricing increases cost but not the supply of drivers.
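For readers who want to see what an outcome-based audit looks like in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `quote_price` is a hypothetical stand-in for an opaque pricing system (it is not Staples’ actual logic), and the distances, prices, and 20-mile cutoff are invented. The point is only the method: feed the black box varied inputs, record what comes out, and compare groups.

```python
from statistics import mean


def quote_price(miles_to_competitor):
    """Hypothetical black box: returns a price in cents.

    Invented for illustration only. A reporter would not see this code;
    they would only observe the prices it returns.
    """
    return 1500 if miles_to_competitor <= 20 else 1600


def average_price_by_distance(observations, cutoff_miles):
    """Split observed (miles, price) pairs at a cutoff and average each side."""
    near = [price for miles, price in observations if miles <= cutoff_miles]
    far = [price for miles, price in observations if miles > cutoff_miles]
    return mean(near), mean(far)


# Probe the system from several made-up locations and record the outcomes.
probe_distances = [2, 5, 12, 18, 25, 40, 60, 90]
observations = [(miles, quote_price(miles)) for miles in probe_distances]

near_avg, far_avg = average_price_by_distance(observations, cutoff_miles=20)
print(f"near a competitor: {near_avg} cents, far from one: {far_avg} cents")
```

A gap between the two averages does not prove how the algorithm works, but it tells a reporter which input is worth asking the company about.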
Can an algorithm be racist?
“Algorithms are like a very small child,” says Suresh Venkatasubramanian. “They learn from their environment.”
Venkatasubramanian is a computer science professor at the University of Utah. He has been thinking about algorithmic fairness ever since he read “Human Readable,” a short story by Cory Doctorow published in 2006. The story takes place in a future world, similar to ours, in which all national infrastructure (traffic, email, the media, etc.) is run by “centralized emergent networks” modeled after ant colonies. In other words: a network of algorithms. The plot revolves around two lovers: a network engineer who is certain the system is incorruptible, and a lawyer who knows it’s already been corrupted.
“It got me thinking,” says Venkatasubramanian. “What happens if we live in a world that is totally driven by algorithms?”
He’s not the only one asking that question. Algorithmic accountability is a growing discipline across a number of fields. Computer scientists, legal scholars, and policy wonks are all grappling with ways to identify or prevent bias in algorithms, along with the best ways to establish standards for accountability in business and government. A big part of the concern is whether (and how) algorithms reinforce or amplify bias against minority groups.
Algorithmic accountability builds on the existing body of law and policy aimed at combatting discrimination in housing, employment, admissions, and the like, and applies the notion of disparate impact, which looks at the impact of a policy on protected classes rather than its intention. What that means for algorithms is that an algorithm doesn’t have to be intentionally racist to have racist consequences.
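One common way to put a number on disparate impact is to compare how often each group receives the favorable outcome. The sketch below is an illustration under stated assumptions, not a legal test: the two groups and their outcomes are invented, and the 0.8 cutoff is the “four-fifths” rule of thumb from U.S. employment guidelines, which the reporting above does not itself specify.

```python
def selection_rate(outcomes):
    """Share of a group that received the favorable outcome (1 = yes, 0 = no)."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    return selection_rate(protected) / selection_rate(reference)


# Invented example data: 1 means approved, 0 means denied.
protected_group = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]  # 3 of 10 approved
reference_group = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # 6 of 10 approved

ratio = disparate_impact_ratio(protected_group, reference_group)

# Assumed threshold: the four-fifths rule of thumb, used here only as a flag.
FOUR_FIFTHS = 0.8
flagged = ratio < FOUR_FIFTHS
print(f"ratio = {ratio:.2f}, flagged for a closer look: {flagged}")
```

Note that nothing in the calculation asks about intent. It looks only at outcomes, which is the whole idea behind disparate impact.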
Algorithms can be especially susceptible to perpetuating bias for two reasons. First, algorithms can encode human bias, whether intentionally or otherwise. This happens by using historical data or classifiers that reflect bias (such as labeling gay households separately). This is especially true for machine-learning algorithms that learn from users’ input. For example, researchers at Carnegie Mellon University found that women were receiving ads for lower-paying jobs on Google’s ad network, though the researchers weren’t sure why. It was possible, they wrote, that if more women tended to click on lower-paying ads, the algorithm would learn from that behavior, continuing the pattern.
Second, algorithms have some inherently unfair design tics—many of which are laid out in a Medium post, “How big data is unfair.” The author points out that since algorithms look for patterns, and minorities by definition don’t fit the same patterns as the majority, the results will be different for members of the minority group. And if the overall success rate of the algorithm is pretty high, it might not be noticeable that the people it isn’t working for all belong to a similar group.
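A few lines of arithmetic show how a strong headline accuracy figure can mask a failure concentrated in one group. The numbers below are invented for illustration (a 90-person majority and a 10-person minority); they are not drawn from any study mentioned here.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: share of correct predictions} for (group, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


# Invented data: the algorithm is right for 88 of 90 majority members
# but only 4 of 10 minority members.
records = (
    [("majority", True)] * 88
    + [("majority", False)] * 2
    + [("minority", True)] * 4
    + [("minority", False)] * 6
)

overall = sum(is_correct for _, is_correct in records) / len(records)
per_group = accuracy_by_group(records)

print(f"overall accuracy: {overall:.0%}")
for group, accuracy in per_group.items():
    print(f"{group}: {accuracy:.0%}")
```

Reported on its own, the overall figure looks healthy. Only the breakdown by group reveals who the algorithm is failing.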
To rectify this, Venkatasubramanian, along with several colleagues, wrote a paper on how computer scientists can test for bias mathematically while designing algorithms, the same way they’d check for accuracy or error rates in other data projects. He’s also building a tool for non-computer scientists, based on the same statistical principles, which scores uploaded data with a “fairness measure.” Although the tool can’t check if an algorithm itself is fair, it can at least make sure the data you’re feeding it is. Most algorithms learn from input data, Venkatasubramanian explains, so that’s the first place to check for bias.
Much of the reporting on algorithms thus far has focused on their impact on marginalized groups. ProPublica’s story on The Princeton Review, called “The Tiger-Mom Tax,” found that Asian families were almost twice as likely to be quoted the highest of three possible prices for an SAT tutoring course, and that income alone didn’t account for the pricing scheme. A team of journalism students at the University of Maryland, meanwhile, found that Uber wait times were longer in non-white areas in DC.
Bias is also one of the biggest concerns with predictive policing software like PredPol, which helps police allocate resources by identifying patterns in past crime data and predicting where a crime is likely to happen. The major question, says Maurice Chammah, a journalist at The Marshall Project who reported on predictive policing, is whether it will just lead to more policing for minorities. “There was a worry that if you just took the data on arrests and put it into an algorithm,” he says, “the algorithm would keep sending you back to minority communities.”
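The worry Chammah describes can be sketched as a feedback loop. The toy model below is an assumption-heavy illustration and not a description of how PredPol or any real product works: it imagines two areas with the same underlying crime, a single patrol sent each round to whichever area has more recorded arrests, and arrests that are only recorded where the patrol goes. The area names and starting counts are invented.

```python
def simulate_patrols(recorded_arrests, rounds, arrests_per_patrol=1):
    """Send one patrol per round to the area with the most recorded arrests.

    Arrests are only recorded where the patrol is sent, so the data the
    model learns from is shaped by its own earlier decisions.
    """
    recorded = dict(recorded_arrests)  # copy so the caller's data is untouched
    visits = {area: 0 for area in recorded}
    for _ in range(rounds):
        target = max(recorded, key=recorded.get)
        visits[target] += 1
        recorded[target] += arrests_per_patrol
    return recorded, visits


# Invented starting point: Area A has one more recorded arrest than Area B.
initial = {"Area A": 6, "Area B": 5}
final, visits = simulate_patrols(initial, rounds=10)

print("recorded arrests after 10 rounds:", final)
print("patrol visits:", visits)
```

In this toy model a one-arrest head start is enough to send every patrol to the same area, and the growing arrest count there appears to justify the next visit.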