On Countering Bias—in People and Algorithms
Mia Quagliarello / April 26, 2018
No system is foolproof, but there are things that can be done to help
The process of producing, delivering and presenting articles to readers has always involved more people than just journalists. To scale, it has to.
At Flipboard, we deliver well over 100,000 stories per day—far too many for even an army of editors to process, label and present in a way that stays true to our values, never mind in a way that’s personalized to each individual.
This is where algorithms come in. They are informed by editors, yes, but the truth is that they are informed by anyone who touches them. Algorithms are architected by people, and anytime you have people, you’re going to have bias.
From edit to engineering, we are hyper-aware of this issue. It’s not enough to have journalistic principles that you abide by. Bias must be acknowledged and treated as its own problem. Here are some ways we think about countering bias when building our algorithms:
- Ranking sources. “The truth” is relative, and it would be impossible to overcome bias in trying to define it. Instead, we think about how we can use technology to create a more controlled environment that is optimized for the truth while still allowing many perspectives to shine through. To do this, we pay particularly close attention to the domains and publishers on Flipboard. A team of humans determines the editorial quality of a source (here’s our definition of fake news), and then something called the domain ranker comes into play. Originally built for spam detection, the ranker allows the team to favor sources with known track records that themselves follow time-honored journalistic principles. Who’s ranked and how is carefully guarded and continually reviewed.
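To make the idea concrete, here is a minimal sketch of that kind of domain-based ranking in Python. The domain names, quality tiers, and scoring are illustrative assumptions, not Flipboard's actual system:

```python
# Hypothetical editorial quality tiers assigned by a human team (0 = worst).
# These domains and scores are invented for illustration.
DOMAIN_QUALITY = {
    "established-news.example": 3,   # known track record
    "new-blog.example": 1,
    "spammy-site.example": 0,
}

def domain_score(url: str) -> int:
    """Return the editorial quality tier for a story's domain."""
    domain = url.split("//")[-1].split("/")[0]
    return DOMAIN_QUALITY.get(domain, 1)  # unknown domains get a neutral tier

def rank_stories(urls: list[str]) -> list[str]:
    """Order stories so higher-quality domains surface first."""
    return sorted(urls, key=domain_score, reverse=True)

stories = [
    "https://spammy-site.example/clickbait",
    "https://established-news.example/report",
    "https://new-blog.example/post",
]
ranked = rank_stories(stories)
# The established source surfaces first; the zero-tier source sinks to the bottom.
```

Because `sorted` is stable, stories from equally ranked domains keep their incoming order rather than being arbitrarily reshuffled.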
- Incorporating signal from as many people as possible. While the ranker does make it harder for stories from the long tail to surface on Flipboard, there’s another filter that influences what you see: the user satisfaction score. An amalgam of signals that indicate how engaged people are with a piece of content, the score is a proxy for quality. Any item can be flipped into the Flipboard ecosystem, but then it goes before a jury of readers and curators who decide its fate with their fingers. Articles with higher satisfaction scores tend to get surfaced more often, and in more places on Flipboard, than stories with low scores.
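A hedged sketch of what an "amalgam of signals" can look like: a weighted combination of engagement measures. The signal names and weights below are assumptions for illustration only, not Flipboard's real inputs:

```python
# Hypothetical engagement signals and weights; a real system would tune
# these against observed behavior rather than hard-code them.
SIGNAL_WEIGHTS = {
    "read_time_sec": 0.01,   # longer reads suggest genuine engagement
    "flips": 1.0,            # curating a story into a magazine is a strong signal
    "likes": 0.5,
    "shares": 0.8,
    "quick_bounces": -0.7,   # backing out immediately counts against a story
}

def satisfaction_score(signals: dict[str, float]) -> float:
    """Combine engagement signals into a single quality proxy."""
    return sum(SIGNAL_WEIGHTS.get(name, 0.0) * value
               for name, value in signals.items())

engaging = {"read_time_sec": 120, "flips": 4, "likes": 10, "quick_bounces": 1}
clickbait = {"read_time_sec": 5, "flips": 0, "likes": 1, "quick_bounces": 30}
# The engaging article outscores the clickbait one despite fewer total clicks.
```

The design point is that no single signal decides a story's fate; a piece that attracts clicks but drives readers away immediately still ends up with a low score.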
- Acknowledging bias in datasets. Quite often a dataset is already skewed toward the point of view of a majority of its users, or of society (especially U.S. society) in general. On Flipboard, a lot of the content we process comes from people flipping articles into their magazines—so if there are imbalances of gender or viewpoint, for example, you’re going to see that reflected in content trends across the platform. This can create a self-reinforcing feedback loop, as the decisions of “the group” can, and do, overwhelm the minority.
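A toy simulation (not Flipboard code) of why that feedback loop matters: if content is surfaced in proportion to its current popularity, even a modest initial majority can crowd out the minority viewpoint over repeated rounds.

```python
# Toy model: viewpoint A holds some share of exposure, and surfacing
# boosts whatever is already popular. The boost factor is an assumption.
def step(share_a: float, boost: float = 1.5) -> float:
    """One round: A's exposure grows with its current share."""
    weight_a = share_a * boost   # majority content gets surfaced more...
    weight_b = 1 - share_a       # ...so it attracts more flips next round
    return weight_a / (weight_a + weight_b)

share = 0.6  # viewpoint A starts with a modest majority
for _ in range(10):
    share = step(share)
# After ten rounds, A's share has climbed well past 90 percent.
```

This is exactly the dynamic that deliberate counter-measures, like the clustering described next, are meant to push against.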
- Clustering stories for multiple perspectives. Flipboard is the (independent) home to thousands of high-quality publishers from around the world, so it’s critical that we surface the plurality of sources and voices you’ll find here. Story clustering is an algorithmic technique we use to pull together stories from different sources on the same topic. Not every cluster will actually contain stories with truly unique viewpoints—machine learning just isn’t there yet—but the structure gives us a framework for offering balance.
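A minimal sketch of story clustering, grouping headlines by word overlap. Production systems use much richer features than headline tokens; this only shows the structure of the technique:

```python
# Greedy headline clustering by Jaccard word overlap.
# The threshold and tokenization are illustrative assumptions.
def tokens(title: str) -> set[str]:
    """Lowercased content words from a headline (short words dropped)."""
    return {w.lower().strip(".,!?") for w in title.split() if len(w) > 3}

def similar(a: str, b: str, threshold: float = 0.3) -> bool:
    """True if two headlines share enough words (Jaccard similarity)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) >= threshold

def cluster(titles: list[str]) -> list[list[str]]:
    """Greedily group each story with the first cluster it resembles."""
    clusters: list[list[str]] = []
    for title in titles:
        for group in clusters:
            if similar(title, group[0]):
                group.append(title)
                break
        else:
            clusters.append([title])
    return clusters

headlines = [
    "Senate passes budget bill after long debate",
    "Budget bill clears Senate in late-night vote",
    "New telescope captures distant galaxy",
]
groups = cluster(headlines)
# The two budget stories land in one cluster; the telescope story stands alone.
```

Once stories are clustered, a feed can deliberately draw from different sources within a cluster instead of showing several near-duplicates from one outlet.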
- Attributing for context. All stories on Flipboard have author, publisher and/or curator attribution, so you can see where each story comes from and make your own informed decisions about the contributor’s inherent biases.
- Hiring for diversity. There are approximately 14,000 algorithmically derived topics on Flipboard; we cannot possibly check how effective our code has been in generating all of those feeds, so it’s important that we review a cross-section of them from a variety of perspectives. That can be a challenge in a world where engineers are overwhelmingly male. And gender bias isn’t the only concern: our product is global, and we are starting to apply our algorithms to languages other than English. We need oversight from different people who reflect multiple perspectives, doing our best to cover as many viewpoints as possible across as wide a variety of topics as we can.