
The Toxicity Threshold

by Rafael Jimenez Duran
On the one hand, platforms and their algorithms appear to accommodate the presence of hateful content in users' feeds; on the other, online platforms have moderated toxic content from the beginning, even before steep fines were introduced. Perhaps a profitable strategy for them lies in the middle.

The pen is mightier than the sword, but it can be just as double-edged. And, just like a pen, social media can be both a tool for empowerment and a weapon for harm. The same technology that helped mobilize the Arab Spring protesters also hosts content that last year harassed one-third of American adults.

This dual role raises the question of how to preserve the beneficial uses of social media while minimizing the spread of hate speech, misinformation, and other harmful content. A starting point is to follow the money and ask whether platforms have incentives to expose users to harmful content. As we will see, the answer is not straightforward.

Take the case of hate speech, harassment, and offensive speech, hereafter referred to as "toxic" content for lack of a better term. On the one hand, there are reasons to believe that profit-driven social media companies want to minimize this type of content. After all, one of the properties of toxic speech is that it undermines the public good of inclusion, because it is likely to reduce the willingness of some individuals to converse with others. And fewer online conversations mean less money for platforms. Besides, advertisers have boycotted social media companies, and regulators have fined them, over their failure to remove toxic speech.

On the other hand, industry insiders who know how social media algorithms work have voiced the opposite concern. Frances Haugen, the Facebook whistleblower, has argued that platforms seeking to optimize engagement will prioritize hateful, polarizing content.

At first glance, the evidence seems to be on the side of the insiders. An experiment asking social media users to install a browser extension that hid toxic content from their feeds reduced their engagement with the surviving content. This lower engagement translated into fewer ad impressions and clicks, likely lowering ad revenue. Moreover, users who were exposed to less toxic content also posted less toxic content subsequently, supporting the long-held hypothesis that toxicity is contagious. These pieces of evidence suggest a tradeoff for online platforms: curtailing the spread of toxicity means accepting some loss of user engagement.

But are platforms just fulfilling users' desire to see toxicity in their feeds? Not necessarily. This exposure may instead result from a perverse incentive partly driven by the popular advertising-based business model. In particular, optimizing for user engagement may be tangential to, or even at odds with, optimizing for user wellbeing. Paradoxically, exposing users to toxic content on social media can harm them while simultaneously increasing the time they spend on the platform or the number of posts they consume. For example, users might dislike encountering offensive posts, but once they see them, they might wish to investigate more, read the comment section, and even participate in the discussion. An algorithm optimizing for engagement (to maximize advertising revenue) would respond to these signals by further exposing users to toxic content, even if users would rather not see it.
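The wedge between engagement and wellbeing can be made concrete with a toy ranking sketch. This is a deliberate simplification with invented posts, scores, and function names, not any platform's actual algorithm: it only shows how a feed sorted purely by predicted engagement can promote exactly the content users report disliking.

```python
# Toy illustration of the engagement-vs-wellbeing wedge.
# All posts, scores, and weights below are hypothetical.

posts = [
    # (post_id, predicted_engagement, self_reported_wellbeing)
    ("cat_photo", 0.30, 0.8),    # liked, but scrolled past quickly
    ("toxic_rant", 0.90, -0.6),  # disliked, yet drives clicks and replies
    ("news_item", 0.50, 0.4),
]

def rank_by_engagement(items):
    """Order the feed by predicted engagement alone, as an
    ad-revenue-maximizing recommender might."""
    return sorted(items, key=lambda p: p[1], reverse=True)

def rank_by_wellbeing(items):
    """Order the feed by users' self-reported wellbeing instead."""
    return sorted(items, key=lambda p: p[2], reverse=True)

feed_engagement = [p[0] for p in rank_by_engagement(posts)]
feed_wellbeing = [p[0] for p in rank_by_wellbeing(posts)]

print(feed_engagement)  # the toxic post is ranked first
print(feed_wellbeing)   # the toxic post drops to last
```

Under these assumed scores, the engagement-ranked feed puts the toxic post at the top while the wellbeing-ranked feed puts it last, which is the divergence the paragraph above describes.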

Based on this evidence, it may be tempting to conclude that platforms lack incentives to moderate content, removing too few posts or accounts that violate their terms of service. However, this conclusion would fail to explain why platforms have moderated content since their inception, before advertiser boycotts or regulatory fines. Directing the Wayback Machine to the ancestors of modern social media websites in 1998 reveals terms of service that closely mimicked the rules of their descendants. And experimental and quasi-experimental evidence confirms that content moderation can increase user engagement.

How can these seemingly contradictory pieces of evidence coexist? It may very well be (a matter to be confirmed by future research) that a profitable strategy is to allow the presence of some toxic content while showing users that part of it is being punished. After all, behavioral economics has shown that the demand for punishing wrongdoing is deeply rooted in our psyche.


Bocconi University
Department of Economics