Discerning Truth in the Age of Ubiquitous Disinformation
Initial Reflection on My Evidence to the DCMS Enquiry on Fake News
- Post-truth politics: The first societal and political challenge comes from the emergence of post-truth politics, where politicians, parties, and governments tend to frame key political issues in terms of propaganda rather than facts. Misleading claims are continuously repeated, even when proven untrue through fact-checking by media or independent experts (e.g. the Vote Leave claim that Britain was paying the EU £350 million a week). This has a highly corrosive effect on public trust.
- Online propaganda and fake news: State-backed (e.g. Russia Today), ideology-driven (e.g. misogynistic or Islamophobic), and clickbait websites and social media accounts are all engaged in spreading misinformation, often with the intent to deepen social division and/or influence key political outcomes (e.g. the 2016 US presidential election).
- Partisan media: The pressures of the 24-hour news cycle and today’s highly competitive online media landscape have resulted in lower reporting quality and opinion diversity, with misinformation, bias, and factual inaccuracies routinely creeping in.
- Polarised crowds: As more and more citizens turn to online sources as their primary source of news, the social media platforms and their advertising and content recommendation algorithms have enabled the creation of partisan camps and polarised crowds, characterised by flame wars and biased content sharing, which, in turn, reinforce their prior beliefs (typically referred to as confirmation bias).
- Non-trivial scalability: While some of our algorithms work in near real time on specific datasets (e.g. tweets about the Brexit referendum), applying them across all posts on all topics, as Twitter would need to do, is very far from trivial. To give a sense of the scale: prior to 23 June 2016 (referendum day) we had to process fewer than 50 Brexit-related tweets per second, which was doable. Twitter, however, would need to process more than 6,000 tweets per second, which is a serious software engineering, computational, and algorithmic challenge.
- Algorithms make mistakes: while 90% accuracy intuitively sounds very promising, we must not forget the errors - 10% in this case, or double that at 80% accuracy. At 6,000 tweets per second, this 10% amounts to 600 wrongly labelled tweets per second, rising to 1,200 for the lower-accuracy algorithm. To make matters worse, automatic disinformation analysis often combines more than one algorithm: the first determines which story a post refers to, and the second whether that story is likely true, false, or uncertain. Unfortunately, when algorithms are executed in sequence, their errors have a cumulative effect.
- These mistakes can be very costly: broadly speaking, algorithms make two kinds of errors - false negatives (e.g. disinformation is wrongly labelled as true, or bot accounts are wrongly identified as human) and false positives (e.g. correct information is wrongly labelled as disinformation, or genuine users are wrongly identified as bots). False negatives are a problem on social platforms, because the high volume and velocity of social posts (e.g. 6,000 tweets per second on average) still leave us with a lot of disinformation “in the wild”. If we draw an analogy with email spam - even though most of it is filtered out automatically, we are still receiving a significant proportion of spam messages. False positives, on the other hand, pose an even more significant problem, as they could be regarded as censorship. Facebook, for example, has a growing problem with some users having their accounts wrongly suspended.
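
As a rough illustration of the error arithmetic above, here is a minimal Python sketch. The function names are hypothetical and chosen only for this example. The pipeline calculation rests on two simplifying assumptions that real systems need not satisfy: the stages make their mistakes independently of each other, and a post is labelled correctly only if every stage gets it right.

```python
from typing import Iterable


def wrong_per_second(posts_per_second: float, accuracy: float) -> float:
    """Expected number of wrongly labelled posts per second at a given accuracy."""
    return posts_per_second * (1.0 - accuracy)


def pipeline_accuracy(stage_accuracies: Iterable[float]) -> float:
    """Overall accuracy of algorithms run in sequence.

    ASSUMPTION: stages err independently, and the final label is correct
    only if every stage is correct, so the stage accuracies multiply.
    """
    overall = 1.0
    for accuracy in stage_accuracies:
        overall *= accuracy
    return overall


TWEETS_PER_SECOND = 6000  # average volume quoted in the text

# Single algorithm at the two accuracy levels discussed above.
for accuracy in (0.9, 0.8):
    errors = wrong_per_second(TWEETS_PER_SECOND, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors:.0f} wrong labels per second")

# Two algorithms in sequence: story matching, then veracity classification.
# Both stage accuracies here are illustrative values, not measured results.
combined = pipeline_accuracy([0.9, 0.9])
errors = wrong_per_second(TWEETS_PER_SECOND, combined)
print(f"two 90% stages: {combined:.0%} overall, about {errors:.0f} wrong labels per second")
```

Under these assumptions, chaining two 90% accurate stages leaves only about 81% of posts correctly labelled, so the error rate of the combined system is close to that of a single 80% accurate algorithm.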