Tuesday, 14 January 2014

PHEME: A new project on computing the veracity of social media content

The London Eye was on fire during the 2011 England riots! Or was it? Social networks are rife with lies and deception, half-truths and facts. But irrespective of a meme's truthfulness, the rapid spread of such information through social networks and other online media can have immediate and far-reaching consequences. In such cases, large amounts of user-generated content need to be analysed quickly, yet it is not currently possible to carry out such complex analyses in real time.

For the past week I've been very excited about (and rather busy with) the start of a new European project called PHEME (PhemeEU). The aim is to develop automatic methods to help people (e.g. journalists, health professionals, patients, government services) assess the truthfulness of information that is spreading through social networks and other online media.

With partners from seven different countries, the project will combine big data analytics with advanced linguistic and visual methods. The results will be suitable for direct application in medical information systems and digital journalism.

Veracity: The Fourth Challenge of Big Data

Social media poses three major computational challenges, dubbed by Gartner the 3Vs of big data: volume, velocity, and variety.

PHEME will focus on a fourth crucial, but hitherto largely unstudied, challenge: veracity.

While writing the proposal, I coined the term phemes to describe memes which are enhanced with truthfulness information. The name is also a reference to Pheme, the Greek goddess of fame and rumours.

Identifying Phemes (Rumorous Memes) 

We are concentrating on identifying four types of phemes and modelling their spread across social networks and online media: speculation, controversy, misinformation, and disinformation. However, it is particularly difficult to assess whether a piece of information falls into one of these categories in the context of social media. The quality of the information here is highly dependent on its social context and, up to now, it has proven very challenging to identify and interpret this context automatically.
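
To make "modelling their spread" a little more concrete, here is a minimal Python sketch. It is not PHEME's actual method, just one simple way to picture it. It assumes that the spread of a single piece of information has been recorded as (sender, receiver) pairs; the user names, the EVENTS data and all function names are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical share events for one piece of information: (sender, receiver).
EVENTS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("dave", "erin"),
]

def build_graph(events):
    """Map each sender to the list of users they passed the item to."""
    graph = defaultdict(list)
    for sender, receiver in events:
        graph[sender].append(receiver)
    return graph

def forwarders(events):
    """Users who received the item and then passed it on to someone else."""
    senders = {sender for sender, _ in events}
    receivers = {receiver for _, receiver in events}
    return senders & receivers

def hops_from(origin, events):
    """Breadth-first search: number of hops from the origin to each reached user."""
    graph = build_graph(events)
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        for nxt in graph.get(user, []):
            if nxt not in hops:  # keep the shortest path only
                hops[nxt] = hops[user] + 1
                queue.append(nxt)
    return hops
```

Calling forwarders(EVENTS) picks out the users who both received the item and passed it on, while hops_from("alice", EVENTS) shows how far the item travelled from its (assumed) origin. Real social media data is of course far messier than a clean list of pairs.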

An Interdisciplinary Approach

PHEME has partners from the fields of natural language processing and text mining, web science, social network analysis, and information visualization. Together, we will use three factors to analyse veracity. First, the information inherent in the document itself, that is, lexical, semantic and syntactic information. Second, this is cross-referenced with data sources that are assessed as particularly trustworthy; in the case of medical information, for example, PubMed, the biggest online database in the world for original medical publications. Finally, the diffusion of a piece of information is analysed: who receives what information and from whom, and whether and to whom they pass it on.

"Rumor intelligence", that is, the ability to identify rumours in good time, will be tested, inter alia, in the area of medical information systems. For digital journalism, results will be tested with swissinfo.ch, the international service of the Swiss Broadcasting Corporation (SBC), and with Ushahidi's SwiftRiver media filtering and verification platform. The new technology will help journalists assess the veracity of user-generated content, an activity that to date is largely carried out manually and requires significant resources. Other news organisations that have expressed support for the project are the BBC, the Guardian, and the German regional broadcasting corporation Südwestrundfunk.

So this is all going to be great - identifying rumours across social media and helping filter out the misinformation. Keep up with our progress - follow PHEME on Twitter!