In the past week I've been very excited about (and rather busy with) the start of a new European project called PHEME (PhemeEU). The aim is to develop automatic methods to help people (e.g. journalists, health professionals, patients, government services) assess the truthfulness of information that is spreading through social networks and other online media.
With partners from seven different countries, the project will combine big data analytics with advanced linguistic and visual methods. The results will be suitable for direct application in medical information systems and digital journalism.
Veracity: The Fourth Challenge of Big Data
Social media poses three major computational challenges, dubbed by Gartner the 3Vs of big data: volume, velocity, and variety.
PHEME will focus on a fourth crucial, but hitherto largely unstudied, challenge: veracity.
While writing the proposal, I coined the term phemes to describe memes which are enhanced with truthfulness information. It is also a reference to Pheme, the Greek goddess of fame and rumours.
Identifying Phemes (Rumorous Memes)
We are concentrating on identifying four types of phemes and modelling their spread across social networks and online media: speculation, controversy, misinformation, and disinformation. However, it is particularly difficult to assess whether a piece of information falls into one of these categories in the context of social media. The quality of the information here is highly dependent on its social context and, up to now, it has proven very challenging to identify and interpret this context automatically.
An Interdisciplinary Approach
PHEME has partners from the fields of natural language processing and text mining, web science, social network analysis, and information visualization. Together, we will use three factors to analyse veracity. The first is the information inherent in a document itself, that is, lexical, semantic and syntactic information. This is then cross-referenced with data sources that are assessed as particularly trustworthy; in the case of medical information, for example, PubMed, the biggest online database in the world for original medical publications. Finally, the diffusion of a piece of information is analysed: who receives what information and from whom, and whether and to whom they pass it on.
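To make the third factor a little more concrete, here is a minimal Python sketch of what "who receives what from whom, and whether they pass it on" could look like as data. It is purely illustrative and not the project's actual method: the account names, the `shares` list and the helper functions are all made up for this example.

```python
from collections import defaultdict, deque

# Hypothetical share events for one piece of information: (sender, receiver).
# The account names are invented for illustration only.
shares = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("dave", "erin"),
]

def build_graph(events):
    """Map each sender to the set of accounts they passed the item to."""
    graph = defaultdict(set)
    for sender, receiver in events:
        graph[sender].add(receiver)
    return graph

def reached_from(graph, source):
    """Breadth-first walk: hop distance from the source for every account reached."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops

def passed_on(graph, events):
    """Accounts that received the item and then forwarded it to someone else."""
    receivers = {receiver for _, receiver in events}
    return {account for account in receivers if graph.get(account)}

graph = build_graph(shares)
hops = reached_from(graph, "alice")
forwarders = passed_on(graph, shares)
print(hops)
print(sorted(forwarders))
```

Here `hops` records how far the item travelled from its first poster, and `forwarders` lists the accounts that received it and passed it on. A real system would of course work with timestamps, message content and much larger networks.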
"Rumor intelligence", that is, the ability to identify rumours in good time, will be tested, inter alia, in the area of medical information systems. For digital journalism, results will be tested with swissinfo.ch (the international service of the Swiss Broadcasting Corporation (SBC)) and Ushahidi's SwiftRiver media filtering and verification platform. The new technology will help journalists assess the veracity of user-generated content, an activity that is largely carried out manually to date, requiring significant resources. Other news organisations that have expressed support for the project are the BBC, the Guardian, and the German regional broadcasting corporation Südwestrundfunk.
So this is all going to be great - identifying rumours across social media and helping filter out the misinformation. Keep up with our progress - follow PHEME on Twitter!