
Tuesday, 5 March 2019

Brexit--The Regional Divide

Although the UK voted by a narrow margin in the 2016 EU membership referendum to leave the EU, that outcome failed to capture the diversity of feeling across the regions. Curiously, the UK regions most economically dependent on the EU were the ones more likely to vote to leave it. The image below on the right is taken from this article from the Centre for European Reform, and makes the point in a few different ways. This and similar research inspired a current project the GATE team is undertaking with colleagues in the Geography and Journalism departments at Sheffield University, under the leadership of Miguel Kanai and with funding from the British Academy, which aims to understand whether a lack of awareness of one's own local situation played a role in the referendum outcome.
Our Brexit tweet corpus contains tweets collected in the run-up to the Brexit referendum, and we've annotated almost half a million accounts for Brexit vote intent with high accuracy; you can read about that here. So we thought we'd be well positioned to bring some insights. We also annotated user accounts with location: many Twitter users volunteer that information, though there is a lot of variation in how people describe their location, so this was harder to do accurately. Finally, we used local and national news media corpora from the time of the referendum, in order to contrast national coverage with local issues around the country.
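For the curious, the location normalisation step boils down to gazetteer matching over noisy profile strings. Here is a minimal sketch of the idea in Python; the gazetteer, matching rules, and region labels are illustrative assumptions, not the pipeline we actually used.

```python
# Minimal sketch: map free-text Twitter profile locations to UK regions.
# The gazetteer and rules are illustrative; a real pipeline needs a much
# larger gazetteer and more careful disambiguation.
import re

GAZETTEER = {
    "sheffield": "Yorkshire and the Humber",
    "leeds": "Yorkshire and the Humber",
    "manchester": "North West",
    "cardiff": "Wales",
    "glasgow": "Scotland",
    "london": "London",
}

def normalise_location(profile_location):
    """Return a UK region for a free-text location, or None if unresolved."""
    text = re.sub(r"[^a-z\s]", " ", profile_location.lower())
    for place, region in GAZETTEER.items():
        if re.search(r"\b%s\b" % place, text):
            return region
    return None

print(normalise_location("Sheffield, UK"))       # Yorkshire and the Humber
print(normalise_location("somewhere up north"))  # None: unresolvable
```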
"People's resistance to propaganda and media‐promoted ideas derives from their close ties in real communities"
Jean Seaton
Using topic modelling and named entity recognition, we were able to look for similarities and differences in the focus of local media, national media, and Twitter users. The bar chart on the left gets us started, illustrating that foci differ across media: Twitter users give more air time than news media to trade and immigration, whereas the local press takes the lead on employment, local politics and agriculture. The national press gives more space to terrorism than either Twitter or local news.
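As an illustration of the topic-modelling step, here is a minimal sketch using scikit-learn's LDA implementation; the documents are invented placeholders, and our actual pipeline differed in preprocessing and model choice.

```python
# Sketch: fit a topic model and inspect the top terms per topic.
# The documents are placeholders; the project corpora are far larger.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "eu immigration border control free movement of people",
    "factory closure puts local jobs and employment at risk",
    "farm subsidies and the common agricultural policy after brexit",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print("topic %d: %s" % (i, ", ".join(top_terms)))
```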
On the right is just one of many graphs in which we unpack this region by region (you can find more on the project website). In this choropleth, red indicates that the topic was discussed significantly more in the national press than in that area's local press, and green indicates the reverse. Terrorism and immigration have perhaps been subject to a certain degree of media and propaganda inflation; we talk about this in our Social Informatics paper. Where media focus on locally relevant issues, foci are more grounded, for example in practical topics such as agriculture and employment. We found that across the regions, Twitter remainers showed a closer congruence with the local press than Twitter leavers did.
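The phrase "significantly more discussed" implies a statistical test on the underlying article counts. A minimal sketch of one such test, with invented counts (not necessarily the exact test we used):

```python
# Sketch: is a topic's share of coverage different in national vs local
# press for one region? The counts are invented for illustration.
from scipy.stats import chi2_contingency

#                  [articles on topic, other articles]
national_press = [120, 880]
local_press = [40, 960]

chi2, p, dof, expected = chi2_contingency([national_press, local_press])
print("chi2=%.1f p=%.4g" % (chi2, p))  # small p: coverage shares differ
```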
The graph on the right shows the number of times a newspaper was linked on Twitter, contrasted against the percentage of people who said they read that newspaper in the British Election Study. It shows that the dynamics of popularity on Twitter are very different from traditional readership. This highlights a need to understand how the online environment is affecting the news reportage we are exposed to, creating a market for a different kind of material and a potentially more hostile climate for quality journalism, as discussed by project advisor Prof. Jackie Harrison here. Furthermore, the local press is increasingly struggling to survive, so it feels important to highlight its value through this work.
You can see more choropleths on the project website. There's also an extended version here of an article currently under review.

Monday, 18 February 2019

Russian Troll Factory: Sketches of a Propaganda Campaign

When Twitter shared a large archive of propaganda tweets late in 2018, we were excited to get access to over 9 million tweets from almost 4,000 unique Twitter accounts controlled by Russia's Internet Research Agency. The tweets are posted in 57 different languages, but most are in Russian (53.68%) and English (36.08%). The average account age is around four years, and the oldest accounts are as much as ten years old.
Much of the activity in both the English- and Russian-language accounts is given over to news provision. Many accounts also engage in hashtag games, which may be a way to establish an account and gain some followers. Of particular interest, however, are the political trolls. Left trolls pose as individuals interested in the Black Lives Matter campaign; right trolls are patriotic, anti-immigration Trump supporters. Among the left and right trolls, several achieved large follower numbers and even a degree of fame. Finally, there are fearmonger trolls, which propagate scares, and a small number of commercial trolls. The Russian-language accounts divide along similar lines, posing for example as individuals with opinions about Ukraine or western politics. These categories were proposed by Darren Linvill and Patrick Warren of Clemson University. In the word clouds below you can see the hashtags we found left and right trolls using.

Left Troll Hashtags

Right Troll Hashtags
Mehmet E. Bakir has created some interactive graphs enabling us to explore the data. The network diagram at the start of the post shows the network of mention/retweet/reply/quote counts we created from the highly followed accounts in the set. You can click through to an interactive version, where you can zoom in and explore the different troll types.
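For readers who want to build something similar, here is a small sketch of assembling such an interaction network with networkx; the handles and interactions are invented for illustration.

```python
# Sketch: build a weighted, directed interaction network between accounts.
# The handles and interactions below are invented for illustration.
import networkx as nx

interactions = [  # (source, target) pairs from mentions/retweets/replies/quotes
    ("troll_a", "troll_b"),
    ("troll_a", "troll_b"),
    ("troll_c", "troll_b"),
    ("troll_b", "troll_a"),
]

G = nx.DiGraph()
for src, dst in interactions:
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1
    else:
        G.add_edge(src, dst, weight=1)

# Accounts ranked by how often others interact with them
print(sorted(G.in_degree(weight="weight"), key=lambda kv: -kv[1]))
```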
In the graph below, you can see activity in different languages over time (interactive version here, or interact with the embedded version below; you may have to scroll right). It shows that the Russian-language operation came first, with the English-language operation following later. The timing of this early activity coincides with Russia's interest in Ukraine.
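A per-language timeline like this can be computed with pandas along the following lines; the file name and column names are assumptions about the archive's CSV format, and plotting requires matplotlib.

```python
# Sketch: monthly tweet counts per language from the released archive.
# The file name and column names are assumptions about the CSV format.
import pandas as pd

df = pd.read_csv("ira_tweets.csv", parse_dates=["tweet_time"])
monthly = (
    df.set_index("tweet_time")
      .groupby("tweet_language")
      .resample("M")
      .size()
      .unstack(level=0, fill_value=0)
)
monthly[["ru", "en"]].plot()  # Russian-language activity precedes English
```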

In the graph below, also available here, you can see how the different behavioural strategies pay off in terms of the number of retweets achieved. Using Linvill and Warren's manually annotated data, Mehmet built a classifier that enabled us to classify all the accounts in the dataset. It is evident that the political trolls have by far the greatest impact in terms of retweets achieved, with left trolls being the most successful. Russia's interest in the Black Lives Matter campaign perhaps suggests that the first challenge for agents is to win a following, and that exploiting divisions in society is an effective way to do that; how that following is then used to influence minds is a separate question. You can see a pre-print of our paper describing our work so far, in the context of the broader picture of partisanship, propaganda and post-truth politics, here.
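The pre-print describes Mehmet's classifier properly; purely as an illustration of the general approach, a text classifier over per-account content might look like the sketch below (features, model, and examples are my assumptions, not the published setup).

```python
# Rough sketch of classifying accounts into troll types from their text.
# Features, model, and training examples are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One concatenated document per account, labelled with Linvill & Warren's types
train_texts = [
    "police brutality justice community blacklivesmatter",
    "maga borders patriots fake news media",
    "breaking local news weather traffic update",
]
train_labels = ["LeftTroll", "RightTroll", "NewsFeed"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

print(clf.predict(["justice for our community blacklivesmatter"]))
```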

Friday, 8 February 2019

3rd International Workshop on Rumours and Deception in Social Media (RDSM)

June 11, 2019 in Munich, Germany
Co-located with ICWSM 2019

Abstract

The 3rd edition of the RDSM workshop will particularly focus on online information disorder and its interplay with public opinion formation.

Social media is a valuable resource for mining all kinds of information, ranging from opinions to factual statements. However, social media also harbours issues that pose serious threats to society, chief among them online information disorder and its power to shape public opinion. Well-known aspects include the spread of false rumours and fake news, as well as social attacks such as hate speech and other forms of harmful posts. The aim of this workshop is to bring together researchers and practitioners interested in social media mining and analysis to address the emerging issues of information disorder and manipulation of public opinion. The focus will be on themes such as the detection of fake news, the verification of rumours, and the understanding of their impact on public opinion. Furthermore, we aim to put a strong emphasis on the usefulness and trustworthiness of automated solutions tackling these themes.

Workshop Theme and Topics

The aim of this workshop is to bring together researchers and practitioners interested in social media mining and analysis to deal with the emerging issues of veracity assessment, fake news detection and manipulation of public opinion. We invite researchers and practitioners to submit papers reporting results on these issues. Qualitative user studies on the challenges encountered with the use of social media, such as the veracity of information and fake news detection, as well as papers reporting new data sets, are also welcome. Finally, we welcome studies reporting on the usefulness of, and trust in, social media tools tackling the aforementioned problems.


Topics of interest include, but are not limited to:

  • Detection and tracking of rumours.
  • Rumour veracity classification.
  • Fact-checking social media.
  • Detection and analysis of disinformation, hoaxes and fake news.
  • Stance detection in social media.
  • Qualitative user studies assessing the use of social media.
  • Bot detection in social media.
  • Measuring public opinion through social media.
  • Assessing the impact of social media on public opinion.
  • Political analyses of social media.
  • Real-time social media mining.
  • NLP for social media analysis.
  • Network analysis and diffusion of dis/misinformation.
  • Usefulness and trust analysis of social media tools.
  • AI-generated fake content (images/text).

Workshop Program Format


We will have 1-2 experts in the field delivering keynote speeches, followed by 8-10 presentations of peer-reviewed submissions, organised into 3 sessions by subject (the first two sessions on online information disorder and public opinion, the third on usefulness and trust). After the sessions we also plan a group activity (groups of 4-5 attendees) in which each group will sketch a social media tool for tackling, for example, rumour verification or fake news detection. The emphasis of the sketch should be on aspects like usefulness and trust. This should take no longer than 120 minutes (sketching plus presentation/discussion time). We will close the workshop with a summary and take-home messages (max. 15 minutes). Attendance will be open to all interested participants.

We welcome both full papers (5-8 pages) to be presented as oral talks and short papers (2-4 pages) to be presented as posters and demos.


Workshop Schedule/Important Dates
  • Submission deadline: April 1st 2019
  • Notification of Acceptance: April 15th 2019
  • Camera-Ready Versions Due: April 26th 2019
  • Workshop date: June 11, 2019  

 

Submission Procedure


We invite two kinds of submissions:

-  Long papers / brief research reports (max 8 pages + 2 pages of references)
-  Demos and posters (short papers) (max 4 pages + 2 pages of references)

Proceedings of the workshop will be published jointly with other ICWSM workshops in a special issue of Frontiers in Big Data.


Papers must be submitted electronically in PDF format, or any other format supported by the submission site, through https://www.frontiersin.org/research-topics/9706 (click on "Submit your manuscript"). Note: submitting authors should choose one of the specific track organizers as their preferred Editor.

You can find detailed information on the file submission requirements here:
https://www.frontiersin.org/about/author-guidelines#FileRequirements

Submissions will be peer-reviewed by at least three members of the programme committee. The accepted papers will appear in the proceedings published at https://www.frontiersin.org/research-topics/9706.



Workshop Organizers

Programme Committee (Tentative)

  • Nikolas Aletras, University of Sheffield, UK
  • Emilio Ferrara, University of Southern California, USA
  • Bahareh Heravi, University College Dublin, Ireland
  • Petya Osenova, Ontotext, Bulgaria
  • Damiano Spina, RMIT University, Australia
  • Peter Tolmie, Universität Siegen, Germany
  • Marcos Zampieri, University of Wolverhampton, UK
  • Milad Mirbabaie, University of Duisburg-Essen, Germany
  • Tobias Hecking, University of Duisburg-Essen, Germany 
  • Kareem Darwish, QCRI, Qatar
  • Hassan Sajjad, QCRI, Qatar
  • Sumithra Velupillai, King's College London, UK

 

Invited Speaker(s)

To be announced

Sponsors

This workshop is supported by the European Union under grant agreement No. 654024 (SoBigData), and by WeVerify, an EU co-funded Horizon 2020 project on algorithm-supported verification of digital content.

Tuesday, 24 April 2018

Funded PhD Opportunity: Large Scale Analysis of Online Disinformation in Political Debates

Applications are invited for an EPSRC-funded studentship at The University of Sheffield commencing on 1 October 2018.

The PhD project will examine the intersection of online political debates and misinformation, through big data analysis. This research is very timely, because online mis- and disinformation is reinforcing the formation of polarised partisan camps, sharing biased, self-reinforcing content. This is coupled with the rise in post-truth politics, where key arguments are repeated continuously, even when proven untrue by journalists or independent experts. Journalists and media have tried to counter this through fact-checking initiatives, but these are currently mostly manual, and thus not scalable to big data.


The aim is to develop machine learning-based methods for large-scale analysis of online misinformation and its role in political debates on online social platforms.



Application deadline: as soon as possible, until the studentship is filled
Interviews: held within 2-3 weeks of application

Supervisory team: Professor Kalina Bontcheva (Department of Computer Science, University of Sheffield), Professor Piers Robinson (Department of Journalism, University of Sheffield), and Dr. Nikolaos Aletras (Information School, University of Sheffield).


Award Details

The studentship will cover tuition fees at the EU/UK rate and provide an annual maintenance stipend at standard Research Council rates (£14,777 in 2018/19) for 3.5 years.

Eligibility

The general eligibility requirements are:
  • Applicants should normally have studied in a relevant field to a very good standard at MSc level or equivalent experience.
  • Applicants should also have a 2.1 in a BSc degree, or equivalent qualification, in a related discipline.
  • EPSRC studentships are only available to students from the UK or European Union. Applications cannot be accepted from students liable to pay fees at the Overseas rate. Normally, UK students will be eligible for a full award (fees and a maintenance grant) if they meet the residency criteria, while EU students will be eligible for a fees-only award unless they have been resident in the UK for the 3 years immediately prior to taking up the award.

How to apply

To apply for the studentship, applicants need to apply directly to the University of Sheffield for entrance into the doctoral programme in Computer Science:


  • Complete an application for admission to the standard computer science PhD programme http://www.sheffield.ac.uk/postgraduate/research/apply 
  • Applications should include a research proposal; CV; academic writing sample; transcripts and two references.
  • The research proposal of up to 1,000 words should outline your reasons for applying to this project and how you would approach the research, including details of your skills and experience in computing and/or data journalism.
  • Supporting documents should be uploaded to your application.

Thursday, 8 March 2018

Discerning Truth in the Age of Ubiquitous Disinformation (2)

How Can We Combat Online Disinformation?



Kalina Bontcheva (@kbontcheva)

In my previous blog post I wrote about the 4Ps of the modern disinformation age: post-truth politics, online propaganda, polarised crowds,  and partisan media. 

Now, let me reflect some more on what we can do about it. Please note that this is not an exhaustive list!
Promote Collaborative Fact Checking Efforts
In order to counter subjectivity, post-truth politics, disinformation, and propaganda, many media and non-partisan institutions worldwide have started fact-checking initiatives – 114 in total, according to Poynter. These mostly focus on exposing disinformation in political discourse, but generally aim to encourage people to pursue accuracy and veracity of information (e.g. PolitiFact, FullFact.org, Snopes). A study by the American Press Institute has shown that even politically literate consumers benefit from fact-checking, as it increases their knowledge of the subject.
Professional fact-checking is a time-consuming process that cannot cover a significant proportion of the claims being propagated via social media channels. To date, most projects have been limited to one or two steps of the fact-checking process, or specialise in certain subject domains: ClaimBuster, ContentCheck and the ongoing Fake News Challenge are a few examples.
There are two ways to lower the overheads, and I believe both are worth pursuing: 1) create a coordinated fact-checking initiative that promotes collaboration between different media organisations, journalists, and NGOs; 2) fund the creation of automation tools for analysing disinformation, to support the human effort. I discuss the latter in more detail next.

Fund open-source research on automatic methods for disinformation detection
In the PHEME research project we focused specifically on studying rumours associated with different types of events (some were real-world events like shootings, others rumours and hoax stories like "Prince is going to have a concert in Toronto") and how those stories were disseminated via Twitter or Reddit. We looked at how reliably we can identify such rumours: one of the hardest tasks is grouping together all the different social media posts, such as tweets or Reddit posts, that concern the same rumour. On Reddit this is a bit easier thanks to threads; Twitter is harder, because often there are multiple originating tweets that refer to the same rumour.

That is the real challenge: to piece together all these stories, because the ability to identify whether something is correct or not depends a lot on evidence and also on the discussions around that rumour, that the public are carrying out on social media platforms. By seeing one or two tweets, sometimes even journalists cannot be certain whether a rumour is true or false, but as we see the discussion around the rumours and the accumulating evidence over time, the judgment becomes more reliable.

Consequently, it becomes easier to predict the veracity of a rumour, but the main challenge is reliably identifying all the different tweets that are talking about the same rumour. If sufficient evidence can be gathered across different posts, it becomes possible to determine the veracity of that rumour with around 85% accuracy.
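Grouping posts around the same rumour is, at its core, a clustering problem over short, noisy texts. Below is a deliberately naive sketch: greedy grouping by TF-IDF cosine similarity. The similarity threshold and tweets are illustrative, and PHEME's actual methods were considerably more sophisticated.

```python
# Naive sketch: greedily group tweets into candidate rumour clusters by
# TF-IDF cosine similarity. The 0.5 threshold and tweets are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tweets = [
    "the london eye is on fire",
    "omg the london eye is on fire right now",
    "prince concert in toronto tonight",
]

sim = cosine_similarity(TfidfVectorizer().fit_transform(tweets))

clusters = []
for i in range(len(tweets)):
    for cluster in clusters:
        # Join an existing cluster if similar enough to any member
        if max(sim[i][j] for j in cluster) > 0.5:
            cluster.append(i)
            break
    else:
        clusters.append([i])  # otherwise start a new cluster

print(clusters)  # [[0, 1], [2]]: the two London Eye tweets group together
```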

In the wider context, there is emerging technology for veracity checking and verification of social media content (going beyond image/video forensics). This includes tools developed in several European projects (e.g. PHEME, REVEAL, and InVID), tools assisting crowdsourced verification (e.g. CheckDesk, Veri.ly), citizen journalism (e.g. Citizen Desk), and repositories of checked facts/rumours (e.g. Emergent, FactCheck). However, many of these tools are language-specific and would need adaptation and enhancement for new languages. Further improvements are also needed to the algorithms themselves, in order to achieve accuracy comparable to that of email spam filtering.

It is also important to invest in establishing ethical protocols and research methodologies, since social media content raises a number of privacy, ethical, and legal challenges. 

Dangers and pitfalls of relying purely on automated tools for disinformation detection
Many researchers (myself included) are working on automated, machine-learning-based methods to identify disinformation on social media platforms. Given the extremely large volume of social media posts, the key questions are: can disinformation be identified in real time, and should such methods be adopted by the social media platforms themselves?
The very short answer is: yes, in principle, but we are still far from solving many key socio-technical issues. So, when it comes to containing the spread of disinformation, we should be mindful of the problems such technology could introduce:
  • Non-trivial scalability: while some of our algorithms work in near real time on specific datasets, such as tweets about the Brexit referendum, applying them across all posts on all topics, as Twitter would need to do, is very far from trivial. To give a sense of the scale: prior to 23 June 2016 (referendum day) we had to process fewer than 50 Brexit-related tweets per second, which was doable. Twitter, however, would need to process more than 6,000 tweets per second, which is a serious software engineering, computational, and algorithmic challenge.
  • Algorithms make mistakes, so while 90 per cent accuracy intuitively sounds very promising, we must not forget the errors: 10 per cent in this case, or double that for an 80 per cent accurate algorithm. On 6,000 tweets per second, this 10 per cent amounts to 600 wrongly labelled tweets per second, rising to 1,200 for the lower-accuracy algorithm. To make matters worse, automatic disinformation analysis often combines more than one algorithm: first to determine which story a post refers to, and second to judge whether that story is likely true, false, or uncertain. Unfortunately, when algorithms are executed in sequence, errors have a cumulative effect (see the worked sketch after this list).
  • These mistakes can be very costly. Broadly speaking, algorithms make two kinds of errors: false negatives, in which disinformation is wrongly labelled as true or bot accounts are wrongly identified as human, and false positives, in which correct information is wrongly labelled as disinformation or genuine users are wrongly identified as bots. False negatives are a problem on social platforms because the high volume and velocity of social posts (e.g. 6,000 tweets per second on average) still leaves us with a lot of disinformation "in the wild". If we draw an analogy with email spam: even though most of it is filtered out automatically, we still receive a significant proportion of spam messages. False positives, on the other hand, pose an even more significant problem, as falsely removing genuine messages is effectively censorship through artificial intelligence. Facebook, for example, has a growing problem with some users having their accounts wrongly suspended.
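To make the cumulative-error point concrete, here is the arithmetic as a tiny sketch, assuming the two algorithms' errors are independent and never cancel out.

```python
# Worked example: errors compound when algorithms run in sequence
# (assuming independent errors that never cancel out).
story_accuracy = 0.90     # step 1: which story does the post refer to?
veracity_accuracy = 0.90  # step 2: is that story true, false, or uncertain?

pipeline_accuracy = story_accuracy * veracity_accuracy
print(pipeline_accuracy)  # ~0.81: roughly 19% of posts end up mislabelled

tweets_per_second = 6000
print(tweets_per_second * (1 - pipeline_accuracy))  # ~1,140 errors per second
```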
Therefore, I strongly believe that the best way forward is to implement human-in-the-loop solutions, where people are assisted by machine learning and AI methods but not replaced entirely: partly because accuracy is still not high enough, but primarily because of the danger of censorship.

Establishing Cooperation and Data Exchange between Social Platforms and Scientists
Our latest work on analysing misinformation in tweets about the UK referendum [1] [2] showed yet again a very important point: when it comes to social media and furthering our ability to understand its misuse and impact on society and democracy, the only way forward is for data scientists, political and social scientists, and journalists to work together alongside the big social media platforms and policy makers. I believe data scientists and journalists need to be given open access to the full set of public social media posts on key political events for research purposes (without compromising privacy and data protection laws), and to be able to work in collaboration with the platforms through grants and shared funding (such as the Google Digital News Initiative).

There are still many outstanding questions to be researched, most notably the dynamics of the interaction between all these Twitter accounts over time, for which we need the complete archive of public tweets, images, and shared URL content, as well as profile data and friend/follower networks. This would help us quantify better (amongst other things) what kinds of tweets and messages resulted in misinformation-spreading accounts gaining followers and retweets, how human-like the behaviour of the successful accounts was, and whether and how they were connected to the alternative media ecosystem.

The intersection of automated accounts, political propaganda, and misinformation is a key area in need of further investigation, but one for which scientists often lack the much-needed data, while the data keepers lack the necessary transparency, the motivation to investigate these issues, and the willingness to create open and unbiased algorithms.

Policy Decisions around Preserving Important Social Media Content for Future Studies
Governments and policy makers are in a position to help establish this much needed cooperation between social platforms and scientists, promote the definition of policies for ethical, privacy-preserving research and data analytics over social media data, and also ensure the archiving and preservation of social media content of key historical value.
For instance, given the ongoing debate on the scale and influence of Russian propaganda on election and referendum outcomes, it would have been invaluable to have Twitter archives made available to researchers under strict access and code-of-practice criteria, so that these questions could be studied in more depth. Unfortunately, this is not currently possible, as Twitter has suspended all Russia-linked accounts and bots, along with their content and social network information. Similar issues arise when trying to study online abuse of and from politicians, as posts and accounts are again suspended or deleted at a very high rate.
Related to this is the challenge of open and repeatable science on social media data: many of the posts in the current datasets available for training and evaluating machine learning algorithms have been deleted or are no longer available. As a result, algorithms lack sufficient data to improve, and scientists cannot easily determine whether a new method really outperforms the state of the art.
Promoting Media Literacy and Critical Thinking for Citizens

According to the Media Literacy project: “Media literacy is the ability to access, analyze, evaluate, and create media. Media literate youth and adults are better able to understand the complex messages we receive from television, radio, Internet, newspapers, magazines, books, billboards, video games, music, and all other forms of media.”

Training citizens in the ability to recognise spin, bias, and mis- and disinformation is a key element. Given the extensive online and social media exposure of children, there are also initiatives aimed specifically at school children, starting from as young as 11 years old. There are also online educational resources on media literacy and fake news [3], [4] that could act as a useful starting point for national media literacy initiatives.

Increasingly, media literacy and critical thinking are seen as key tools in fighting the effects of online disinformation and propaganda techniques [5], [6]. Many of the existing programmes today are delivered by NGOs in a face-to-face group setting. The next challenge is how to roll these out at scale, and also online, in order to reach a wide audience across all social and age groups.

Establish/revise and enforce national code of practice for politicians and media outlets

Disinformation and biased reporting are not just the preserve of fake news sites, state-driven propaganda sites, and social media accounts. A significant amount also comes from partisan media and from factually incorrect statements by prominent politicians.

In the case of the UK EU membership referendum, for example, a false claim regarding immigrants from Turkey was made on the front page of a major UK newspaper [7], [8]. Another widely known and influential example was Vote Leave's false claim that the EU costs £350 million a week [9]. Even though the UK Office for National Statistics disputed the accuracy of this claim on 21 April 2016 (two months before the referendum), it continued to be used throughout the campaign.

Therefore, an effective way to combat deliberate online falsehoods must address such cases as well. Governments and policy makers could help again through establishing new or updating existing codes of practice of political parties and press standards, as well as ensuring that they are adhered to. 

These need to be supplemented with transparency in political advertising on social platforms, in order to eliminate or significantly reduce the promotion of misinformation through advertising. These measures would also help reduce the impact of all the other kinds of disinformation discussed above.

Disclaimer: All views are my own.

Tuesday, 14 January 2014

PHEME: A new project on computing the veracity of social media content

The London Eye was on fire during the 2011 England riots! Or was it? Social networks are rife with lies and deception, half-truths and facts. But irrespective of a meme's truthfulness, the rapid spread of such information through social networks and other online media can have immediate and far-reaching consequences. In such cases, large amounts of user-generated content need to be analysed quickly, yet it is not currently possible to carry out such complex analyses in real time.

This past week I've been very excited (and rather busy) with the start of a new European project called PHEME (@PhemeEU on Twitter). The aim is to develop automatic methods to help people (e.g. journalists, health professionals, patients, government services) assess the truthfulness of information spreading through social networks and other online media.

With partners from seven different countries, the project will combine big data analytics with advanced linguistic and visual methods. The results will be suitable for direct application in medical information systems and digital journalism.

Veracity: The Fourth Challenge of Big Data

Social media poses three major computational challenges, dubbed by Gartner the 3Vs of big data: volume, velocity, and variety.

PHEME will focus on a fourth crucial, but hitherto largely unstudied, challenge: veracity.

While writing the proposal, I coined the term phemes to describe memes which are enhanced with truthfulness information. It is also a reference to Pheme, the Greek goddess of fame and rumours.

Identifying Phemes (Rumorous Memes) 

We are concentrating on identifying four types of phemes and modelling their spread across social networks and online media: speculation, controversy, misinformation, and disinformation. However, it is particularly difficult to assess whether a piece of information falls into one of these categories in the context of social media. The quality of the information here is highly dependent on its social context and, up to now, it has proven very challenging to identify and interpret this context automatically.

An Interdisciplinary Approach

PHEME has partners from the fields of natural language processing and text mining, web science, social network analysis, and information visualization. Together, we will use three factors to analyse veracity: first, the information inherent in a document itself, that is, lexical, semantic and syntactic information. This is then cross-referenced with data sources assessed as particularly trustworthy; in the case of medical information, for example, PubMed, the world's biggest online database of original medical publications. Finally, the diffusion of a piece of information is analysed: who receives what information from whom, and whether and to whom they pass it on.

 "Rumor intelligence", that is the ability to identify rumours in good time will be tested, inter alia, in the area of medical information systems. For digital journalism, results will be tested with  swissinfo.ch (the international service of the Swiss Broadcasting Corporation (SBC)) and Ushahidi's SwiftRiver media filtering and verification platform. The new technology will help journalists assesss  the veracity of user-generated content – an activity that is largely carried out manually to date, requiring significant resources. Other news organisations who have expressed support the project are the BBC, the Guardian, and the German regional broadcasting corporation Südwestrundfunk. 

So this is all going to be great - identifying rumours across social media and helping filter out the misinformation. Keep up with our progress - follow PHEME on Twitter!