Showing posts with label abuse language. Show all posts

Monday, 5 January 2026

 My ATRIUM Research Visit at the University of Sheffield: Research, Collaboration, and Experiences

by Daniel Kansaon

I am a PhD student in Computer Science in Brazil, with research at the intersection of data science and Natural Language Processing. My work focuses on the large-scale analysis of online communication on messaging platforms, particularly WhatsApp and Telegram. In my doctoral research, I explore the “underground” dynamics of messaging-app groups, investigating coordinated campaigns, harmful attacks, and political polarisation. Messaging apps have increasingly become spaces for political mobilisation, creating conditions in which abusive behaviours, coordinated practices, and narrative amplification can develop and persist. More recently, my research has concentrated on how fear-based and othering narratives emerge and spread within these platforms, especially in political contexts, and how such narratives relate to real-world events. 

Through the Transnational Access scheme of the ATRIUM programme, I had the opportunity to spend three months (September–December) on a research visit at the University of Sheffield. This period allowed me to work in a highly stimulating academic environment, engage with researchers whose expertise closely aligns with my work, and advance key components of my doctoral research.

Research activities

During my research visit, I engaged in several activities that were central to advancing my doctoral work. Daily, I went to the university, where I worked closely alongside colleagues and had frequent opportunities for discussion. These daily routines were often accompanied by the classic English tea with milk, a memorable part of everyday life at the university.

I initially dedicated time to deepening my understanding of the concept of othering and how it manifests in political contexts. Understanding this concept is essential for studying fear-based narratives.

I then took full advantage of the research environment and available infrastructure to work on the creation of a unique dataset focused on othering language, a narrative strategy commonly used to divide groups and attack out-groups in political discourse. This dataset represents a key resource for analysing fear-related communication and for training and evaluating computational models capable of identifying such narratives at scale. Developing this resource during the visit allowed me to refine annotation strategies, improve data quality, and ensure methodological consistency.

Throughout this process, I applied analytical strategies and methodological approaches learned through interactions with members of the research group. Their feedback and expertise directly influenced both the design of the dataset and the analytical decisions made.

As a direct result of this research period, I developed the first dataset dedicated specifically to othering language in messaging platforms, along with several promising findings. These results are currently being prepared for submission to a top-tier (A1) academic conference.







Integration into university life: beyond the lab

One of the highlights of my visit was how easy it was to feel part of the university community. Collaboration in Sheffield clearly goes beyond offices and meetings, and daily interactions quickly turned into meaningful connections.

As a Brazilian, I could not stay in Sheffield without playing football. I joined the university football team and took part in a championship with teams from other universities, with matches every Tuesday. We had a great run, winning three out of four games, and I was especially happy to contribute by scoring goals in every match, alongside the best midfield partner, Owen Cook.

These moments on the pitch were not just fun. They helped build connections and strengthen collaboration naturally, while also making me feel at home.

Our ComFootball team is in third place, and now I'm cheering for them here from Brazil.

Experiencing Sheffield: the best place for sports

I am also a big sports fan, and I could not stay in Sheffield without experiencing live football in the city. My first visit was to Sheffield United’s stadium for the match between Sheffield United and Southampton. United scored the opening goal but unfortunately conceded two goals in the second half and ended up losing the match. Despite the result, it was an amazing experience watching the world’s first professional football team play in its home city.

I also had the chance to visit the Utilita Arena Sheffield twice to watch the Sheffield Steelers, and I can proudly say I had a 100% success rate. The team won both matches I attended. The atmosphere in the arena was amazing. I really enjoy ice hockey, and this was my first time watching a live game in person, which made the experience even more special.


 


Experiencing Sheffield: the Christmas Market

My final months in Sheffield were especially memorable, as I had the chance to experience the city during the Christmas season. The festive atmosphere was everywhere, from carols and theatre performances to the Christmas Market in the city centre. The food, the people, the warm apple crumble, and the mulled wine all came together to create a truly wonderful experience. It was a perfect way to end my time in Sheffield and made the city feel even more welcoming and special.



Final Remarks and Acknowledgements

I am very grateful to the ATRIUM programme for making this research visit possible, and to everyone at the University of Sheffield who welcomed me and supported my work during this period.  Overall, this experience had a very positive impact and was essential for the advancement of my research. The collaboration, academic environment, and daily interactions played a crucial role in strengthening the theoretical, methodological, and practical aspects of my PhD work.

I would especially like to thank Diana Maynard for accepting me and for her guidance, as well as for the valuable feedback and advice she provided throughout the project. I am also very grateful to Jo Wright for managing all the logistics and for making the stay enjoyable.

I would like to thank my colleagues for the many interesting and valuable conversations, particularly Owen Cook and Fatima Haouari. I am also thankful to my nearby office colleagues, Ibrahim and Eliseu, whose daily presence and informal exchanges made the experience even more enjoyable. I would also like to thank João Leite for his helpful advice and support when I first arrived in the city.



Wednesday, 6 November 2019

Which MPs changed party affiliation, 2017-2019

As part of our work tracking Twitter abuse towards MPs and candidates going into the December 12th general election, I've been updating our data files regarding party membership. I thought you might be interested to see the result!

Monday, 12 August 2019

In the News: Online Abuse of Politicians, BBC


We've been working together with the BBC to bring public attention to the issue of online abuse against politicians. Rising tensions in Q1 and Q2 of 2019 meant that politicians were seeing more verbal abuse on Twitter than we have previously observed. The findings were presented on the 6 o'clock and 10 o'clock news on Tuesday, August 6th, and you can see in the histogram above that we found the level of incivility rising to almost 4%. You can see the BBC article describing the work here.

The BBC also ran a survey. Of the 172 MPs who responded, 139 said that either they or their staff had faced abuse in the past year. More than 60% (108) of those who replied said they had been in contact with the police about threats in the last 12 months.

We found that levels of abuse on Twitter fluctuate over time, with spikes driven by events such as the death of IS bride Shamima Begum's baby or key events in the Brexit negotiations. Labour MP David Lammy has received the most abuse of any MP on Twitter so far this year.

As previously, we also found that on average, male MPs attract significantly more general incivility than female ones, though women attract more sexist abuse. Conservative MPs on average, as previously, attracted significantly more abuse than Labour ones, perhaps because they are in power. Sexist abuse is the most prevalent, as compared with homophobia or racism.

Thursday, 6 June 2019

Toxic Online Discussions during the UK European Parliament Election Campaign


The Brexit Party attracted the most engagement on Twitter in the run-up to the UK European Parliament election on May 23rd, their candidates receiving as many tweets as all the other parties combined. Brexit Party leader Nigel Farage was the most interacted-with UK candidate on Twitter, with over twice as many replies as the next most replied-to candidate, Andrew Adonis of the Labour Party.

We studied all tweets sent to or from (or retweets of or by) UK European Election candidates in the month of May, and classified them as abusive or not using the classifier presented here. It must be noted, in particular, that the classifier only identifies reliably whether a reply is abusive or not. It is not sufficiently accurate for us to reliably judge the target politician or party of this abusive reply. What this means is that we can only reliably identify which EP candidates triggered abuse-containing discussion threads on Twitter, but that often this abuse is actually aimed at other politicians or parties.
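The difference between the candidate who triggered an abuse-containing thread and the person the abuse is aimed at can be made concrete with a minimal sketch. The data layout below (one record per reply, holding a thread identifier, the candidate whose post opened the thread, and the classifier's yes/no label) is an assumption made for illustration, not the format used in the study.

```python
from collections import defaultdict


def abusive_threads_per_candidate(replies):
    """Count, per candidate, the threads they started that contain abuse.

    `replies` is an iterable of dicts with hypothetical keys:
      thread_id  - identifier of the discussion thread
      started_by - candidate whose post opened the thread
      is_abusive - classifier label for this reply (True/False)
    """
    abusive_threads = defaultdict(set)
    for reply in replies:
        if reply["is_abusive"]:
            # A thread counts once, however many abusive replies it holds.
            abusive_threads[reply["started_by"]].add(reply["thread_id"])
    return {name: len(threads) for name, threads in abusive_threads.items()}


replies = [
    {"thread_id": 1, "started_by": "candidate_a", "is_abusive": True},
    {"thread_id": 1, "started_by": "candidate_a", "is_abusive": True},
    {"thread_id": 2, "started_by": "candidate_a", "is_abusive": False},
    {"thread_id": 3, "started_by": "candidate_b", "is_abusive": True},
]
print(abusive_threads_per_candidate(replies))
```

Note that nothing in this count says who the abusive replies were aimed at, which is exactly the limitation described above.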

In addition to attracting the most replies, the Brexit Party candidates also triggered an unusually high level of abuse-containing Twitter discussions. In particular, we found that posts by Farage triggered almost six times as many abuse-containing Twitter threads as the next most replied-to candidate, Gavin Esler of Change UK, during May 2019.

There is an important difference, however, in that many of the abuse-containing replies to posts by Farage and the Brexit Party were actually abusive towards other politicians (most notably the prime minister and the leader of the Labour party) and not Farage himself. In contrast, abusive replies to Gavin Esler were primarily aimed at the politician himself, triggered by his use of the phrase "village idiot" in connection with the Leave Campaign.

Candidates from other parties that triggered unusually high levels of abuse-containing discussions were those from the UK Independence Party, now considered far right, and Change UK, a newly formed but unstable remain party. Change UK was the most active on Twitter, with candidates sending more tweets than other parties. Gavin Esler was the most replied-to Change UK candidate, and also received an unusually high level of abuse. The abuse often referred to his use of the phrase "village idiot" in connection with the leave campaign, which resulted in anger and resentment.

In contrast, MEP candidates from the Conservative and Labour Parties were not hubs of polarised, abuse-containing discussions on Twitter.

What these findings, unsurprisingly, demonstrate is that politicians and parties who themselves use divisive and abusive language, for example, to brand political opponents as “village idiots”, “traitors”, or as “desperate to betray”, are thus triggering the toxic online responses and deep political antagonism that we have witnessed.

After the Brexit Party, the next most replied-to MEP candidates were from the Labour Party, followed by those from Change UK.

MEP candidates from both the Liberal Democrats and the Green Party were also active on Twitter, with the Green MEP candidates second only to Change UK ones for number of tweets sent, but they didn't get a lot of engagement in return. The Liberal Democrats in particular received a low number of replies. This may suggest that these parties became the default choices for a population of discouraged remainers, as both made gains in the election. Both parties attracted a particularly civil tone of reply.

Brexit Party candidates were also the ones that replied most to those who tweeted them, rather than authoring original tweets or retweeting other tweets.

Acknowledgements: Research carried out by Genevieve Gorrell, Mehmet Bakir, and Kalina Bontcheva. This work was partially supported by the European Union under grant agreements No. 654024 SoBigData and No. 825297 WeVerify.

Tuesday, 11 September 2018

Visualizations of Political Hate Speech on Twitter

Recently there's been some media interest in our work on abuse toward politicians. We performed an analysis of abusive replies on Twitter sent to MPs and candidates in the months leading up to the 2015 and 2017 UK elections, disaggregated by gender, political party, year, and geographical area, amongst other things. We've posted about this previously, and there's also a more technical publication here. In this post, we wanted to highlight our interactive visualizations of the data, which were created by Mark Greenwood. The thumbnails below give a flavour of them, but click through to access the interactive versions.

Abusive Replies

Sunburst diagrams showing the raw number of abusive replies sent to MPs before the 2015 and 2017 elections. Rather than showing all candidates, these only show the MPs who were elected (i.e. the successful candidates). These nicely show the proportion of abusive replies sent to each party/gender combination, but they give no sense of what proportion of each MP's replies were abusive. Interactive version here!

Increase in Abuse

An overlapping bar chart showing how the percentage of abuse received per party/gender by MPs has increased between 2015 and 2017. For each party/gender two bars are drawn. The height of the bar in the party colour represents the percentage of replies which were abusive in 2017. The height of the grey bar (drawn at the back) is the percentage of replies which were abusive in 2015 and the width shows the change in volume of abusive replies (i.e. the width is calculated by dividing the 2015 raw abusive reply count by that from 2017 to give a percentage which is then used to scale the width of the bar). So height shows change in proportion, width shows increase in volume. There is also a simple version of this graph which only shows the change in proportion (i.e. the widths of the two bars are the same). Original version here.
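The height and width calculations described above can be sketched as follows. This is a minimal illustration, assuming the width percentage scales the grey 2015 bar relative to the 2017 bar. The function name and the example counts are made up for the sketch and are not taken from the real data.

```python
def bar_geometry(replies_2015, abusive_2015, replies_2017, abusive_2017):
    """Return bar heights and the relative width of the 2015 bar.

    Heights are the percentage of replies that were abusive in each year.
    The width is the 2015 raw abusive count divided by the 2017 count,
    expressed as a percentage.
    """
    return {
        "height_2015": 100.0 * abusive_2015 / replies_2015,
        "height_2017": 100.0 * abusive_2017 / replies_2017,
        "width_2015_pct": 100.0 * abusive_2015 / abusive_2017,
    }


# Illustrative counts only: 20 abusive out of 1,000 replies in 2015,
# 160 abusive out of 4,000 replies in 2017.
example = bar_geometry(1000, 20, 4000, 160)
print(example)
```

With these made-up counts the proportion of abuse doubles from 2% to 4%, while the 2015 bar is drawn at one eighth of the width of the 2017 bar because the raw volume grew eightfold.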

Geographical Distribution of Abuse

A map showing the geographical distribution of abusive replies. The map of the UK is divided into the NUTS 1 regions, and each region is coloured based on the percentage of abusive replies sent to MPs who represent that region. Data from both 2015 and 2017 can be displayed to see how the distribution of abuse has changed. Interactive version here!

Sunday, 23 July 2017

The Tools Behind Our Twitter Abuse Analysis with BuzzFeed


Or...How to Quantify Abuse in Tweets in 5 Working Days


When BuzzFeed approached us with the idea to quantify Twitter abuse towards politicians during the election campaign, we only had five working days before the article had to be completed and go public.

The goal was to use text analytics and analyse tweets replying to UK politicians, in the run up to the 2017 general election, in order to answer questions such as:

  • How widespread is abuse received by politicians?
  • Who are the main politicians targeted by such abusive tweets?
  • Are there any party or gender differences?
  • Do abuse levels stay constant over time or not?

So here I explain first how we collect the data for such studies and then how it gets analysed at scale and fast, all with our GATE-based open-source tools and their GATE Cloud text analytics-as-a-service deployment.

For researchers wishing more in-depth details, please read and cite our paper:

D. Maynard, I. Roberts, M. A. Greenwood, D. Rout and K. Bontcheva. A Framework for Real-time Semantic Social Media Analysis. Web Semantics: Science, Services and Agents on the World Wide Web, 2017 (in press). https://doi.org/10.1016/j.websem.2017.05.002, pre-print

Tweet Collection 


We already had all the necessary tweets at hand since, within an hour of #GE2017 being announced, I had used the GATE Cloud tweet collection service to set up the continuous collection of tweets by MPs, prominent politicians, parties, and candidates, as well as retweets and replies thereof.

I also set up a second Twitter collector service, running in parallel, to collect election-related tweets based purely on hashtags and keywords (e.g. #GE2017, vote, election).

How We Analysed and Quantified Abuse 


Given the short five-day deadline, we were pleased to have at hand the large-scale, real-time text analytics tools in GATE, Mimir/Prospector, and GATE Cloud.

The starting point was the real-time text analysis pipeline from the Brexit research last year. It is capable of analysing up to 100 tweets per second (tps), although in practice the tweets were usually arriving at the much lower rate of 23 tps.
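To put those two rates in perspective, here is a quick back-of-the-envelope calculation. It assumes, purely for illustration, that each rate is sustained around the clock, which real tweet traffic does not do.

```python
SECONDS_PER_DAY = 24 * 60 * 60

CAPACITY_TPS = 100  # peak rate the pipeline can handle, as quoted above
OBSERVED_TPS = 23   # typical arrival rate, as quoted above


def tweets_per_day(tps):
    """Tweets handled in a day if the given rate were sustained all day."""
    return tps * SECONDS_PER_DAY


# Fraction of the pipeline's capacity used at the typical arrival rate.
utilisation = OBSERVED_TPS / CAPACITY_TPS

print(tweets_per_day(OBSERVED_TPS), tweets_per_day(CAPACITY_TPS), utilisation)
```

Under that assumption the typical rate corresponds to roughly two million tweets a day, using under a quarter of the pipeline's capacity.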

This time, however, we adapted it with a new abuse analysis component, as well as some more up-to-date knowledge about the politicians (including the new prime minister). 




The analysis backbone was again GATE's TwitIE system, which consists of a tokenizer, normalizer, part-of-speech tagger, and a named entity recognizer. TwitIE is also available as-a-service on GATE Cloud, for easy integration and use.

Next, we added information about politicians, e.g. their names, gender, party, constituencies, etc. In this way, we could produce aggregate statistics, such as abuse-containing tweets aimed at Labour or Conservative male/female politicians. 
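As a rough illustration of how this kind of metadata makes aggregate statistics possible, the sketch below counts abuse-containing replies by party and gender. The account handles, the metadata table, and the tweet fields are all hypothetical. The real pipeline attaches this information as annotations inside GATE rather than through a Python dictionary.

```python
from collections import Counter

# Hypothetical metadata table keyed by account handle.
POLITICIANS = {
    "@mp_one": {"party": "Labour", "gender": "female"},
    "@mp_two": {"party": "Conservative", "gender": "male"},
}


def abuse_by_party_and_gender(tweets, politicians=POLITICIANS):
    """Count abuse-containing replies per (party, gender) pair.

    Each tweet is a dict with hypothetical keys `reply_to` (account handle)
    and `is_abusive` (True/False).
    """
    counts = Counter()
    for tweet in tweets:
        info = politicians.get(tweet["reply_to"])
        # Skip replies to accounts we hold no metadata for.
        if info is not None and tweet["is_abusive"]:
            counts[(info["party"], info["gender"])] += 1
    return counts


tweets = [
    {"reply_to": "@mp_one", "is_abusive": True},
    {"reply_to": "@mp_one", "is_abusive": False},
    {"reply_to": "@mp_two", "is_abusive": True},
    {"reply_to": "@mp_two", "is_abusive": True},
    {"reply_to": "@unknown", "is_abusive": True},
]
print(abuse_by_party_and_gender(tweets))
```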

Next is a tweet geolocation component, which uses latitude/longitude, region, and user location metadata to geolocate tweets within the UK NUTS2 regions. This is not always possible, since many accounts and tweets lack such information, and this narrows down the sample significantly, should we choose to restrict by geolocation.
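One way to picture this step is as a fallback chain over the available metadata. The sketch below is an assumption-laden illustration and not the actual GATE component: the field names, the order in which sources are tried, the two resolver functions, and the placeholder region labels are all hypothetical.

```python
def geolocate(tweet, region_from_coords, region_from_text):
    """Try each metadata source in turn; return a region label or None."""
    coords = tweet.get("coordinates")
    if coords is not None:
        region = region_from_coords(*coords)
        if region:
            return region
    # Fall back to textual fields when coordinates are missing or unresolved.
    for field in ("place_region", "user_location"):
        text = tweet.get(field)
        if text:
            region = region_from_text(text)
            if region:
                return region
    return None  # many tweets carry no usable location information


def toy_region_from_coords(lat, lon):
    # Hypothetical bounding box standing in for a real region polygon lookup.
    if 53.0 <= lat <= 54.0 and -2.0 <= lon <= -1.0:
        return "region-A"
    return None


TOY_GAZETTEER = {"sheffield": "region-A", "london": "region-B"}


def toy_region_from_text(text):
    return TOY_GAZETTEER.get(text.strip().lower())


for tweet in (
    {"coordinates": (53.38, -1.47)},
    {"user_location": "London"},
    {"user_location": "the moon"},
):
    print(geolocate(tweet, toy_region_from_coords, toy_region_from_text))
```

The final example returns None, which is the case that shrinks the sample when results are restricted to geolocated tweets.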

We also detect key themes and topics discussed in the tweets (more than one topic/theme can be contained in each tweet). Here we reused the module from the Brexit analyser.

The most exciting part was working with BuzzFeed's journalists to curate a set of abuse nouns typically aimed at people (e.g. twat), racist words, and milder insults (e.g. coward). We decided to differentiate these from general obscene language and swearing, as these were not always targeting the politician. Nevertheless, they were included in the system, to produce a separate set of statistics. We also introduced basic sub-classification by kind (e.g. racial) and strength (e.g. mild, medium, strong), derived from an Ofcom research report on offensive language.
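A lexicon lookup of this kind can be sketched in a few lines. The version below is a simplified stand-in: it uses a plain regular expression instead of the TwitIE tokenizer, and the kind and strength labels attached to the two example terms are assumptions made for the sketch, not the values from the curated list or the Ofcom report.

```python
import re

# Tiny illustrative lexicon. The two terms appear in the text above; the
# kind and strength labels are assumed for this sketch.
ABUSE_LEXICON = {
    "twat": {"kind": "general", "strength": "strong"},
    "coward": {"kind": "general", "strength": "mild"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def find_abuse_terms(text, lexicon=ABUSE_LEXICON):
    """Return (term, kind, strength) for each lexicon term found in the text."""
    tokens = TOKEN_RE.findall(text.lower())
    return [
        (tok, lexicon[tok]["kind"], lexicon[tok]["strength"])
        for tok in tokens
        if tok in lexicon
    ]


print(find_abuse_terms("You absolute COWARD!"))
print(find_abuse_terms("Thanks for your hard work"))
```

Keeping the kind and strength alongside each term is what makes it possible to produce separate statistics per category later on.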


Overall, we kept the processing pipeline as simple and efficient as possible, so it can run at 100 tweets per second even on a pretty basic server.  

The analysis results were fed into GATE Mimir, which efficiently indexes the tweet text and all our linguistic annotations. Mimir has a powerful programming API for semantic search queries, which we use to drive the various interactive visualisations and to generate the necessary aggregate statistics behind them.

For instance, we used Mimir queries to generate statistics and visualisations based on time (e.g. most popular hashtags in abuse-containing tweets on 4 Jun), topic (e.g. the most talked about topics in such tweets), or target of the abusive tweet (e.g. the most frequently targeted politicians by party and gender). We could also navigate to the corresponding tweets behind these aggregate statistics, for a more in-depth analysis.

A rich sample of these statistics, associated visualisations, and abusive tweets is available in the BuzzFeed article.

Research carried out by:


Mark A. Greenwood, Ian Roberts, Dominic Rout, and myself, with ideas and other contributions from Diana Maynard and others from the GATE Team. 

Any mistakes are my own.