On GATE, Text and Social Media Analysis, and Detecting Misinformation Online

Monday, 5 January 2026

My ATRIUM Research Visit at the University of Sheffield: Research, Collaboration, and Experiences

by Daniel Kansaon

I am a PhD student in Computer Science in Brazil, with research at the intersection of data science and Natural Language Processing. My work focuses on the large-scale analysis of online communication on messaging platforms, particularly WhatsApp and Telegram. In my doctoral research, I explore the “underground” dynamics of messaging-app groups, investigating coordinated campaigns, harmful attacks, and political polarisation. Messaging apps have increasingly become spaces for political mobilisation, creating conditions in which abusive behaviours, coordinated practices, and narrative amplification can develop and persist. More recently, my research has concentrated on how fear-based and othering narratives emerge and spread within these platforms, especially in political contexts, and how such narratives relate to real-world events.

Through the Transnational Access scheme of the ATRIUM programme, I had the opportunity to spend three months (September–December) on a research visit at the University of Sheffield. This period allowed me to work in a highly stimulating academic environment, engage with researchers whose expertise closely aligns with my work, and advance key components of my doctoral research.

Research activities

During my research visit, I engaged in several activities that were central to advancing my doctoral work. I went to the university every day, where I worked alongside colleagues and had frequent opportunities for discussion. These daily routines were often accompanied by the classic English tea with milk, a memorable part of everyday life at the university.

I initially dedicated time to deepening my understanding of the concept of othering and how it manifests in political contexts. Understanding this concept is essential for studying fear-based narratives.

I then took full advantage of the research environment and available infrastructure to work on the creation of a unique dataset focused on othering language, a narrative strategy commonly used to divide groups and attack out-groups in political discourse. This dataset represents a key resource for analysing fear-related communication and for training and evaluating computational models capable of identifying such narratives at scale. Developing this resource during the visit allowed me to refine annotation strategies, improve data quality, and ensure methodological consistency.

Throughout this process, I applied analytical strategies and methodological approaches learned during interaction with members of the research group. Their feedback and expertise directly influenced both the design of the dataset and the analytical decisions made.

As a direct result of this research period, I developed the first dataset dedicated specifically to othering language in messaging platforms, along with several promising findings. These results are currently being prepared for submission to a top-tier (A1) academic conference.

Integration into university life: beyond the lab

One of the highlights of my visit was how easy it was to feel part of the university community. Collaboration in Sheffield clearly goes beyond offices and meetings, and daily interactions quickly turned into meaningful connections.

As a Brazilian, I could not stay in Sheffield without playing football. I joined the university football team and took part in a championship with teams from other universities, with matches every Tuesday. We had a great run, winning three out of four games, and I was especially happy to contribute by scoring goals in every match, alongside the best midfield partner, Owen Cook.

These moments on the pitch were not just fun. They helped build connections and strengthen collaboration naturally, while also making me feel at home.

Our ComFootball team is in third place, and now I'm cheering for them here from Brazil.

Experiencing Sheffield: the best place for sports

I am also a big sports fan, and I could not stay in Sheffield without experiencing live football in the city. My first visit was to Sheffield United’s stadium for the match between Sheffield United and Southampton. United scored the opening goal, but unfortunately, conceded two goals in the second half and ended up losing the match. Despite the result, it was an amazing experience watching the world’s first professional football team play in its home city.

I also had the chance to visit the Utilita Arena Sheffield twice to watch the Sheffield Steelers, and I can proudly say I had a 100% success rate. The team won both matches I attended. The atmosphere in the arena was amazing. I really enjoy ice hockey, and this was my first time watching a live game in person, which made the experience even more special.

Experiencing Sheffield: the Christmas Market

My final months in Sheffield were especially memorable, as I had the chance to experience the city during the Christmas season. The festive atmosphere was everywhere, from carols and theatre performances to the Christmas Market in the city centre. The food, the people, the warm apple crumble, and the mulled wine all came together to create a truly wonderful experience. It was a perfect way to end my time in Sheffield and made the city feel even more welcoming and special.

Final Remarks and Acknowledgements

I am very grateful to the ATRIUM programme for making this research visit possible, and to everyone at the University of Sheffield who welcomed me and supported my work during this period. Overall, this experience had a very positive impact and was essential for the advancement of my research. The collaboration, academic environment, and daily interactions played a crucial role in strengthening the theoretical, methodological, and practical aspects of my PhD work.

I would especially like to thank Diana Maynard for accepting me and for her guidance, as well as for the valuable feedback and advice she provided throughout the project. I am also very grateful to Jo Wright for managing all the logistics and for making the stay enjoyable.

I would like to thank my colleagues for the many interesting and valuable conversations, particularly Owen Cook and Fatima Haouari. I am also thankful to my nearby office colleagues, Ibrahim and Eliseu, whose daily presence and informal exchanges made the experience even more enjoyable. I would also like to thank João Leite for his helpful advice and support when I first arrived in the city.

Tuesday, 2 December 2025

Exploring the Digital Identity of Agrifood Products: My ATRIUM TNA Placement at the GATE Group

By Tenia Panagiotou

As a linguist and postdoctoral researcher at the Consumer and Sensory Lab of the Department of Food Science and Nutrition, University of the Aegean, my work sits at the intersection of language, food, and digital communication. Over the past year, my research has focused on the digital identity of agrifood products – how olive oil, wine, honey, cheese, herbs, and other regional products are portrayed online, how consumers talk about them, and how professionals frame them through branding, marketing narratives, and cultural references. Understanding this “digital identity” requires robust, scalable data extraction and analysis pipelines – something that text mining and AI tools can powerfully enhance.

Through the Transnational Access scheme of the ATRIUM programme, I had the opportunity to return to the School of Computer Science at the University of Sheffield for a two-week placement (17–28 November 2025). My goal this time was more targeted than during my previous visit. Whereas my earlier placement focused on exploring tools, this one focused on building and evaluating an operational pipeline for analysing agrifood discourse across social media and web sources.

School of Computer Science, University of Sheffield

Developing and Validating a Multi-Layer Analytical Pipeline

During this visit, I worked closely with the GATE/CLARIN-UK team (special thanks go to Xingyi Song and Ian Roberts) to refine the multi-step data acquisition and analysis pipeline I presented at the beginning of my stay. This placement concentrated on building and evaluating an operational pipeline for analysing agrifood discourse across social media and web sources. This work forms a foundational part of our broader effort to understand how food products acquire and project their digital identity in consumer-driven environments.

Members of the GATE team

As the first analytical layer, I assessed food relatedness, distinguishing posts that genuinely concern food products from those that only mention them peripherally. This was followed by sentiment analysis (positive, negative, neutral) and a more granular classification of emotions using the food-elicited emotion lexicon developed by our team. This dual approach allowed us to capture both surface-level sentiment and more nuanced affective responses associated with agrifood products.
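To give a rough idea of how a lexicon-based emotion layer of this kind can work, here is a minimal Python sketch. It is an illustration only, not the pipeline built during the placement: the lexicon entries, the emotion labels, and the function names `tag_emotions` and `dominant_emotion` are all hypothetical, and the real food-elicited emotion lexicon is not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> emotion label), for illustration only.
# The actual food-elicited emotion lexicon is not reproduced here.
TOY_EMOTION_LEXICON = {
    "delicious": "joy",
    "love": "joy",
    "disgusting": "disgust",
    "bitter": "disgust",
    "nostalgic": "nostalgia",
    "grandmother": "nostalgia",
}


def tag_emotions(post, lexicon=TOY_EMOTION_LEXICON):
    """Count how often each lexicon emotion is triggered by words in a post."""
    tokens = re.findall(r"\w+", post.lower())
    return Counter(lexicon[token] for token in tokens if token in lexicon)


def dominant_emotion(post, lexicon=TOY_EMOTION_LEXICON):
    """Return the most frequent emotion in a post, or None if no word matches."""
    counts = tag_emotions(post, lexicon)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(dominant_emotion("I love this delicious olive oil"))
```

A simple word lookup like this ignores negation and context, which is one reason a sentiment layer and an emotion layer are useful side by side.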

This visit has once again demonstrated the value of interdisciplinary cooperation in addressing complex, data-intensive challenges in food communication and consumer behaviour. I look forward to extending our partnership with the GATE group as we continue building tools and methodologies for understanding online food discourse – an emerging area at the intersection of linguistics, AI, and food science.

Screenshot of pipeline output regarding emotion

A significant part of the workflow involved classifying posts against the 17 United Nations Sustainable Development Goals, allowing us to investigate how sustainability narratives intersect with agricultural food product promotion and consumer expression. In parallel, I applied structured evaluations of health claims (healthy / unhealthy / neither), and identified mentions of nutritional content using recognised nutritional-claim terminology from the literature. Posts were also categorised by diet style, based on the most common diet-related expressions appearing in Greek online food discourse.

The pipeline further incorporated several agrifood-specific layers. These included the identification of sponsored versus non-sponsored posts; extraction of sensory attributes using the ISO sensory-analysis vocabulary; and topic classification using a controlled vocabulary derived from the LanguaL™ food-description thesaurus, an established international standard for structured food categorisation. Additional layers captured time expressions, Protected Designation of Origin/Protected Geographical Indication references, the 13 official Greek prefectures for locality insights, and standard olive oil types, given the prominence of olive-oil content in our dataset.

Together, these components form a comprehensive analytical framework that supports direct comparison between human classification and AI-driven classification, the latter in both open-response and closed-set form. This comparison will allow us to quantify alignment, divergence, and the specific areas where Large Language Model (LLM)-based classification performs strongly or reveals fragility when applied to Greek agrifood discourse, and so to choose the best route in light of the pros and cons of each method. The placement offered the environment and technical guidance required to validate the pipeline, evaluate model behaviour, and refine the overall architecture for future tool development and collaborative research.
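One common way to quantify alignment between a human annotator and a model on closed-set labels is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an assumption about how such a comparison could be computed, not a description of the evaluation actually used; the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    human = ["pos", "neg", "neu", "pos"]  # hypothetical human labels
    model = ["pos", "neg", "pos", "pos"]  # hypothetical model labels
    print(round(cohen_kappa(human, model), 3))
```

Kappa only applies directly to the closed-set case; open-response output would first need to be mapped onto the same label set before it can be compared in this way.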

Pic.5: Screenshot of pipeline output regarding sensory attributes

Technical Discussions and Next Steps

Beyond the core analysis, the visit provided valuable opportunities for in-depth discussions with GATE researchers about potential tool development and methodological extensions. We reviewed infrastructure constraints, data-format interoperability issues, and strategies for orchestrating multi-step processing within the GATE ecosystem. These conversations will directly shape the next phase of collaboration, guiding both immediate refinements and the longer-term agenda for joint work.

Throughout my stay, the GATE team was extremely supportive, offering expert advice on evaluation frameworks, debugging strategies, and LLM behaviour in multi-label tasks. The environment was collaborative, constructive, and intellectually stimulating. These two weeks enabled me not only to strengthen the methodological aspects of the project but also to strategically plan our next steps, including joint publications, shared datasets, and the creation of specialised agrifood-oriented NLP tools.

I am especially grateful to the ATRIUM programme for providing this opportunity; to Diana Maynard for accepting me for a second visit and for her guidance and support; and to Jo Wright for managing all logistics and securing the cosiest accommodation for my stay.

Christmassy Sheffield

Monday, 3 February 2025

Exploring NLP Applications in Food Research: my ATRIUM TNA visit to the GATE group

By Tenia Panagiotou

As a postdoctoral researcher at the Consumer and Sensory Lab of the Department of Food Science and Nutrition of the University of the Aegean, Greece, I study food consumption-related phenomena and consumer expression on social media. In this context, I find that Natural Language Processing (NLP) tools can significantly enhance data collection and analysis. With a background in linguistics, I explore how language can reveal insights into consumer behaviour, culture-specific and cross-cultural food-related trends, attitudes toward products and brands, and consumer expectations. To deepen my understanding of NLP tools and computational applications in social media research, I sought further training in this field.

Pic.1: Collecting posts on social media to investigate food related phenomena: sustainable meat alternatives (left), local versus imported cheeses (right).

The ATRIUM project through its TNA scheme has provided me with the invaluable opportunity to visit the School of Computer Science at the University of Sheffield and explore applications of the GATE cloud tools in my research. Although my visit was relatively short (January 20–31, 2025), it was exceptionally enlightening. I had the privilege of working closely with members of the GATE group, experts in NLP, who welcomed me into their working meetings, discussed my research challenges, and provided insightful solutions and guidance.

Pic. 2: My office away from home at the GATE headquarters (School of Computer Science, University of Sheffield).

During my visit, I explored various GATE tools relevant to my research. Some of those tools were still under development, and the researchers were kind enough to grant me access, assist me in their application, and discuss possible extensions. One particularly useful tool was the Topic Extractor for social media hashtags, which can be used to analyze food consumption-related concepts and generate hierarchical concept graphs. The TwitIE Named Entity Recognizer proved particularly valuable in accurately identifying individual words within multiword hashtags—one of the key challenges I had been aiming to resolve.
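For readers unfamiliar with the problem, splitting a multiword hashtag can be sketched as a dictionary-based segmentation. The Python example below is a generic illustration under my own assumptions and does not show how TwitIE works internally; the function name `segment_hashtag` and the small vocabulary are hypothetical.

```python
def segment_hashtag(hashtag, vocabulary):
    """Split a hashtag into known words, preferring the fewest words.

    Returns a list of lowercase words, or None if no full segmentation exists.
    """
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] holds the shortest known segmentation of text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


if __name__ == "__main__":
    # Hypothetical toy vocabulary; a real one would be far larger.
    vocab = {"extra", "virgin", "olive", "oil", "live"}
    print(segment_hashtag("#ExtraVirginOliveOil", vocab))
```

A dictionary approach like this fails on words missing from the vocabulary, which is part of what makes hashtag segmentation a genuine challenge for food-related posts full of product and place names.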

Pic. 3: Screenshot of the TwitIE Named Entity Recognizer that can identify words in multiword hashtags (last row).

Additionally, the Named Entity Recognizer offered significant insights by extracting predefined entities such as geopolitical locations, organizations, nationalities, and time references, enriching the analysis of consumer social media posts. I also explored sentiment analysis and opinion mining tools, while investigating how the user classification tool could be adapted for use across different platforms. Another intriguing discovery was the Multilingual Persuasion Technique Classifier, which presents exciting possibilities for analyzing professional posts on food products on social media.

Pic. 4: Identifying languages in posts, translating into English, and running Named Entity Recognition to be used for semantic network analysis.

Beyond these tools, I also received valuable guidance on optimizing ChatGPT and Large Language Models for consistency in responses and on clustering social media post hashtags into semantically meaningful groups for further analysis. Both of these challenges were high on my research agenda, and I am grateful for the specialized insights I received. Apart from the technical expertise I gained, engaging in discussions on shared research interests, such as ontology and semantic network development, was one of the most rewarding aspects of my visit. The openness of the GATE team to exploring extensions of their existing tools and fostering future collaborations made this experience particularly enriching.

Pic. 5: St George's Church, a former parish church (now part of the University of Sheffield as a lecture theatre and student housing), was the view from my office.

I am deeply grateful to the members of the GATE group for their time, generosity, and willingness to support me both technically and personally. A special thanks goes to Dr. Maynard, who oversaw my visit, for guiding my "investigations" and for our insightful conversations. I would also like to thank Mrs. Wright, research project assistant to the GATE group, for handling the logistical details of my trip and stay in Sheffield, as well as for helping out with the required documentation.

This visit was an invaluable experience that has significantly shaped my research perspective. I look forward to applying what I have learned and fostering further collaborations with the GATE team. I strongly believe that this will not be my last visit to the GATE infrastructure.

Monday, 6 January 2025

GATE team hosts its first ATRIUM TNA research visit: Using NLP to understand trends in political and social debate

In December 2024, we hosted research visitor Tasos Galanopoulos as part of the ATRIUM project (Advancing fronTier Research In the arts and hUManities) Transnational Access scheme. ATRIUM's aim is to bridge four leading research infrastructures in arts and humanities (DARIAH), archaeology (ARIADNE), language technology (CLARIN), and open scholarly communication in the social sciences and humanities (OPERAS). The Transnational Access (TNA) scheme offers fully funded placements for researchers across Europe. This initiative is designed to support Arts and Humanities researchers by providing access to expert knowledge, mentorship, and tools from leading Data Management organisations. Successful applicants have the opportunity to visit one of 14 different host organisations across Europe in order to conduct their research, benefiting from direct contact, knowledge sharing and network building.

Tasos describes his visit below...

How can NLP tools and large language models be used to understand trends in political and social debate around major issues of the day?

What is the relationship between 'distant reading', the layered understanding that these tools offer for large volumes of data, and 'close reading', the detailed understanding of particular aspects of these topical issues?

What role can these modern tools play in the humanities and in everyday journalistic practice?

Questions such as these, on the occasion of a project on "Analysis of textual data from newspapers on the agreement of Greece's accession to the European Economic Community EEC (1961)", in the context of my postgraduate studies in Digital Humanities at the Open University of Greece, brought me to the School of Computer Science at the University of Sheffield at the end of November (23/11/2024 - 7/12/2024), to collaborate with members of the GATE team.

Although the stay was short, my impressions were excellent: the patience and goodwill of the whole team, with Dr Maynard at the forefront, helped me to "navigate" the tools offered by GATE Cloud and the European Language Grid, to understand the required processes and the wider field a little better, and to learn a little more about its "alphabet" and requirements. At the same time, through the regular meetings of the team I was able to get a "glimpse" of the modern, specialised, and valuable research being carried out at the university.

In relation to the actual subject of the research, the findings from processing the texts with tools such as Named Entity Recognition, N-gram detection (visualised with word clouds), Topic Classification, Sentiment Analysis, multidimensional analysis with LIWC-22, and Persuasion Technique detection were very interesting. They offered answers and insights into our questions, which concerned the attempt to develop a methodology to identify, document and frame named entities when investigating public discourse: the Press of different political orientations, and political rhetoric around critical events in political life, with reference to the economic and social environment inside and outside the country. In addition, identifying and categorising arguments for and against, and bias for or against, in the Press of that time enabled us, at a subsequent level, to explore ways to link entities to key concepts in argumentation.

Overall, I came away with the best impressions from this constructive visit, which on a personal level gave me inspiration and opened new horizons, and also created new contacts with remarkable people.

Monday, 9 December 2024

Monitoring human rights violations against journalists

In February 2024, the Centre for Freedom of the Media (CFOM) hosted a research secondment focusing on how to develop monitoring of human rights violations directed at journalists. The secondment saw researchers from CFOM (Dr Diana Maynard from the GATE team, and Prof. Jackie Harrison and Dr Gemma Horton from the School of Journalism, Media and Communication) come together with UNESCO and Free Press Unlimited to discuss the threats that journalists face and how these can be monitored in line with UN SDG 16.10.1.

Following the research secondment, Dr Maynard has been awarded £35,000 to work on a collaborative project with UNESCO and Free Press Unlimited, entitled “Influencing policy work on human rights violations against journalists”.

The project focuses on monitoring and analysing non-lethal attacks on journalists in alignment with Sustainable Development Goal 16.10.1, and aims to better understand 1) the scope and context of these violations, 2) how these can be systematically and reliably monitored, resulting in the creation of 3) a database; all of which will inform future policy on monitoring safety of journalists.

The Sheffield and FPU teams are developing a risk barometer that employs machine learning to identify patterns and indicators, helping to understand the contextual factors that predict lethal and non-lethal violence against journalists. It emphasises the crucial role of Press Freedom and Advocacy organisations in reducing these risks, highlighting the need for a proactive, evidence-based approach to forecasting threats. This strategy aims to enable earlier interventions and stronger protections for journalists globally by identifying contextual risk factors that elevate threats, such as online harassment, legal intimidation, democratic backsliding, civil conflict, and physical violence. In the first stage, it aims to accurately detect patterns and hotspots using global event data from various resources, such as GDELT, which tracks real-time global events every 15 minutes, and will utilize additional resources to gain insights into emerging risks as they arise. Simultaneously, we are connecting contextual data to data on attacks towards journalists, to gain a deeper understanding of the causes of not only lethal but also non-lethal violence towards journalists.

Preliminary findings have identified some basic patterns: online harassment often serves as a precursor to physical intimidation. Additionally, regions such as Gaza, Ukraine, and Mexico have been identified as high-risk areas due to factors including conflict, organised crime, and authoritarian policies. Furthermore, we find that political polarisation seems strongly correlated with the level of legal threats that journalists face. These important relationships will be further investigated by the collaborative team in the upcoming months.