Thursday, 16 April 2026

 

Can AI Help Journalists Think? Exploring RAG Systems for Newsroom Applications

Tasos Galanopoulos 

Large Language Models (LLMs) are increasingly present in newsrooms — but often in ways that are difficult to control, audit, or verify. Journalists may use general-purpose tools to search archives, summarise documents, or draft content, without clear visibility into what sources inform the output or how reliable it is. This raises real concerns about editorial integrity and the traceability of AI-assisted work.

As part of my Master’s thesis in Digital Humanities at the Hellenic Open University, which focuses on developing a digital toolkit for a newsroom environment that integrates NLP techniques and database systems in Python, I had the opportunity to visit the University of Sheffield for a second time (7–17 April 2026) through the ATRIUM Transnational Access programme. This time the visit focused specifically on Retrieval-Augmented Generation (RAG), an architecture that allows LLMs to work not from general training data alone but from a defined, controlled corpus of documents. This makes outputs more grounded, traceable, and domain-specific.

The primary objectives of the visit were to understand the core principles and design choices behind RAG architecture, to implement and test a configurable RAG system on journalistically relevant datasets, to evaluate the impact of different parameters on system performance, and to lay the groundwork for integrating a RAG-based assistant into the newsroom toolkit.


The visit was hosted once again by the GATE (General Architecture for Text Engineering) team at the University of Sheffield's School of Computer Science — one of Europe's leading groups in Natural Language Processing and text analysis tools. 

Once again I had the great opportunity to work under the supervision of Dr Diana Maynard, with valuable support from Olesya Razuvayevskaya and Ian Roberts. Their guidance was invaluable in helping me move from theoretical familiarity with RAG to a practical, experimentally rigorous implementation.



Building the System



A RAG application was developed using Streamlit as the interface, ChromaDB for vector storage and retrieval, and open-access LLMs — primarily Mistral and DeepSeek — selected for their availability under free-tier constraints. The system allows users to upload document collections, configure retrieval and generation parameters interactively, generate evaluation question sets, and run quantitative assessments of system outputs.
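Conceptually, the system follows the standard retrieve-then-generate loop: documents are split into overlapping chunks, the chunks most similar to the user's question are retrieved, and those chunks are assembled into a grounded prompt for the LLM. The sketch below illustrates that loop in plain Python; it uses a toy bag-of-words similarity as a stand-in for the real embedding model and ChromaDB index, and the function names are mine rather than the repository's:

```python
from collections import Counter
import math

def chunk(text, size=40, overlap=10):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=3):
    """Return the `top_k` chunks most similar to the query (the Top-K parameter)."""
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:top_k]

def build_prompt(query, context_chunks):
    """Assemble the grounded prompt sent to the LLM."""
    context = "\n---\n".join(context_chunks)
    return (f"Answer using ONLY the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

In the actual application, `embed` and the similarity search are handled by ChromaDB, and the assembled prompt is sent to Mistral or DeepSeek.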

The full codebase is openly available at: https://github.com/tazgal/RAG_assistant

A key methodological choice was the deliberate heterogeneity of the test dataset, designed to simulate the variety of documents a journalist might encounter. Four collections were assembled:

| Dataset | Files | Avg. Words | Type | Language |
| --- | --- | --- | --- | --- |
| PM_MarchApril2026 | 11 | 2,888 | Political interviews (Prime Minister) | Greek |
| PMI_pdfs | 7 | 1,843 | Economic reports (Purchasing Managers' Index) | English |
| Apopsi_test | 6 | 635 | Newspaper editorials (left-leaning) | Greek |
| BankOfGreece_Introductions | 5 | 1,583 | Central bank report introductions (2021–2025) | Greek |

These datasets varied in length, linguistic style, structural complexity, technical vocabulary, and political orientation — all dimensions that matter in journalistic practice.

Four distinct response styles were implemented via prompt engineering, reflecting different journalistic use cases:

  • Strict RAG (Factual): constrained, source-grounded responses only

  • Journalistic Style (Generative): fluid, newsroom-style text production

  • Analysis & Key Points: structured bullet-point summaries

  • Archivist (Citations & Quotes): extractive, documentation-focused responses with explicit sourcing
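In practice, each response style amounts to a different system prompt prepended to the retrieved context. The templates below are hypothetical paraphrases of the four styles, not the exact wording used in the system:

```python
# Hypothetical prompt templates approximating the four response styles;
# the exact wording used in the actual system may differ.
STYLES = {
    "strict_rag": (
        "Answer strictly from the provided context. If the answer is not in "
        "the context, say you cannot answer. Do not add outside knowledge."
    ),
    "journalistic": (
        "Write a fluent, newsroom-style passage answering the question, "
        "grounded in the provided context."
    ),
    "key_points": (
        "Summarise the answer as concise bullet points drawn from the context."
    ),
    "archivist": (
        "Answer with direct quotations from the context, citing the source "
        "document for each quote."
    ),
}

def system_prompt(style: str) -> str:
    """Return the system prompt for a given response style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    return STYLES[style]
```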

A total of 20 experimental RAG configurations were tested per dataset, varying chunk size, chunk overlap, retrieval depth (Top-K), temperature, and response style.
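A grid of configurations like this can be generated mechanically from the parameter lists. The values below are illustrative placeholders, not the actual values tested during the visit (those are recorded in the project repository):

```python
from itertools import product

# Illustrative parameter grid; the real experiments used 20 configurations
# per dataset, additionally varying the response style.
CHUNK_SIZES = [256, 512]      # tokens per chunk
OVERLAPS = [0, 64]            # tokens shared between consecutive chunks
TOP_KS = [3, 5]               # retrieval depth (Top-K)
TEMPERATURES = [0.0, 0.7]     # generation randomness

configs = [
    {"chunk_size": cs, "overlap": ov, "top_k": k, "temperature": t}
    for cs, ov, k, t in product(CHUNK_SIZES, OVERLAPS, TOP_KS, TEMPERATURES)
]
```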

Results

Performance was measured using four embedding-based metrics: Faithfulness (grounding in retrieved context), Answer Relevance (alignment with the query), Context Precision (quality of retrieved chunks), and Ground Truth Similarity (closeness to reference answers).
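All four metrics can be expressed as cosine similarities over sentence embeddings. The sketch below shows one plausible formulation, assuming an `embed` function supplied by a sentence-embedding model; established RAG evaluation frameworks compute these quantities with more refinement, so treat this as an intuition aid rather than the exact formulas used:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def evaluate(embed, question, answer, retrieved_chunks, reference):
    """Embedding-based RAG metrics; `embed` maps text to a vector."""
    q, a, r = embed(question), embed(answer), embed(reference)
    chunk_vecs = [embed(c) for c in retrieved_chunks]
    return {
        # grounding of the answer in the retrieved context
        "faithfulness": max(cosine(a, c) for c in chunk_vecs),
        # alignment of the answer with the query
        "answer_relevance": cosine(a, q),
        # how well the retrieved chunks match the query
        "context_precision": sum(cosine(q, c) for c in chunk_vecs) / len(chunk_vecs),
        # closeness to the reference answer
        "gt_similarity": cosine(a, r),
    }
```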


Summary of Results by Dataset


The results are presented below: 


| Dataset | Faithfulness | Answer Relevance | Context Precision | GT Similarity |
| --- | --- | --- | --- | --- |
| PMI (Economic Indicators) | 0.10–0.37 | 0.58–0.71 | 0.23–0.28 | 0.47–0.56 |
| PM (Political Interviews) | 0.56–0.72 | 0.65–0.79 | 0.59–0.74 | 0.62–0.73 |
| Apopsi (Editorials) | 0.81–0.93 | 0.63–0.77 | 0.64–0.80 | 0.50–0.70 |
| ΤτΕ (Technical Reports) | 0.88–0.94 | 0.64–0.70 | 0.60–0.77 | 0.58–0.66 |

The full results dataset is available here.


Interesting Findings

Among the findings, one I did not expect was that dataset structure matters more than parameter tuning. The most important factor shaping RAG performance was not which settings were chosen, but the nature of the underlying documents.

  • The PMI economic dataset proved the most challenging. Despite reasonable Answer Relevance scores, Faithfulness was consistently very low (0.10–0.37), indicating that the system was generating plausible-sounding responses without genuine grounding in the retrieved content. This points to a fundamental limitation of standard semantic chunking when applied to structured numerical data — tables, indicators, and statistical formats do not chunk well.

  • The PM political interview dataset showed the most balanced performance across all metrics, making it the best benchmark for evaluating configuration trade-offs. Narrative, conversational text appears well-suited to standard RAG pipelines.

  • The Apopsi editorial dataset achieved very high Faithfulness, likely aided by thematic redundancy — repeated arguments and overlapping claims help the model stay grounded even when individual retrieval chunks are imperfect.

  • The ΤτΕ technical report dataset achieved near-ceiling Faithfulness, but Answer Relevance plateaued, suggesting that while responses are well-grounded, the density of the material limits the model's ability to adapt to query nuances.

The experiments confirmed that no single configuration performs well across all domains. Chunking parameters dominate in structured contexts (PMI, ΤτΕ), while generation parameters (temperature and response style) play a larger role in narrative texts (PM, Apopsi). The practical implication is direct: a one-size-fits-all RAG assistant for journalism is not viable. These findings highlight the need for adaptive RAG systems that adjust retrieval and generation parameters based on dataset characteristics and task requirements, decisions that today remain essentially human tasks.

This visit produced both a working system and a clearer research agenda. The experiments confirm that RAG architecture is a promising approach for controlled, transparent AI-assisted journalism — but also that it requires domain-aware design choices that are not yet standard practice.

Next steps include:

  • Comparing all datasets systematically with a wider range of LLMs

  • Testing more "extreme" parameter combinations to identify performance limits

  • Integrating the RAG layer with the SQLite database component of the broader toolkit, particularly to introduce a temporal dimension (event-indexed retrieval)

  • Exploring combinations of RAG with fine-tuned models, to assess their combined effect
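As a rough illustration of the temporal idea mentioned above, retrieval could first be restricted to a window around an event date drawn from the SQLite layer, with semantic ranking applied only to the surviving chunks. This is a hypothetical sketch of a design still to be built, and the field names are mine:

```python
from datetime import date

def filter_by_window(chunks, event_day, days_before=7, days_after=7):
    """Keep only chunks whose publication date falls in the event window.

    `chunks` are dicts carrying a `date` field (hypothetically supplied by
    the SQLite metadata layer); semantic ranking would run on the result.
    """
    return [
        c for c in chunks
        if -days_before <= (c["date"] - event_day).days <= days_after
    ]
```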

Ultimately, the goal is to integrate into the newsroom toolkit a pre-configured RAG assistant with sensible defaults for different journalistic tasks — offering journalists not a black box, but a transparent, adjustable tool that shows its reasoning.


A valuable visit


The ATRIUM Transnational Access programme made this visit possible in a direct and practical sense — but its value went well beyond logistics. Access to the GATE team's expertise accelerated the development of the system significantly, particularly in understanding evaluation frameworks for RAG and the subtleties of retrieval design. Conversations with Dr Maynard and colleagues helped sharpen the experimental methodology and pushed the project toward more rigorous quantitative assessment than would have been possible working alone.

For a researcher at the Master's level, working within an internationally recognised NLP group is a formative experience. It clarified both what is technically feasible and what genuinely interesting research questions remain open in this area. The visit was, in the most direct sense, a contribution to my development as a researcher — and one that will shape the final form of the thesis significantly.


Monday, 5 January 2026

 My ATRIUM Research Visit at the University of Sheffield: Research, Collaboration, and Experiences

by Daniel Kansaon

I am a PhD student in Computer Science in Brazil, with research at the intersection of data science and Natural Language Processing. My work focuses on the large-scale analysis of online communication on messaging platforms, particularly WhatsApp and Telegram. In my doctoral research, I explore the “underground” dynamics of messaging-app groups, investigating coordinated campaigns, harmful attacks, and political polarisation. Messaging apps have increasingly become spaces for political mobilisation, creating conditions in which abusive behaviours, coordinated practices, and narrative amplification can develop and persist. More recently, my research has concentrated on how fear-based and othering narratives emerge and spread within these platforms, especially in political contexts, and how such narratives relate to real-world events. 

Through the Transnational Access scheme of the ATRIUM programme, I had the opportunity to spend three months (September–December) on a research visit at the University of Sheffield. This period allowed me to work in a highly stimulating academic environment, engage with researchers whose expertise closely aligns with my work, and advance key components of my doctoral research.

Research activities

During my research visit, I engaged in several activities that were central to advancing my doctoral work. Each day I went to the university, where I worked closely with colleagues and had frequent opportunities for discussion. These daily routines were often accompanied by the classic English tea with milk, a memorable part of everyday life at the university.

I initially dedicated time to deepening my understanding of the concept of othering and how it manifests in political contexts. Understanding this concept is essential for studying fear-based narratives.

I then took full advantage of the research environment and available infrastructure to work on the creation of a unique dataset focused on othering language, a narrative strategy commonly used to divide groups and attack out-groups in political discourse. This dataset represents a key resource for analysing fear-related communication and for training and evaluating computational models capable of identifying such narratives at scale. Developing this resource during the visit allowed me to refine annotation strategies, improve data quality, and ensure methodological consistency.

Throughout this process, I applied analytical strategies and methodological approaches learned during interaction with members of the research group. Their feedback and expertise directly influenced both the design of the dataset and the analytical decisions made.

As a direct result of this research period, I developed the first dataset dedicated specifically to othering language in messaging platforms, along with several promising findings. These results are currently being prepared for submission to a top-tier (A1) academic conference.

Integration into university life: beyond the lab

One of the highlights of my visit was how easy it was to feel part of the university community. Collaboration in Sheffield clearly goes beyond offices and meetings, and daily interactions quickly turned into meaningful connections.

As a Brazilian, I could not stay in Sheffield without playing football. I joined the university football team and took part in a championship with teams from other universities, with matches every Tuesday. We had a great run, winning three out of four games, and I was especially happy to contribute by scoring goals in every match, alongside the best midfield partner, Owen Cook.

These moments on the pitch were not just fun. They helped build connections and strengthen collaboration naturally, while also making me feel at home.

Our ComFootball team is in third place, and now I'm cheering for them here from Brazil.

Experiencing Sheffield: the best place for sports

I am also a big sports fan, and I could not stay in Sheffield without experiencing live football in the city. My first visit was to Sheffield United’s stadium for the match between Sheffield United and Southampton. United scored the opening goal, but unfortunately, conceded two goals in the second half and ended up losing the match. Despite the result, it was an amazing experience watching the world’s first professional football team play in its home city.

I also had the chance to visit the Utilita Arena Sheffield twice to watch the Sheffield Steelers, and I can proudly say I had a 100% success rate. The team won both matches I attended. The atmosphere in the arena was amazing. I really enjoy ice hockey, and this was my first time watching a live game in person, which made the experience even more special.


 


Experiencing Sheffield: the Christmas Market

My final months in Sheffield were especially memorable, as I had the chance to experience the city during the Christmas season. The festive atmosphere was everywhere, from carols and theatre performances to the Christmas Market in the city centre. The food, the people, the warm apple crumble, and the mulled wine all came together to create a truly wonderful experience. It was a perfect way to end my time in Sheffield and made the city feel even more welcoming and special.



Final Remarks and Acknowledgements

I am very grateful to the ATRIUM programme for making this research visit possible, and to everyone at the University of Sheffield who welcomed me and supported my work during this period.  Overall, this experience had a very positive impact and was essential for the advancement of my research. The collaboration, academic environment, and daily interactions played a crucial role in strengthening the theoretical, methodological, and practical aspects of my PhD work.

I would especially like to thank Diana Maynard for accepting me and for her guidance, as well as for the valuable feedback and advice she provided throughout the project. I am also very grateful to Jo Wright for managing all the logistics and for making the stay enjoyable.

I would like to thank my colleagues for the many interesting and valuable conversations, particularly Owen Cook and Fatima Haouari. I am also thankful to my nearby office colleagues, Ibrahim and Eliseu, whose daily presence and informal exchanges made the experience even more enjoyable. I would also like to thank João Leite for his helpful advice and support when I first arrived in the city.