On GATE, Text and Social Media Analysis, and Detecting Misinformation Online: food

Monday, 3 February 2025

Exploring NLP Applications in Food Research: my ATRIUM TNA visit to the GATE group

By Tenia Panagiotou

As a postdoctoral researcher at the Consumer and Sensory Lab of the Department of Food Science and Nutrition of the University of the Aegean, Greece, I study food consumption-related phenomena and consumer expression on social media. In this context, I find that Natural Language Processing (NLP) tools can significantly enhance data collection and analysis. With a background in linguistics, I explore how language can reveal insights into consumer behaviour, culture-specific and cross-cultural food-related trends, attitudes toward products and brands, and consumer expectations. To deepen my understanding of NLP tools and computational applications in social media research, I sought further training in this field.

Pic. 1: Collecting posts on social media to investigate food-related phenomena: sustainable meat alternatives (left), local versus imported cheeses (right).

The ATRIUM project, through its TNA scheme, gave me the invaluable opportunity to visit the School of Computer Science at the University of Sheffield and explore applications of the GATE cloud tools in my research. Although my visit was relatively short (January 20–31, 2025), it was exceptionally enlightening. I had the privilege of working closely with members of the GATE group, experts in NLP, who welcomed me into their working meetings, discussed my research challenges, and provided insightful solutions and guidance.

Pic. 2: My office away from home at the GATE headquarters (School of Computer Science, University of Sheffield).

During my visit, I explored various GATE tools relevant to my research. Some of those tools were still under development, and the researchers were kind enough to grant me access, assist me in applying them, and discuss possible extensions. One especially useful tool was the Topic Extractor for social media hashtags, which can be used to analyze food consumption-related concepts and generate hierarchical concept graphs. The TwitIE Named Entity Recognizer proved valuable in accurately identifying individual words within multiword hashtags, one of the key challenges I had been aiming to resolve.

Pic. 3: Screenshot of the TwitIE Named Entity Recognizer that can identify words in multiword hashtags (last row).
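To make the multiword hashtag problem concrete, here is a minimal Python sketch of one common way to split a hashtag into words. It is not how TwitIE works internally; the function name `split_hashtag` and the tiny `VOCAB` word list are assumptions made purely for illustration. The sketch first uses capital letters as word boundaries and, if there are none, falls back to a dictionary lookup that prefers the split with the fewest words.

```python
import re

# Toy word list (an assumption for this sketch); a real system
# would rely on a much larger lexicon.
VOCAB = {"plant", "based", "burger", "meat", "free", "monday",
         "greek", "cheese", "feta", "love", "i"}

def split_hashtag(tag, vocab=VOCAB):
    """Split a hashtag into lowercase words."""
    body = tag.lstrip("#")
    # Step 1: use capitalisation if the author provided it (#MeatFreeMonday).
    parts = re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+(?![a-z])|\d+", body)
    if len(parts) > 1:
        return [p.lower() for p in parts]
    # Step 2: dictionary-based split; best[i] holds the shortest
    # list of vocabulary words that exactly covers text[:i].
    text = body.lower()
    n = len(text)
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and text[j:i] in vocab:
                candidate = best[j] + [text[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    # If no full split exists, return the hashtag body unchanged.
    return best[n] if best[n] is not None else [text]

print(split_hashtag("#MeatFreeMonday"))    # ['meat', 'free', 'monday']
print(split_hashtag("#plantbasedburger"))  # ['plant', 'based', 'burger']
```

Hashtags that cannot be covered by the word list come back as a single token, which makes it easy to spot the cases that need a better lexicon or a dedicated tool.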

Additionally, the Named Entity Recognizer offered significant insights by extracting predefined entities such as geopolitical locations, organizations, nationalities, and time references, enriching the analysis of consumer social media posts. I also explored sentiment analysis and opinion mining tools, while investigating how the user classification tool could be adapted for use across different platforms. Another intriguing discovery was the Multilingual Persuasion Technique Classifier, which presents exciting possibilities for analyzing professional posts on food products on social media.

Pic. 4: Identifying languages in posts, translating into English, and running Named Entity Recognition to be used for semantic network analysis.

Beyond these tools, I also received valuable guidance on optimizing ChatGPT and Large Language Models for consistency in responses and on clustering social media post hashtags into semantically meaningful groups for further analysis. Both of these challenges were high on my research agenda, and I am grateful for the specialized insights I received. Apart from the technical expertise I gained, engaging in discussions on shared research interests, such as ontology and semantic network development, was one of the most rewarding aspects of my visit. The openness of the GATE team to exploring extensions of their existing tools and fostering future collaborations made this experience particularly enriching.

Pic. 5: St George's Church, a former parish church (now part of the University of Sheffield as a lecture theatre and student housing), has been my view from the office.

I am deeply grateful to the members of the GATE group for their time, generosity, and willingness to support me both technically and personally. A special thanks goes to Dr. Maynard, who oversaw my visit, for guiding my "investigations" and for our insightful conversations. I would also like to thank Mrs. Wright, research project assistant to the GATE group, for handling the logistical details of my trip and stay in Sheffield, as well as for helping out with the required documentation.

This visit was an invaluable experience that has significantly shaped my research perspective. I look forward to applying what I have learned and fostering further collaborations with the GATE team. I strongly believe that this will not be my last visit to the GATE infrastructure.

Monday, 28 February 2022

How green is your recipe? Using GATE to calculate the environmental impact of recipes

The calculation of environmental impacts from recipes remains a barrier to effective uptake of sustainable diets. In a recent project funded by Alpro, led by Dr Christian Reynolds from the Centre for Food Policy at City University London, we explored digitised recipe texts from websites in English, Dutch and German. We study recipes rather than individual ingredients because this is how people typically think about environmental impact and diet.

Recipes are hard to process because they use different weights and measures, and sometimes vague or obscure terms (e.g. "a pinch of salt", "a handful of lettuce"). Together with our project partner Text Mining Solutions, we used GATE to develop customised tools to automatically extract ingredients, quantities and units from 220,168 indexed recipes, and to match these to a food environmental database of 4,500 ingredients (using the classification system FoodEx2). This database provided Land Use, GHG emissions, Eutrophying Emissions, Stress-Weighted Water Use, and Freshwater Withdrawals for each ingredient.
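As a rough illustration of the extraction step described above, the following Python sketch pulls a quantity, a unit and an ingredient name out of a single recipe line. It is not the GATE pipeline built for the project: the function names, the regular expression and, in particular, the gram values assigned to vague units such as "pinch" and "handful" are assumptions chosen only to show the idea.

```python
import re
from fractions import Fraction

# Hypothetical conversions to grams; illustrative values only.
UNIT_TO_GRAMS = {
    "g": 1.0, "kg": 1000.0, "tsp": 5.0, "tbsp": 15.0,
    "cup": 240.0, "pinch": 0.5, "handful": 30.0,
}

# Leading quantity: a fraction, a number, or the article "a"/"an".
QTY_RE = re.compile(r"^\s*(\d+/\d+|\d+(?:\.\d+)?|an?\b)\s*(.*)$", re.IGNORECASE)

def normalise_unit(word):
    """Return a known unit for the word, or None if it is not a unit."""
    w = word.lower().rstrip(".")
    if w in UNIT_TO_GRAMS:
        return w
    if w.endswith("s") and w[:-1] in UNIT_TO_GRAMS:
        return w[:-1]
    return None

def parse_ingredient(line):
    """Extract quantity, unit, name and (if convertible) grams from one line."""
    m = QTY_RE.match(line)
    if not m:
        return {"quantity": None, "unit": None,
                "name": line.strip().lower(), "grams": None}
    raw_qty, rest = m.group(1), m.group(2)
    qty = 1.0 if raw_qty.lower() in ("a", "an") else float(Fraction(raw_qty))
    words = rest.split()
    unit = normalise_unit(words[0]) if words else None
    if unit is not None:
        words = words[1:]
        if words and words[0].lower() == "of":
            words = words[1:]
    grams = qty * UNIT_TO_GRAMS[unit] if unit is not None else None
    return {"quantity": qty, "unit": unit,
            "name": " ".join(words).lower(), "grams": grams}

print(parse_ingredient("a handful of lettuce"))
print(parse_ingredient("200g flour"))
```

Lines with countable items and no unit (for example "2 eggs") keep their quantity but get no gram value, so a later step would still need a per-item weight before matching to an environmental database.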

Nutrition information was sourced from the USDA FoodData Central (McKillop et al., 2021) and McCance and Widdowson's Composition of Foods Integrated Database (Public Health England, 2015). Environmental and nutrition information was matched to two classification systems (FoodEx2, containing 4,500 ingredients, and USDA Nutrient Database, containing 2,484 ingredients). This allowed us to calculate these impacts at the mean and at the 5% and 95% confidence levels, per recipe and per portion, enabling us to explore the environmental impacts of vegan, vegetarian and non-vegetarian (omnivore) recipes if we were to cook these recipes using contemporary ingredients.
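The per-recipe and per-portion calculation can be sketched in a few lines of Python. This is a simplified illustration rather than the project's method: the emission factors below are made-up placeholder numbers, the names `GHG_FACTORS` and `recipe_ghg` are hypothetical, and simply adding up the lower and upper bounds of each ingredient is a crude way to carry the 5% and 95% values through to a recipe total.

```python
# Placeholder GHG factors in kg CO2e per kg of ingredient,
# given as (5% bound, mean, 95% bound). Not real data.
GHG_FACTORS = {
    "beef": (20.0, 60.0, 100.0),
    "tofu": (1.0, 3.0, 5.0),
    "rice": (2.0, 4.0, 6.0),
}

def recipe_ghg(ingredients_g, portions):
    """Sum GHG impacts for a recipe.

    ingredients_g maps ingredient name -> weight in grams.
    Returns (per_recipe, per_portion, unmatched_names).
    """
    totals = [0.0, 0.0, 0.0]
    unmatched = []
    for name, grams in ingredients_g.items():
        factors = GHG_FACTORS.get(name)
        if factors is None:
            # Ingredient not found in the (toy) database.
            unmatched.append(name)
            continue
        for k, factor in enumerate(factors):
            totals[k] += factor * grams / 1000.0
    per_recipe = dict(zip(("p5", "mean", "p95"), totals))
    per_portion = {k: v / portions for k, v in per_recipe.items()}
    return per_recipe, per_portion, unmatched

per_recipe, per_portion, unmatched = recipe_ghg(
    {"tofu": 400, "rice": 300, "coriander": 10}, portions=4
)
print(per_recipe, per_portion, unmatched)
```

Returning the unmatched ingredients alongside the totals keeps the gaps visible, since any ingredient that cannot be matched to the database silently lowers the computed impact.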

To validate the tool, we manually calculated the impacts of 50 recipes from 4 websites: BBC Good Food, Albert Heijn/Allerhande, AllRecipes.com and Kochbar, and compared these with the results from our tool.

We created a website where you can enter a recipe and get back the calculation for the recipe and per portion (with confidence intervals). The image below shows a sample screenshot.

We presented some of our findings as a poster at the Livestock, Environment and People (LEAP) conference in December 2021. You can find more examples of our analysis and results there.

It's interesting to see how the recipes from the different countries, as well as recipes with different protein sources, lead to different median CO2 footprints. Below we see a chart showing the median GHGE per portion in recipes from different protein sources (e.g. those containing beef, those containing tofu) in omnivore, vegetarian, and vegan recipes. Unsurprisingly, the dishes containing meat have higher GHGE values on the whole, though we do find variations within individual recipes. We were particularly excited to find a recipe for chocolate cake that "beat" a salad in terms of low GHGE!

[Chart: median GHGE per portion by protein source in omnivore, vegetarian and vegan recipes]

When we compared the different datasets (depicting recipes from different European countries) in terms of median GHGE per protein source, we found that Kochbar (German) recipes typically fared the worst, followed by the BBC Good Food (British) recipes, while the Albert Heijn (Dutch) recipes fared much better.
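A comparison of this kind boils down to grouping recipes and taking a median per group. The Python sketch below shows that step under stated assumptions: the record fields (`source`, `protein`, `ghge_per_portion`) and the function name are hypothetical, and the numbers are invented for the example rather than taken from our results.

```python
from collections import defaultdict
from statistics import median

def median_ghge_by_group(records, keys=("source", "protein")):
    """Group recipe records by the given keys and return the median GHGE per group."""
    groups = defaultdict(list)
    for record in records:
        group = tuple(record[k] for k in keys)
        groups[group].append(record["ghge_per_portion"])
    return {group: median(values) for group, values in groups.items()}

# Invented example records (kg CO2e per portion); not project data.
records = [
    {"source": "site_a", "protein": "beef", "ghge_per_portion": 4.0},
    {"source": "site_a", "protein": "beef", "ghge_per_portion": 6.0},
    {"source": "site_b", "protein": "tofu", "ghge_per_portion": 1.0},
    {"source": "site_b", "protein": "tofu", "ghge_per_portion": 3.0},
    {"source": "site_b", "protein": "tofu", "ghge_per_portion": 2.0},
]

print(median_ghge_by_group(records))
# {('site_a', 'beef'): 5.0, ('site_b', 'tofu'): 2.0}
```

Passing `keys=("protein",)` instead gives medians per protein source across all websites, which is the view behind a chart of protein sources alone.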

The work is now continuing with the development of a dashboard enabling additional visualisations and further analysis to be produced.