Friday, 8 February 2019

Teaching computers to understand the sentiment of tweets

As part of the EU SoBigData project, the GATE team hosts a number of short research visits, between 2 weeks and 2 months, for all kinds of data scientists (PhD students, researchers, academics, professionals) to come and work with us and to use our tools and/or datasets on a project involving text mining and social media analysis. Kristoffer Stensbo-Smidt visited us in the summer of 2018 from the University of Copenhagen, to work on developing machine learning tools for sentiment analysis of tweets, and was supervised by GATE team member Diana Maynard and by former team member Isabelle Augenstein, who is now at the University of Copenhagen. Kristoffer has a background in Machine Learning but had not worked in NLP before, so this visit helped him understand how to apply his skills to this kind of domain.

After his visit, Kristoffer wrote up an excellent summary of his research. He essentially tested a number of different approaches to processing text, and analysed how much of the sentiment they were able to identify. Given a tweet and an associated topic, the aim is to ascertain automatically whether the sentiment expressed about this topic is positive, negative or neutral. Kristoffer experimented with different word embedding-based models in order to test how much information different word embeddings carry about the sentiment of a tweet. This involved choosing which embedding models to test, and how to transform the topic vectors. The main conclusions he drew from the work were that, in general, word embeddings contain a lot of useful information about sentiment, with newer embeddings containing significantly more. This is not particularly surprising, but shows the importance of advanced models for this task.
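To make the task setup a little more concrete, here is a minimal Python sketch of topic-dependent sentiment classification from word embeddings. It is an illustration only and not Kristoffer's actual method: the post does not say which embeddings or classifier he used, so the tiny 3-dimensional vectors, the training examples, the averaging of word vectors, the simple concatenation of tweet and topic vectors, and the nearest-centroid classifier are all assumptions made up for this example. In practice one would load pretrained embeddings and train a proper classifier.

```python
import numpy as np

# Toy 3-dimensional "embeddings", invented purely for illustration.
# Real experiments would load pretrained vectors instead.
TOY_EMBEDDINGS = {
    "love": np.array([1.0, 0.0, 0.2]),
    "great": np.array([0.9, 0.1, 0.0]),
    "hate": np.array([-1.0, 0.0, 0.1]),
    "awful": np.array([-0.9, 0.1, 0.0]),
    "the": np.array([0.0, 0.0, 0.0]),
    "today": np.array([0.0, 0.1, 0.1]),
    "phone": np.array([0.0, 1.0, 0.0]),
    "weather": np.array([0.0, 0.0, 1.0]),
}
DIM = 3


def embed_text(text, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of all known tokens (zero vector if none are known)."""
    vectors = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


def featurise(tweet, topic, embeddings=TOY_EMBEDDINGS):
    """Concatenate the tweet vector with the topic vector.

    Concatenation is just one assumed way of combining the two; the post
    only says that the topic vectors were transformed in some way.
    """
    return np.concatenate([embed_text(tweet, embeddings), embed_text(topic, embeddings)])


def fit_centroids(examples, embeddings=TOY_EMBEDDINGS):
    """Compute one mean feature vector per sentiment label."""
    grouped = {}
    for tweet, topic, label in examples:
        grouped.setdefault(label, []).append(featurise(tweet, topic, embeddings))
    return {label: np.mean(vecs, axis=0) for label, vecs in grouped.items()}


def predict(tweet, topic, centroids, embeddings=TOY_EMBEDDINGS):
    """Return the label whose centroid is closest in Euclidean distance."""
    x = featurise(tweet, topic, embeddings)
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))


# Hypothetical (tweet, topic, label) training triples.
EXAMPLES = [
    ("love the phone", "phone", "positive"),
    ("great weather", "weather", "positive"),
    ("hate the weather", "weather", "negative"),
    ("awful phone", "phone", "negative"),
    ("the weather today", "weather", "neutral"),
    ("the phone today", "phone", "neutral"),
]

CENTROIDS = fit_centroids(EXAMPLES)
print(predict("great phone", "phone", CENTROIDS))
print(predict("awful weather", "weather", CENTROIDS))
```

Swapping `TOY_EMBEDDINGS` for a different set of vectors while keeping the classifier fixed gives a rough way to compare how much sentiment information each embedding carries, which is the kind of comparison described above.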

