
Wednesday, 20 February 2019

GATE team wins first prize in the Hyperpartisan News Detection Challenge

SemEval 2019 recently launched the Hyperpartisan News Detection Task in order to evaluate how well tools could automatically classify hyperpartisan news texts. The idea behind this is that "given a news text, the system must decide whether it follows a hyperpartisan argumentation, i.e. whether it exhibits blind, prejudiced, or unreasoning allegiance to one party, faction, cause, or person."


Below we see an example of (part of) two news stories about Donald Trump from the challenge data. The one on the left is considered to be hyperpartisan, as it shows a biased kind of viewpoint. The one on the right simply reports a story and is not considered hyperpartisan. The distinction is difficult even for humans, because there are no exact rules about what makes a story hyperpartisan.

In total, 322 teams registered to take part, of which 42 actually submitted an entry, including the GATE team consisting of Ye Jiang, Xingyi Song and Johann Petrak, with guidance from Kalina Bontcheva and Diana Maynard.


The main performance measure for the task is accuracy on a balanced set of articles; precision, recall, and F1-score were additionally measured for the hyperpartisan class. In the final submission, the GATE team's hyperpartisan classifier achieved 0.822 accuracy on the manually annotated evaluation set and ranked first on the final leaderboard.



Our winning system was based on sentence representations computed as averages of word embeddings from the pre-trained ELMo model, fed into a Convolutional Neural Network with Batch Normalization trained on the provided dataset. An averaged ensemble of models was then used to generate the final predictions.

The source code and full system description are available on GitHub.

One of the major challenges of this task is that the model must be able to adapt to a large range of article sizes. Most state-of-the-art neural network approaches for document classification take a token sequence as network input, but here that would mean either a massive computational cost or a loss of information, depending on the maximum sequence length chosen. We got around this problem by first pre-calculating sentence-level embeddings as the average of the word embeddings in each sentence, and then representing the document as a sequence of these sentence embeddings. We also found that ignoring some of the provided training data (which was automatically generated based on the document's publishing source) improved our results, which says something important about the trustworthiness of automatically labelled training data.
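As a rough illustration of this document representation (the function name and the embedding lookup here are ours, standing in for the actual ELMo-based pipeline), each sentence is reduced to the average of its word vectors, and the document becomes a fixed-width sequence of sentence vectors:

```python
import numpy as np

def doc_as_sentence_embeddings(sentences, word_vectors, dim):
    """Represent a document as a sequence of sentence vectors, where each
    sentence vector is the average of its pre-trained word embeddings.
    `word_vectors` maps token -> np.ndarray; unknown tokens are skipped."""
    doc = []
    for tokens in sentences:
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        doc.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.stack(doc)  # shape: (num_sentences, dim)
```

The resulting matrix can then be fed to a CNN regardless of how many tokens each sentence originally contained, which is what keeps the computational cost independent of raw article length.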

Overall, the ability to do well on the hyperpartisan news prediction task is important both for improving knowledge about neural networks for language processing generally, but also because better understanding of the nature of biased news is critical for society and democracy.






Friday, 8 February 2019

Teaching computers to understand the sentiment of tweets

As part of the EU SoBigData project, the GATE team hosts a number of short research visits, between 2 weeks and 2 months, for all kinds of data scientists (PhD students, researchers, academics, professionals) to come and work with us and to use our tools and/or datasets on a project involving text mining and social media analysis. Kristoffer Stensbo-Smidt visited us in the summer of 2018 from the University of Copenhagen, to work on developing machine learning tools for sentiment analysis of tweets, and was supervised by GATE team member Diana Maynard and by former team member Isabelle Augenstein, who is now at the University of Copenhagen. Kristoffer has a background in Machine Learning but had not worked in NLP before, so this visit helped him understand how to apply his skills to this kind of domain.

After his visit, Kristoffer wrote up an excellent summary of his research. He essentially tested a number of different approaches to processing text, and analysed how much of the sentiment they were able to identify. Given a tweet and an associated topic, the aim is to ascertain automatically whether the sentiment expressed about this topic is positive, negative or neutral. Kristoffer experimented with different word embedding-based models in order to test how much information different word embeddings carry about the sentiment of a tweet. This involved choosing which embedding models to test, and how to transform the topic vectors. The main conclusion he drew from the work was that, in general, word embeddings contain a lot of useful information about sentiment, with newer embeddings containing significantly more. This is not particularly surprising, but shows the importance of advanced models for this task.
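One simple way to combine a tweet and its topic in such an embedding-based setup (a sketch under our own naming, not Kristoffer's actual code) is to average the word vectors of each and concatenate them as input features for a downstream positive/negative/neutral classifier:

```python
import numpy as np

def tweet_topic_features(tweet_tokens, topic_tokens, word_vectors, dim):
    """Concatenate the averaged tweet embedding with the averaged topic
    embedding. A classifier trained on these features can then predict
    the sentiment expressed towards the topic."""
    def avg(tokens):
        vecs = [word_vectors[t] for t in tokens if t in word_vectors]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    return np.concatenate([avg(tweet_tokens), avg(topic_tokens)])
```

Swapping in newer embedding models changes only the `word_vectors` lookup, which is what makes this setup convenient for comparing how much sentiment information different embeddings carry.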



Thursday, 29 November 2018

A Deep Neural Network Sentence Level Classification Method with Context Information

Today we're looking at the work done within the group which was reported in EMNLP2018: "A Deep Neural Network Sentence Level Classification Method with Context Information", authored by Xingyi Song, Johann Petrak and Angus Roberts, all of the University of Sheffield.

Song, X., Petrak, J. & Roberts, A. A Deep Neural Network Sentence Level Classification Method with Context Information. In EMNLP 2018 – 2018 Conference on Empirical Methods in Natural Language Processing (2018).

Understanding complex bodies of text is a difficult task, especially when the context of a statement can greatly influence its meaning. While methods exist that examine the context surrounding a phrase, the authors present a new approach that makes use of much larger contexts, allowing greater confidence in the results, especially when dealing with complicated subject matter. Medical records are one such area, in which complex judgements on appropriate treatments are made across several sentences. It is therefore vital to fully understand the context of each individual statement, in order to collate meaning and accurately understand the sentiment of the entire body of text and the conclusion that should be drawn from it.

Although grounded in its use in the medical domain, this new technique can be demonstrated to be more widely applicable. An evaluation of the technique in non-medical domains showed a solid improvement of over six percentage points over its nearest competitor technique despite requiring 33% less training time.

But how does it work? At its core, this novel method analyses not only the target sentence but also an amount of text on either side of it. This context is encoded using an adapted Fixed-size Ordinally Forgetting Encoding (FOFE), turning it from a variable length context into a fixed length embedding. This is processed along with the target, before being concatenated and post-processed to produce an output. 
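FOFE itself is remarkably simple: the encoding of a sequence is a running sum in which earlier words are geometrically "forgotten" by a constant factor alpha, so any sequence length collapses to a single fixed-size vector. A minimal sketch (the function name is ours):

```python
import numpy as np

def fofe(word_vectors, alpha=0.5):
    """Fixed-size Ordinally Forgetting Encoding of a sequence of word
    vectors: z_0 = 0, z_t = alpha * z_{t-1} + e_t. With alpha < 1, the
    result encodes the sequence with earlier words weighted less."""
    z = np.zeros_like(word_vectors[0], dtype=float)
    for e in word_vectors:
        z = alpha * z + e
    return z
```

Because the output size does not grow with context length, large left and right contexts can be folded in at almost no additional computational cost, which is what keeps the method lightweight.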

Experimentation on this new technique was then performed, in comparison to peer techniques. These results showed markedly improved performance compared to LSTM-CNN methods, despite taking almost the same amount of time. The performance of this new Context-LSTM-CNN technique even surpassed an L-LSTM-CNN method despite a substantial reduction in required time. 
Table: average test accuracy and training time. Best values are in bold; standard deviations are given in parentheses.
In conclusion, a new technique is presented, Context-LSTM-CNN, that combines the strengths of LSTM and CNN with a lightweight context encoding algorithm, FOFE. The model shows a consistent improvement over both a non-context-based model and an LSTM context-encoded model on the sentence classification task.

Monday, 20 August 2018

Deep Learning in GATE

Few can have failed to notice the rise of deep learning over the last six or seven years, and its role in our emergence from the AI winter. Thanks in part to the increased speed offered by GPUs*, neural net approaches came into their own and out from under the shadow of the support vector machine, offering more scope than SVMs and other previously popular methods, such as Random Forests and CRFs, for continued improvement as training data volumes increase. Natural language processing has traditionally been a multi-step endeavour, perhaps beginning with tokenization and parsing and working up to semantic processing such as question answering. In addition to being labour-intensive, this approach is also limiting, as each step can only access the output of the step immediately before it, and thus throws away potentially valuable information from earlier steps. Deep learning offers the possibility to overcome these limitations by bringing a much greater number of parameters into play (much greater flexibility). Deep neural nets (DNNs) may learn end-to-end solutions, starting with raw data and producing sophisticated output. Furthermore, they can encode much more complex dependencies than those we have seen in less parameterizable approaches--in other words, much more elaborate reasoning. And while we step back from the need to break involved problems into pieces ourselves, a promising line of work finds that DNN "skills" are also transferable: models may, for example, be pre-trained on generic data, providing a basic language understanding that can then be put to use in other specialized contexts (multi-task learning).

For these reasons, deep learning is widely seen as key to continuing progress on a wide range of artificial intelligence tasks including natural language processing, so of course it is of great interest to us here in the GATE team. Classic GATE tasks such as entity recognition and sentence classification could be advanced by utilizing an approach with greater potential to learn a discriminative model, given sufficient training data. And by supporting the substitution of words with "embeddings" (DNN-derived vectors that capture relationships between words) trained on readily available unlabelled general or domain-specific data, we can bring some of the benefits of deep learning even to cases where training data are meagre by deep learning standards. Deep learning is therefore likely to be of benefit in any task but the most trivial, as long as you have the skills and a reasonable amount of data.

The Learning Framework is our ongoing project bringing current machine learning technologies to GATE, enabling users to leverage GATE's ecosystem of text processing offerings to create features to train learners, and to include these learners in text processing pipelines. The guiding vision for the Learning Framework has always been to offer an accessible interface that enables GATE users to get up and running quickly with machine learning, whilst at the same time supporting the most current and interesting of technologies. When it comes to deep learning, meeting these twin objectives is a little more challenging, but we have stepped up to the plate!

Deep learning framework in the GATE GUI

Previous machine learning algorithms would work their magic with comparatively little in the way of tweaking required. Deep learning, however, is an entirely different beast in this respect. In fact, it's more like an entire zoo! As discussed above, the advantage of DNNs is their massive flexibility, but this seriously stretches GATE's previous assumptions about how machine learning works. An integration needs to support the design of an architecture (a "shape" of neural net) and the tuning of many parameters, including dropout, optimization strategy, learning rate, momentum, and many more. All of these factors are critical to obtaining good performance. The integration is still under (very) intensive development, but it is already possible to get something running relatively quickly with deep learning in GATE. Here are some current highlights:
  • Two of the most-used frameworks for Deep Learning can be used: PyTorch and Keras, both Python-based;
  • Support for both Linux and macOS (Windows is not yet supported);
  • A range of template architectures, which may produce acceptable results out of the box (though in many cases it will be necessary for the user to adapt the architecture, the parameters of the architecture, or other aspects of the DNN solution);
  • The possibility to work with an initial GATE-created model both inside and outside of GATE.
We encourage anyone who is interested to give it a try and to talk to us about it. There will always be more to add (current challenges include drop-out, gradient clipping, L1/L2 weight regularization, attention, modified weight initialization, char-augmented LSTMs and LSTM-CRF architectures, to name a few) but much is achievable already. This is one of relatively few efforts globally to sift the essence out of this highly active research field and transform it into something relatively high level and generalizable across a range of NLP tasks, making state of the art technologies accessible to non-specialists. There's some documentation available here.

At the same time, we've been applying deep learning in our research in several ways. In a forthcoming EMNLP paper, team member Xingyi Song and co-authors use the fixed-size, ordinally-forgetting (FOFE) approach to combine LSTM and CNN neural net architectures in a more computationally efficient way than previously, in order to make better use of context in sentence classification tasks. Together with researchers at KCL and South London and Maudsley NHS Trust, he's also demonstrated the value of this technology in the context of detection of suicidal ideation in medical records.

Furthermore, we have successfully used LSTMs for veracity verification of rumours spread on social media such as Twitter. Our approach makes use of only the tweet content, which it passes through LSTM units that learn to distinguish between true, false and unverifiable rumours. The unique part of our approach is that, prior to passing the tweet to the LSTM layer, it first looks within the tweet for recurring information that is typically used by others to spread rumours, and adjusts the input accordingly: words carrying useful information are kept as they are, while others are downgraded in terms of contribution. This is achieved through an attention layer. We evaluated our approach on the RumourEval 2017 test data and achieved over 60% accuracy, which is currently the state-of-the-art performance for this task.
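The reweighting idea can be sketched as follows (a simplified, untrained stand-in for the attention layer described above; the function names and the fixed query vector are illustrative): each token vector is scored against a query vector, and tokens are scaled by their softmax weights, so informative tokens contribute more to the downstream LSTM:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def reweight_tokens(token_vecs, query):
    """Scale each token embedding by its attention weight: tokens whose
    embeddings align with the query vector keep most of their magnitude,
    while the rest are downgraded before entering the LSTM."""
    scores = softmax(token_vecs @ query)      # one weight per token
    return token_vecs * scores[:, None]       # broadcast over dimensions
```

In a trained system the query would be a learned parameter rather than a constant, but the effect is the same: the LSTM sees an input in which rumour-indicative words dominate.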

*Graphics Processing Units; technology driven by the demands of computer gamers that has been used to speed up deep learning approaches by as much as 250 times compared with CPUs.

Title artwork from https://www.deviantart.com/redwolf518stock