Tuesday 30 July 2019

GATE Cloud services for Google Sheets featured in the CLARIN Newsflash

CLARIN ERIC is a research infrastructure through Europe and beyond to encourage the sharing and sustainability of language data and tools for research in the humanities and social sciences.  We are pleased to announce that our functions for text analysis in Google Sheets were featured in the July 2019 issue of the CLARIN Newsflash.

We are still working on getting Google to publish our add-on, which we hope to have available in the marketplace in a few months. Until then, you can follow the instructions in our previous blog post to use this tool, which currently provides standard and Twitter-oriented named entity recognition for English, French, and German; named entity linking for English, French, and German; and rumour veracity evaluation for English. In the future we will expand the range of functions to cover a wider variety of GATE Cloud services.

Monday 15 July 2019

GATE Cloud services for Google Sheets

Spreadsheets are an increasingly popular way of storing all kinds of information, including text, and giving it some informal structure, and systems like Google Sheets are especially popular for collaborative work and sharing data.

In response to the demand for standard natural language processing (NLP) tasks in spreadsheets, we have developed a Google Sheets add-on that provides functions to carry out the following tasks on text cells using GATE Cloud services:
  • named entity recognition (NER) for standard text (e.g. news) in English, French, or German;
  • NER tuned for tweets in English, French, or German;
  • named entity linking using our YODIE service in English, French, or German;
  • veracity reporting for rumours in tweets.

We have demonstrated this work several times, most recently at the IAMCR conference "Communication, Technology and Human Dignity: Disputed Rights, Contested Truths", which took place on 7–11 July at the Universidad Complutense de Madrid in Spain. There we used it to show how organisations monitoring the safety of journalists could automatically add information about entities and events to their spreadsheets. Potential users have said it looks very useful and they would like access to it as soon as possible.

Google sheet showing Named Entity and Linking applications run over descriptions of journalist killings from the Committee to Protect Journalists (CPJ) databases

We are applying to have this add-on published in the G Suite Marketplace, but the process is very slow, so we are making the software available now as a read-only Google Drive document that anyone can copy and re-use. 

The document contains several examples and instructions are available from the Add-onsGATE Text Analysis menu item. The language processing is actually done on our servers; the spreadsheet functions send the text to GATE Cloud using the REST API and reformat the output into a human-readable form, so they require a network connection and are subject to rate-limiting. You can use the functions without setting up a GATE Cloud account, but if you create one and authenticate while using this add-on, rate-limiting will be reduced.



Open this Google spreadsheet, then use FileMake a copy to save a copy to your own Google Drive (you can’t edit the original). For the functions to work, you will have to grant permission for the scripts to send data to and from GATE Cloud services and to use your user-level cache.

This work has been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreements No 687847 (COMRADES) and No 654024 (SoBigData).


Friday 12 July 2019

Using GATE to drive robots at Headstart 2019


In collaboration with Headstart (a charitable trust that provides hands-on science, engineering and maths taster courses), the Department of Computer Science has just run its fourth annual summer school for maths and science A-level students. This residential course ran from 8 to 12 July 2019 and included practical work in computer programming, Lego robots, and project development as well as tours of the campus and talks about the industry.

For the third year in a row, we have included a section on natural language processing using GATE Developer and a special GATE plugin (which uses the ShefRobot library available from GitHub) that allows JAPE rules to operate the Lego robots.  As before, we provided the students with a starter GATE application (essentially the same as in last year's course) containing just enough gazetteer entries, JAPE, and sample code to let them tweet variations like "turn left" and "take a left" to make the robot do just that.  We also use the GATE Cloud Twitter Collector, which we have modified to run locally so the students can set it up on a lab computer so it follows their own twitter accounts and processes their tweets through the GATE application, sending commands to the robots when the JAPE rules match.


Based on lessons learned from the previous years, we put more effort into improving the instructions and the Twitter Collector software to help them get it running faster.  This time the first robot started moving under GATE's control less than 40 minutes from the start of the presentation, and the students rapidly progressed with the development of additional rules and then tweeting commands to their robots.



The structure and broader coverage of this year's course meant that the students had more resources available and a more open project assignment, so not all of them chose to use GATE in their projects, but it was much easier  and more streamlined for them to use than in previous years.







This year 42 students (14 female; 28 male) from around the UK attended the Computer Science Headstart Summer School.
Geography of male students

Geography of female students

The handout and slides are publicly available from the GATE website, which also hosts GATE Developer and other software products in the GATE family.  Source code is available from our GitHub site.  

GATE Cloud development is supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654024 (the SoBigData project).


Wednesday 3 July 2019

12th GATE Summer School (17-21 June 2019)


12th GATE Training Course: open-source natural language processing with an emphasis on social media

For over a decade, the GATE team has provided an annual course in using our technology. The course content and track options have changed a bit over the years, but it always includes material to help novices get started with GATE as well as introductory and more advanced use of the JAPE language for matching patterns of document annotations.

The latest course also included machine learning, crowdsourcing, sentiment analysis, and an optional programming module (aimed mainly at Java programmers to help them embed GATE libraries, applications, and resources in web services and other "behind the scenes" processing).  We have also added examples and new tools in GATE to cover the increasing demand for getting data out of and back into spreadsheets, and updated our work on social media analysis, another growing field.
Information in "feral databases" (spreadsheets)
We also disseminated work from several current research projects.
Semantics in scientometrics

  • From KNOWMAK and RISIS, we presented our work on using semantic technologies in scientometrics, by applying NLP and ontologies to document categorization in order to contribute to a searchable knowledge base that allows users to find aggregate and specific data about scientific publications, patents, and research projects by geography, category, etc.
  • Much of our recent work on social media analysis, including opinion mining and abuse detection and measurement, has been done as part of the SoBigData project.
  • The increasing range of tools for languages other than English links with our participation in the European Language Grid, which is also supported further development of GATE Cloud, our platform for text analytics as a service.
Conditional processing of multilingual documents

Processing German in GATE
The GATE software distributions, documentation, and training materials from our courses can all be downloaded from our website under open licences. Source code is also available from our github page.

Acknowledgements

The course included research funded by the European Union's Horizon 2020 research and innovation programme under grant agreements No. 726992 (KNOWMAK), No. 654024 (SoBigData), No. 824091 (RISIS), and No. 825627 (European Language Grid); by the Free Press Unlimited pilot project "Developing a database for the improved collection and systematisation of information on incidents of violations against journalists"; by EPSRC grant EP/I004327/1; by the British Academy under the  call "The Humanities and Social Sciences: Tackling the UK’s International Challenges"; and by Nesta.