Wednesday, 3 July 2019

12th GATE Summer School (17-21 June 2019)


12th GATE Training Course: open-source natural language processing with an emphasis on social media

For over a decade, the GATE team has provided an annual course in using our technology. The course content and track options have changed a bit over the years, but it always includes material to help novices get started with GATE as well as introductory and more advanced use of the JAPE language for matching patterns of document annotations.

The latest course also included machine learning, crowdsourcing, sentiment analysis, and an optional programming module (aimed mainly at Java programmers to help them embed GATE libraries, applications, and resources in web services and other "behind the scenes" processing).  We have also added examples and new tools in GATE to cover the increasing demand for getting data out of and back into spreadsheets, and updated our work on social media analysis, another growing field.
[Figure: Information in "feral databases" (spreadsheets)]
We also disseminated work from several current research projects.
[Figure: Semantics in scientometrics]

  • From KNOWMAK and RISIS, we presented our work on semantic technologies in scientometrics: applying NLP and ontologies to document categorization to help build a searchable knowledge base in which users can find aggregate and specific data about scientific publications, patents, and research projects by geography, category, etc.
  • Much of our recent work on social media analysis, including opinion mining and abuse detection and measurement, has been done as part of the SoBigData project.
  • The increasing range of tools for languages other than English links with our participation in the European Language Grid, which is also supporting further development of GATE Cloud, our platform for text analytics as a service.
[Figure: Conditional processing of multilingual documents]

[Figure: Processing German in GATE]
The GATE software distributions, documentation, and training materials from our courses can all be downloaded from our website under open licences. Source code is also available from our GitHub page.

Acknowledgements

The course included research funded by the European Union's Horizon 2020 research and innovation programme under grant agreements No. 726992 (KNOWMAK), No. 654024 (SoBigData), No. 824091 (RISIS), and No. 825627 (European Language Grid); by the Free Press Unlimited pilot project "Developing a database for the improved collection and systematisation of information on incidents of violations against journalists"; by EPSRC grant EP/I004327/1; by the British Academy under the call "The Humanities and Social Sciences: Tackling the UK’s International Challenges"; and by Nesta.
