12th GATE Training Course: open-source natural language processing with an emphasis on social media
For over a decade, the GATE team has provided an annual course in using our technology. The course content and track options have changed a bit over the years, but it always includes material to help novices get started with GATE as well as introductory and more advanced use of the JAPE language for matching patterns of document annotations.
The latest course also included machine learning, crowdsourcing, sentiment analysis, and an optional programming module (aimed mainly at Java programmers to help them embed GATE libraries, applications, and resources in web services and other "behind the scenes" processing). We have also added examples and new tools in GATE to cover the increasing demand for getting data out of and back into spreadsheets, and updated our work on social media analysis, another growing field.
Figure: Information in "feral databases" (spreadsheets)
We also disseminated work from several current research projects.
Figure: Semantics in scientometrics
- From KNOWMAK and RISIS, we presented our work on using semantic technologies in scientometrics. This work applies NLP and ontologies to document categorization, contributing to a searchable knowledge base in which users can find aggregate and specific data about scientific publications, patents, and research projects by geography, category, and other criteria.
- Much of our recent work on social media analysis, including opinion mining and abuse detection and measurement, has been done as part of the SoBigData project.
- The increasing range of tools for languages other than English links with our participation in the European Language Grid, which is also supporting further development of GATE Cloud, our platform for text analytics as a service.
Figure: Conditional processing of multilingual documents
Figure: Processing German in GATE
The GATE software distributions, documentation, and training materials from our courses can all be downloaded from our website under open licences. Source code is also available from our GitHub page.
Acknowledgements
The course included research funded by the European Union's Horizon 2020 research and innovation programme under grant agreements No. 726992 (KNOWMAK), No. 654024 (SoBigData), No. 824091 (RISIS), and No. 825627 (European Language Grid); by the Free Press Unlimited pilot project "Developing a database for the improved collection and systematisation of information on incidents of violations against journalists"; by EPSRC grant EP/I004327/1; by the British Academy under the call "The Humanities and Social Sciences: Tackling the UK’s International Challenges"; and by Nesta.