Sunday, 7 February 2021

New releases bringing GATE and Python closer together

Release

The GATE Team is proud to announce two new releases that bring GATE and Python together:

  • Python GateNLP (version 1.0.2): a Python 3 package that brings many of the concepts and the ease of handling documents, annotations and features to Python.
  • GATE Python Plugin (version 3.0.2): a new plugin that can be used from Java GATE to process documents using Python code and the methods provided by the Python GateNLP package.

Both are intended as first releases to a wider community, so that users can give feedback about what they need and what the basic design should look like.

Feedback

Users are invited to give feedback about the Python GateNLP package:

  • If you detect a bug or have a feature request, please use the GitHub Issue Tracker.
  • For more general discussions, ideas, or requests for help from the community, please use (preferably) the GitHub Discussions Forum or the General GATE Mailing List.
  • We are also interested in feedback about the API and the functionality of the package. If you want to use the package for your own development and want to discuss changes, improvements or how you can contribute, please use the GitHub Discussions Forum.
  • We are happy to receive contributions! Please create an issue and discuss/plan with developers on the issue tracker before providing a pull request.

To give feedback about the Python Plugin:

IMPORTANT: whenever you give feedback, please include as much detail about your Operating System, Java or Python version, package/plugin version and your concrete problem or question as possible!
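One way to gather most of these details is a few lines of Python using only the standard library. This is a minimal sketch, not part of GateNLP: the helper name `environment_report` is made up for illustration, and it assumes the package is installed under the distribution name `gatenlp`.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("gatenlp",)):
    """Collect OS, Python and package versions as printable lines.

    Hypothetical helper for bug reports; not provided by GateNLP itself.
    """
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this Python environment.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return lines


print("\n".join(environment_report()))
```

The Java version is not covered by this sketch; running `java -version` on the command line prints it. Paste both outputs into your report together with the plugin version and a description of the problem.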

GATE Course Module

Module 11 of the upcoming online GATE course in February 2021 will introduce the Python GateNLP package and the GATE Python plugin. You can register for this and many other modules of the course here.

Python GateNLP

Python GateNLP is a Python NLP framework which provides some of the concepts and abstractions known from Java GATE in Python, plus a number of new features: 
  • Documents with arbitrarily many features, arbitrarily many named Annotation sets. GateNLP also adds the capability of keeping a ChangeLog
  • AnnotationSets with arbitrarily many (stand-off) Annotations which can overlap in any way and can span any character range (not just entire tokens/words)
  • Annotations with arbitrarily many features, grouped per set by some annotation type name
  • Features which map keys to arbitrary values 
  • Corpora: collections of documents. Python GateNLP provides corpora that directly map to files in a directory (recursively). 
  • Prepared modules for processing documents. In GateNLP these are called "Annotators"; they also allow for filtering and splitting documents.
  • Reading and writing in various formats. GateNLP uses three new formats, "bdocjs" (JSON serialization), "bdocym" (YAML serialization) and "bdocMP" (Message Pack serialization). Documents in these formats can be exchanged with Java GATE through the GATE plugin Format_Bdoc.
  • Gazetteers for fast lookup and annotation of token sequences or character sequences which match a large list of known terms or phrases
  • PAMPAC: a way to annotate documents by matching patterns over text, other annotations and annotation features
  • An HTML visualizer which allows the user to interactively view GATE documents, annotations and features as separate HTML files or within Jupyter notebooks.
  • Bridges to powerful NLP libraries and conversion of their annotations to GateNLP annotations:
  • GateWorker: an API that allows the user to directly run Java GATE from Python and exchange documents between Python and Java
  • The Java GATE Python Plugin (see below) allows the user to run Python GateNLP code directly from Java GATE and process documents with it.
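To make the document model in the list above more concrete, here is a small plain-Python sketch of stand-off annotation: the text is never modified, and each annotation only records character offsets, a type name and a feature map. Note that this is NOT the GateNLP API. The class and method names (`Document`, `Annotation`, `annotate`, `covered_text`, `overlapping`) are assumptions chosen for illustration, and the real package is considerably richer.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Annotation:
    """A stand-off annotation: character offsets, a type name and features."""
    start: int
    end: int
    anntype: str
    features: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Document:
    """Text plus document features and any number of named annotation sets."""
    text: str
    features: Dict[str, Any] = field(default_factory=dict)
    annotation_sets: Dict[str, List[Annotation]] = field(default_factory=dict)

    def annotate(self, set_name, start, end, anntype, **features):
        """Add an annotation to the named set, creating the set if needed."""
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("offsets must lie within the document text")
        ann = Annotation(start, end, anntype, dict(features))
        self.annotation_sets.setdefault(set_name, []).append(ann)
        return ann

    def covered_text(self, ann):
        """Return the span of text that an annotation refers to."""
        return self.text[ann.start:ann.end]

    def overlapping(self, set_name, start, end):
        """Return annotations in a set that overlap the range [start, end)."""
        return [
            a for a in self.annotation_sets.get(set_name, [])
            if a.start < end and start < a.end
        ]


doc = Document("The GATE Team is in Sheffield.", features={"lang": "en"})

# Annotations may overlap freely and need not align with word boundaries.
team = doc.annotate("entities", 4, 13, "Organization", source="manual")
gate = doc.annotate("entities", 4, 8, "Software")
part = doc.annotate("entities", 9, 11, "Fragment")

print(doc.covered_text(team))
print(doc.covered_text(part))
print([a.anntype for a in doc.overlapping("entities", 4, 8)])
```

Because the annotations live beside the text rather than inside it, any number of sets can describe the same document without interfering with each other, which is the property the feature list relies on.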

GATE Python Plugin

The GATE Python Plugin is one of many GATE plugins that extend the functionality of Java GATE. It lets Python programs, which use the GateNLP API to manipulate documents, process GATE documents from within the Java GATE GUI or via the multiprocessing GATE Cloud Processor (GCP).
