Release
The GATE Team is proud to announce two new releases that bring GATE and Python together:
- Python GateNLP (version 1.0.2): a Python 3 package that brings many of GATE's concepts, and its ease of handling documents, annotations and features, to Python.
- GATE Python Plugin (version 3.0.2): a new plugin that can be used from Java GATE to process documents using Python code and the methods provided by the Python GateNLP package.
Both are first releases to a wider community, intended to gather feedback about what users need and what the basic design should look like.
Feedback
Users are invited to give feedback about the Python GateNLP package:
- If you detect a bug, or have a feature request, please use the GitHub Issue Tracker
- For more general discussions, ideas, or asking the community for help, please use the GitHub Discussions Forum (preferred) or the General GATE Mailing List
- We are also interested in feedback about the API and the functionality of the package. If you want to use the package for your own development and want to discuss changes, improvements or how you can contribute, please use the GitHub Discussions Forum
- We are happy to receive contributions! Please create an issue and discuss/plan with developers on the issue tracker before providing a pull request.
To give feedback about the Python Plugin:
- For reporting bugs or feature requests, please use the GitHub Issue Tracker
- For getting help and more general discussions, please use the General GATE Mailing List
IMPORTANT: whenever you give feedback, please include as much detail about your Operating System, Java or Python version, package/plugin version and your concrete problem or question as possible!
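The version details requested above can be collected with a few lines of Python. This is only a minimal sketch: the helper name `environment_report` is hypothetical, and it assumes the package is installed under the distribution name `gatenlp`. It does not cover the Java or plugin version, which must be added by hand when reporting on the Python Plugin.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "gatenlp") -> dict:
    """Collect OS, Python and package version for a bug report.

    `package` is the installed distribution name (assumed to be "gatenlp").
    """
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        pkg_version = "not installed"
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        package: pkg_version,
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

The printed lines can be pasted directly into an issue or a mailing list post.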
GATE Course Module
Python GateNLP
- Documents with arbitrarily many features and arbitrarily many named annotation sets. GateNLP also adds the capability of keeping a ChangeLog.
- AnnotationSets with arbitrarily many (stand-off) Annotations which can overlap in any way and can span any character range (not just entire tokens/words)
- Annotations with arbitrarily many features, grouped per set by some annotation type name
- Features which map keys to arbitrary values
- Corpora: collections of documents. Python GateNLP provides corpora that directly map to files in a directory (recursively).
- Prepared modules for processing documents. In GateNLP these are called "Annotators"; they also allow for filtering and splitting of documents.
- Reading and writing in various formats. GateNLP uses three new formats, "bdocjs" (JSON serialization), "bdocym" (YAML serialization) and "bdocMP" (MessagePack serialization). Documents in these formats can be exchanged with Java GATE through the GATE plugin Format_Bdoc.
- Gazetteers for fast lookup and annotation of token sequences or character sequences which match a large list of known terms or phrases
- PAMPAC: a way to annotate documents using patterns over text, other annotations and annotation features
- An HTML visualizer which allows the user to interactively view GATE documents, annotations and features as separate HTML files or within Jupyter notebooks.
- Bridges to powerful NLP libraries and conversion of their annotations to GateNLP annotations:
- GateWorker: an API that allows the user to directly run Java GATE from Python and exchange documents between Python and Java
- The Java GATE Python Plugin (see below) allows the user to run Python GateNLP code directly from Java GATE and process documents with it.
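To make the document model in the list above more concrete, here is a small self-contained sketch of stand-off annotations in plain Python. It is not the GateNLP API: all class and method names below are illustrative assumptions, chosen only to show how documents, named annotation sets, annotations and features relate, and how annotations can overlap and span arbitrary character ranges.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Annotation:
    """Stand-off annotation: character offsets, a type name and features."""
    start: int
    end: int
    type: str
    features: Dict[str, Any] = field(default_factory=dict)

    def overlaps(self, other: "Annotation") -> bool:
        # Two spans overlap if each one starts before the other ends.
        return self.start < other.end and other.start < self.end


@dataclass
class AnnotationSet:
    """A named collection of annotations over one document."""
    name: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, start: int, end: int, ann_type: str,
            features: Optional[Dict[str, Any]] = None) -> Annotation:
        ann = Annotation(start, end, ann_type, dict(features or {}))
        self.annotations.append(ann)
        return ann

    def with_type(self, ann_type: str) -> List[Annotation]:
        # Annotations are grouped per set by their type name.
        return [a for a in self.annotations if a.type == ann_type]


@dataclass
class Document:
    """Text plus document features and any number of named annotation sets."""
    text: str
    features: Dict[str, Any] = field(default_factory=dict)
    annsets: Dict[str, AnnotationSet] = field(default_factory=dict)

    def annset(self, name: str = "") -> AnnotationSet:
        # Create the set on first access, then reuse it.
        if name not in self.annsets:
            self.annsets[name] = AnnotationSet(name)
        return self.annsets[name]

    def covered_text(self, ann: Annotation) -> str:
        # The text is never modified; annotations only point into it.
        return self.text[ann.start:ann.end]


doc = Document("GATE meets Python", features={"lang": "en"})
entities = doc.annset("Entities")
name = entities.add(0, 4, "Name", {"kind": "software"})
# A span need not align with word boundaries and may overlap other spans.
fragment = entities.add(2, 8, "Fragment")

print(doc.covered_text(name))
print(doc.covered_text(fragment))
print(name.overlaps(fragment))
```

Because annotations only store offsets into the unchanged text, any number of sets and overlapping spans can coexist on the same document, which is the property the feature list emphasises.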