On June 1st, 2012 the GATE Team at the University of Sheffield, in collaboration with the British Library and HR Wallingford, started the #EnviLOD project, funded under the JISC Research Tools Programme.
#EnviLOD aims to demonstrate the value of using Linked Open Data (LOD) vocabularies in the field of environmental science, by pursuing the following objectives:
- Address the problem of LOD domain vocabulary enrichment and interlinking. Develop GATE-based tools for efficient LOD vocabulary lookup and LOD-based term disambiguation. Evaluate these, both quantitatively and with end-users and other stakeholders.
- Develop and evaluate intuitive user interface methods that can hide the complexities of the SPARQL semantic search language, while allowing environmental researchers to search successfully, using LOD vocabularies.
- Build a case study, using the new British Library information discovery tool for environmental science, Envia. Test the use of LOD vocabularies towards enhancing information discovery and management.
- Collaborate with domain experts at the environmental consultants HR Wallingford, providing feedback on how the semantic work undertaken here supports their work as environmental science practitioners and innovators.
Follow EnviLOD on Twitter: #envilod
Background and Motivation
Environmental science is a broad, interdisciplinary subject area that spans
biology, chemistry, earth sciences, physics, and engineering. Because
of the breadth of the subject scope, information discovery and
sharing in environmental science is often a challenge. Linked Open
Data (LOD) and vocabularies offer an opportunity to improve the
process of information discovery and sharing through unique,
machine-readable, interlinked open vocabularies, thus ultimately
connecting users more efficiently to useful and relevant resources.
Key vocabularies for environmental science are already becoming available
as Linked Data (e.g. the GEMET thesaurus), as are other key resources
relevant for the domain (e.g. GeoNames, DBpedia). One outstanding
challenge is to use them to enrich unstructured content and metadata
with semantics. Doing so manually is prohibitively expensive and
unsustainable, since LOD vocabularies typically have millions of
instances. Therefore there is a strong need for semantic annotation
tools that enrich metadata and content with LOD semantics
automatically. EnviLOD will tackle the problem of LOD vocabulary
enrichment, interlinking, and adoption in the domain of environmental
science; however, the results will also be relevant to other fields. The starting point will be the DBpedia-based entity annotation and disambiguation algorithms developed by Sheffield as part of the TrendMiner project.
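To make the idea of vocabulary-based annotation concrete, here is a minimal sketch in Python. It is not the GATE or TrendMiner implementation: the vocabulary, the `annotate` function, and the example.org URIs are all hypothetical placeholders chosen for illustration.

```python
import re

# Hypothetical mini-vocabulary: surface form -> candidate URIs.
# The example.org URIs are placeholders, not real GEMET or DBpedia identifiers.
VOCAB = {
    "flooding": ["http://example.org/gemet/flood"],
    "thames barrier": ["http://example.org/dbpedia/Thames_Barrier"],
    "thames": [
        "http://example.org/dbpedia/River_Thames",
        "http://example.org/dbpedia/Thames_Connecticut",
    ],
}


def annotate(text):
    """Return (start, end, surface, candidates) tuples, preferring longer terms."""
    annotations = []
    taken = [False] * len(text)  # characters already covered by an annotation
    lowered = text.lower()
    for term in sorted(VOCAB, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                surface = text[m.start():m.end()]
                annotations.append((m.start(), m.end(), surface, VOCAB[term]))
    return sorted(annotations)
```

An ambiguous term such as "thames" keeps several candidate URIs. Choosing between them from context is the disambiguation step, which this sketch deliberately leaves out.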
The second major challenge is to develop information access facilities
that use semantics to deliver a semantic search service, which is not
only more powerful, but also as simple to use as its non-semantic
counterparts. At present, the most widely used method for retrieving
information from Linked Data is through SPARQL queries. However,
formulating such queries is beyond the capabilities of most users and
presents a significant barrier to widespread uptake. EnviLOD will
evaluate user interface methods that can hide the complexities of
SPARQL, while still allowing users to make successful use of semantic search.
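As a rough illustration of what "hiding SPARQL" can mean, the sketch below turns two form-style inputs (a topic and a region) into a SPARQL query string, so the user never sees the query syntax. The `build_query` function and the `ex:` vocabulary (`ex:mentions`, `ex:locatedIn`) are assumptions made for this example, not part of EnviLOD.

```python
def build_query(topic_uri, region_uri, limit=10):
    """Build a SPARQL query for documents about a topic within a region.

    The ex: properties are hypothetical; a real deployment would use the
    properties of its own annotation schema.
    """
    return f"""PREFIX ex: <http://example.org/schema/>
SELECT DISTINCT ?doc WHERE {{
  ?doc ex:mentions <{topic_uri}> .
  ?doc ex:mentions ?place .
  ?place ex:locatedIn* <{region_uri}> .
}}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    print(build_query(
        "http://example.org/topic/Flooding",
        "http://example.org/place/South_East_Britain",
    ))
```

The `ex:locatedIn*` property path follows containment to any depth, so a document that mentions a place inside the region also matches. In a real interface the URIs would come from a vocabulary lookup and would need validating before being placed in the query.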
In the context of environmental science, for example, a user searching for
flooding in south-east Britain would be able to find a report with a
chapter on water levels at the Thames barrier. In other words, by
exploiting the additional semantic context from relevant Linked Open
Data ontologies, the user will find a report in the search results
that would not have been picked up based on a simple keyword search.
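The flooding example can be sketched with a toy in-memory model. Everything here is hypothetical (the documents, the containment chain, and the topic relation); it only shows why following semantic links finds a report that literal keyword matching misses.

```python
# Hypothetical toy data; names and relations are illustrative only.
LOCATED_IN = {
    "Thames Barrier": "London",
    "London": "South-East Britain",
    "South-East Britain": "Britain",
}
RELATED_TOPIC = {"water levels": "flooding"}  # narrower term -> broader topic

DOCS = {
    "report-1": "Chapter 4: water levels at the Thames Barrier",
    "report-2": "Air quality in northern Scotland",
}


def keyword_search(terms):
    """Documents that contain every query term literally."""
    return [doc for doc, text in DOCS.items()
            if all(t.lower() in text.lower() for t in terms)]


def ancestors(place):
    """The place itself plus every region that contains it."""
    chain = [place]
    while chain[-1] in LOCATED_IN:
        chain.append(LOCATED_IN[chain[-1]])
    return chain


def semantic_search(topic, region):
    """Documents mentioning the topic (or a related term) and a place in the region."""
    hits = []
    for doc, text in DOCS.items():
        lowered = text.lower()
        topic_ok = topic in lowered or any(
            narrow in lowered
            for narrow, broad in RELATED_TOPIC.items() if broad == topic)
        place_ok = any(
            place.lower() in lowered and region in ancestors(place)
            for place in LOCATED_IN)
        if topic_ok and place_ok:
            hits.append(doc)
    return hits
```

Calling `keyword_search(["flooding", "South-East Britain"])` finds nothing, because neither phrase occurs in the text, whereas `semantic_search("flooding", "South-East Britain")` reaches the first report through the "water levels" and "Thames Barrier" links.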
Deliverables
| Output / Outcome Type | Brief Description |
| --- | --- |
| Report | User needs analysis, requirements gathering and use case definition. |
| Software | Open source tools for semantic enrichment with Linked Environment Data. |
| Software | A web-based interface for semantic search with Linked Environment Data. |
| Report | Quantitative and user-based evaluation results. |
| Report | A final report detailing the lessons learned. |
| Publication | At least one research paper |
| Dissemination materials | Online demonstration and documentation; website; blog |
| User engagement event | User workshop |
| Project documentation | JISC project documentation (Project plan, project reports, etc) |
| Knowledge built | Knowledge of LOD, LOD-based semantic annotation, and semantic search |
| Knowledge built | Spreading awareness of LOD and its relevance to environmental science |
| Knowledge built | Knowledge transfer between computer scientists, information scientists, and environmental scientists |
Critical Success Factors
1. Scalability: LOD resources such as DBpedia and GeoNames have (tens of)
millions of instances, so using them for semantic annotation and semantic
queries is far from trivial. Scalability and robustness to noisy data are
therefore key requirements for EnviLOD. Our solution is based on Ontotext's
OWLIM semantic repository, which scales to billions of triples. OWLIM is
coupled with the open-source GATE semantic annotation tools and Linked Data
endpoints. We import Linked Data into the OWLIM semantic repository, which
provides a SPARQL endpoint. GATE Mimir is used to index full text, metadata,
and semantic annotations, which underpin the semantic search UI.
2. Sustainability: All project results will be made available as open source. Software
will be provided with a clearly defined API to facilitate adoption. The results
will be incorporated within the Envia discovery tool, which will be supported
by the British Library.
3. Usability: Usability of the semantic
search user interface is paramount. UI mockups will be created and tested first
with the British Library and HR Wallingford, followed by a wider consultation
with key stakeholders. The UI will be designed to match as closely as possible
the user’s current search practices, as well as their needs for
semantically-enhanced queries.
4. Interoperability: This will be achieved through the use of widely adopted standards, such as the OWL and RDF W3C standards.
Dates: 1 June 2012 - 31 December 2012
Follow the GATE Team on Twitter: @GateAcUk
Follow the British Library Science team on Twitter: @ScienceBL