Thursday 29 November 2012

EnviLOD Workshop Announcement




How do you discover research relevant to your work in environmental science? Do you think that the process could be better?

In the EnviLOD project we have been exploring the potential of Linked Open Data vocabularies to improve information discovery in environmental science. We have developed a new tool, based upon input from the flooding community, which demonstrates how semantic technologies can enhance environmental information discovery.

We are organising an EnviLOD dissemination workshop, which will provide an introduction to the EnviLOD project, present the newly developed LOD-based semantic enrichment tools and the associated semantic search interface. 

The afternoon session will offer both more technical and less technical break-out groups, so the workshop is relevant to anybody with an interest in the discovery, management or use of environmental information.

When:
  25 January 2013
  11.00 – 15.00

Where:
  The British Library Conference Centre
  96 Euston Rd, London NW1 2DB

Contact details: johanna.kieniewicz@bl.uk

Please REGISTER HERE.





Thursday 18 October 2012

#EnviLOD: User Requirements Survey Results Now Published



The British Library and HR Wallingford, as part of our work on #EnviLOD, carried out a user survey, which identified the following potential use cases for vocabularies to be tested in the semantic enrichment and search workpackages:

  1. Returning results for geographically specific queries. Beyond keyword recognition, this use case includes proximity and recognition of geographic entities that are implied, but not stated within the query (for example in a query for flooding in SW England, identifying towns such as Exeter within that region, without it being explicitly articulated in the query).
  2. Answering non-open-ended queries. Here the user is asking for a specific piece of information, which might pertain to a budget, a specific piece of legislation, flood levels in a particular locality, etc. For example: What is the annual flood defence expenditure in The Netherlands? These questions can be answered definitively, and semantic search algorithms can likely be trained to handle them with relative ease.
  3. Answering open-ended queries. Here the user is conducting research with the aim of learning more about a particular topic. There is no definitive ‘answer’ to the query: the question is answered once the user has established that s/he has sufficient information on the topic. For example: What are some examples of community engagement relating to flood risk management? It is likely to be more difficult for a LOD approach to add value to such questions, but they nevertheless represent an important type of question asked by survey respondents.
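
The first use case can be illustrated with a minimal sketch. It assumes a tiny hand-written gazetteer in place of a real LOD source such as GeoNames; the names REGION_TO_PLACES and expand_query are hypothetical, and the towns listed alongside Exeter are illustrative data only, not taken from the survey.

```python
# Toy gazetteer: region -> places it contains. Illustrative data only,
# standing in for containment facts that a LOD vocabulary would provide.
REGION_TO_PLACES = {
    "sw england": ["Exeter", "Plymouth", "Truro"],
}


def expand_query(topic, region):
    """Return search terms for a topic in a region, adding places that
    are implied by the region but not stated in the query."""
    places = REGION_TO_PLACES.get(region.lower(), [])
    return [f"{topic} {name}" for name in [region] + places]


terms = expand_query("flooding", "SW England")
print(terms)
```

A query for an unknown region simply falls back to the original keywords, so in this sketch the expansion never removes results that a plain keyword search would have found.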

In general, users were found to prefer Google-style keyword searches, or searches in which they could pose a question, over other types of search. The amount of subject-specific jargon used in their queries depended on the nature of the question that was asked, as well as the job held by the individual who was asking it. As such, the LOD vocabularies used in this work need to be flexible, enabling generalist queries, while also allowing subject-specific queries, where possible.

For further details, please see this public deliverable.



Wednesday 17 October 2012

GATE and EnviLOD at the JISC Research Tools programme meeting



Today I went to Birmingham for the #JISCrestools programme meeting, organised by Christopher Brown and Torsten Reimer.

My presentation was on our latest #EnviLOD research on semantic annotation with Linked Open Data (DBPedia, Geonames, and GEMET) coupled with a demo of the Mimir-based semantic search over environmental science literature.

It was a really good meeting, especially seeing in more detail related work around:

  • text mining and social science tools for analysing social media (COSMOS and the Twitter analysis workbench); 
  • SKOS-HASSET on turning the HASSET thesaurus into SKOS and publishing as Linked Open Data, as well as using it for automated indexing; 
  • the INSPIRES project on finding links between researchers;
  • the Histore project which created training modules on text mining for historians, including GATE;
  • the eHealth GATEWay to the Clouds project which will soon publish some GATE plugins for anonymisation of electronic patient records;
  • the COMTAX project on community-driven curation of taxonomic databases.
There were many other very interesting projects. There is no time to write about them all now, but they are listed here.


Monday 3 September 2012

#EnviLOD: Project Risks and Budget



Like any project involving software development, as well as research, #EnviLOD is facing a number of risks, detailed below:


| Risk Description | Probability (P), 1–5 | Severity (S), 1–5 | Risk Score (P×S) | Detail of action to be taken (mitigation / reduction / transfer / acceptance) |
|---|---|---|---|---|
| Staff recruitment and retention | 1 | 3 | 3 | EnviLOD draws on experienced post-doctoral staff, who are already in place at the three partner organisations. |
| Delays in tool development and software integration | 1 | 4 | 4 | The chosen LOD-based semantic enrichment and search tools have been integrated already in GATE. The project team has a track record in LOD-based semantic annotation, thesauri, and their use for semantic search. |
| Stakeholder identification and outreach | 1 | 3 | 3 | British Library and HR Wallingford already have strong user engagement. Additional stakeholder identification and engagement activities will be carried out as part of WP2. |
| Stakeholders with conflicting requirements | 2 | 4 | 8 | The requirements stage (now completed) would have identified any conflicts early in the project. |
| Requirements gathering over-runs and delays technical development | 2 | 2 | 4 | The project plan is structured so that initial requirements are provided by the British Library and HR Wallingford. This will enable technical development to begin in WP3, while further user engagement and requirements refinement are carried out in parallel in WP2. |
| Mismatch against user needs | 1 | 5 | 5 | Tight correspondence between user requirements and the developed Linked Data solutions will be ensured through our agile methodology, which includes continuous development-user evaluation loops. |
| Fail to meet milestones | 2 | 3 | 6 | A project plan with clear objectives, detailed tasks and timings has been produced. We monitor progress fortnightly. |
| Unforeseen technical issues | 1 | 3 | 3 | Both the LOD-based semantic enrichment and semantic search tools will be based on pre-existing open-source prototypes, which will be developed and customised here. |
| Stakeholders fail to understand LOD and semantic search | 3 | 3 | 9 | EnviLOD has a user-driven strategy, informed by the stakeholders from the outset. The semantic search user interface(s) will be designed in consultation with users. |
| Accuracy of the automatic semantic enrichment services is not sufficient | 2 | 4 | 8 | The continuous evaluation with the British Library and other users will help us monitor accuracy continuously and identify problems as soon as they arise. |
| Scalability and real-time text analysis cannot be achieved | 3 | 4 | 12 | Sheffield have experience in large-scale semantic annotation and scalability is a key project focus. Moreover, one of the main goals is to develop methods for selecting appropriate subsets of LOD resources, thus enabling efficient analysis while maintaining sufficiently high quality of results. |
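
The risk scores above are simply probability multiplied by severity. As a small sketch of that arithmetic, the snippet below recomputes the scores from the P and S values listed in the table and ranks the risks from highest to lowest. The names RISKS and risk_score are hypothetical and used only for illustration.

```python
# (risk description, probability 1-5, severity 1-5), copied from the table above.
RISKS = [
    ("Staff recruitment and retention", 1, 3),
    ("Delays in tool development and software integration", 1, 4),
    ("Stakeholder identification and outreach", 1, 3),
    ("Stakeholders with conflicting requirements", 2, 4),
    ("Requirements gathering over-runs and delays technical development", 2, 2),
    ("Mismatch against user needs", 1, 5),
    ("Fail to meet milestones", 2, 3),
    ("Unforeseen technical issues", 1, 3),
    ("Stakeholders fail to understand LOD and semantic search", 3, 3),
    ("Accuracy of the automatic semantic enrichment services is not sufficient", 2, 4),
    ("Scalability and real-time text analysis cannot be achieved", 3, 4),
]


def risk_score(probability, severity):
    """Risk score as defined in the table header: P x S."""
    return probability * severity


# Highest-scoring risks first; ties keep their order from the table.
ranked = sorted(RISKS, key=lambda r: risk_score(r[1], r[2]), reverse=True)
for description, p, s in ranked:
    print(f"{risk_score(p, s):>2}  {description}")
```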

EnviLOD Budget

EnviLOD's total budget is £69,771, with £55,816 being funded by JISC.  The budget breaks down as follows:


Wednesday 22 August 2012

#EnviLOD: Project Timeline and Work packages



Our project started in June 2012 and is due to finish on December 31st, 2012.  We have just completed the user requirements gathering stage and are writing up the corresponding deliverable. As soon as it is ready, we will share it here for feedback.  We also had our third meeting today, discussing the work carried out in the past two weeks on user engagement and LOD-based semantic enrichment. 

In the meantime, here are some more details on the project workplan:


WORKPACKAGES (chart: work packages by month, months 1 to 7)

  1: Project Management
  2: User Engagement & Case Studies
  3: Linked Environment Data Enrichment
  4: User-Friendly Semantic Search over Linked Data
  5: Evaluation
  6: Dissemination & Engagement


WP 1: Project Management (Responsible partner: Sheffield)

The cross-institutional nature of the project necessitates close liaison between Sheffield, the British Library (BL) and HR Wallingford. In addition to the communication that arises from collaborative working, monthly teleconferences and regular face-to-face meetings will be used to advance the project and monitor progress.
Deliverables: Project plan. Legacy plan, including sustainability and support. Final report.

WP 2: User Engagement and Case Studies (BL, HR Wallingford)

This WP covers engagement with environmental science researchers and other key stakeholders. This takes place throughout the project, but in particular: (i) early in the project, to produce detailed requirements and use cases, based on interviews; (ii) later in the project, when we will test the utility of Linked Data and assess how the vocabularies support the needs of researchers and practitioners, and whether the Linked Open Data (LOD) approach will produce an added benefit in comparison with keyword searching.
Deliverables: Stakeholder analysis, requirements and use cases; User feedback.

WP 3: Linked Environment Data Enrichment (Sheffield)

This WP will deliver semantic enrichment tools, based on relevant LOD vocabularies. Where required, relevant ontologies not already connected to existing Linked Environment Data will be integrated. Sheffield’s open-source tools for lookup and term disambiguation with respect to Linked Data vocabularies will be tested and adapted to the environmental science domains. As part of this work, we are evaluating the coverage and accuracy of relevant general purpose LOD datasets (namely GeoNames and DBPedia), when applied to data and content from our domain. Tools for LOD-based geo-location disambiguation, date and measurement recognition and normalisation will also be delivered. 
Our solution is based on Ontotext's high performance OWLIM semantic repository, the open-source GATE semantic annotation tools, and their integration with Linked Data endpoints. We import Linked Data into the semantic repository, which provides a SPARQL endpoint and also full text, metadata, and semantic annotation indices, which underpin the semantic search UI.
Deliverables: Open source tools for semantic enrichment with Linked Environment Data.

WP 4: User-Friendly Semantic Search over Linked Data (Sheffield)

GATE Mimir (Multi-paradigm Information Management Indexing and Retrieval) is an open-source software framework for multi-paradigm indexing and searching of semantically annotated documents. Enriching documents with explicit semantics allows users to search more effectively for ambiguous names such as London (Ontario) and London (UK). The multi-paradigm aspect of Mimir refers to the accessing and linking together of multiple information sources, such as the textual content of the documents, the semantic metadata and the knowledge encoded in the Linked Data vocabularies. Accessing knowledge from Linked Data allows Mimir to understand generalisations, making it capable of answering more complex information needs, such as identifying documents that refer to water levels at the Thames barrier as relevant to a keyword search for flooding in south-east Britain. At the same time, the explicit LOD semantics associated with the indexed semantic metadata and content make sure that references to places called London (other than the one in the UK) are not seen as relevant results to such a query.
This WP will develop a customised semantic search interface, which enables users to carry out such powerful searches and fully benefit from the knowledge contained in Linked Data, without needing to write SPARQL queries.
Deliverable: A web-based interface for semantic search with Linked Environment Data.
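
The London example above can be illustrated with a toy sketch. This is not Mimir's API or query language; it is a stand-in that assumes each document carries a set of entity identifiers produced by semantic annotation. The identifiers (ex:London_UK, ex:London_Ontario), the Document class, the sample texts and both search functions are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder identifiers; a real system would use LOD URIs
# (for example from DBpedia or GeoNames).
LONDON_UK = "ex:London_UK"
LONDON_ONTARIO = "ex:London_Ontario"


@dataclass
class Document:
    text: str
    entities: set = field(default_factory=set)  # ids added by annotation


# Invented sample documents, for illustration only.
DOCS = [
    Document("Flood defences reviewed in London.", {LONDON_UK}),
    Document("London residents report flooding near the river.", {LONDON_ONTARIO}),
]


def keyword_search(docs, keyword):
    """Plain text match: cannot tell the two Londons apart."""
    return [d for d in docs if keyword.lower() in d.text.lower()]


def semantic_search(docs, keyword, entity):
    """Text match restricted to documents annotated with the given entity."""
    return [d for d in keyword_search(docs, keyword) if entity in d.entities]


print(len(keyword_search(DOCS, "london")))
print([d.text for d in semantic_search(DOCS, "london", LONDON_UK)])
```

In this sketch the keyword search returns both documents, while the semantic search keeps only the one annotated with the UK entity.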

WP 5: Evaluation (Sheffield and BL)

Firstly, quantitative evaluation of the accuracy of semantic enrichment and Linked Data vocabulary coverage will be carried out, based on a human-annotated gold standard and established metrics such as F-measure. In addition, a comparative evaluation of the new semantic search web interface will be completed, against the current keyword-search Envia tool, using a set of search queries supplied by the BL. Evaluation will be carried out in the context of the user requirements developed in WP2.

Deliverables: Quantitative evaluation results; A report detailing the lessons learned.
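
For readers unfamiliar with F-measure, here is a minimal sketch of how precision, recall and the balanced F-measure (F1) can be computed against a gold standard. It assumes annotations are compared by exact match on offsets and entity identifier, which is only one possible matching criterion and not necessarily the one used in the project. The function name and the sample annotations are invented for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Compare predicted annotations with a gold standard (both are sets).

    An annotation counts as correct only if it matches a gold annotation
    exactly (same offsets and same entity identifier).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented annotations: (start offset, end offset, entity identifier).
gold = {
    (0, 6, "ex:Exeter"),
    (20, 26, "ex:London_UK"),
    (40, 46, "ex:Thames"),
}
predicted = {
    (0, 6, "ex:Exeter"),
    (20, 26, "ex:London_Ontario"),  # wrong disambiguation
    (40, 46, "ex:Thames"),
    (50, 55, "ex:Severn"),          # spurious annotation
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```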

WP 6: Dissemination and Engagement (Sheffield, BL, HR Wallingford)

The project will devote significant effort to dissemination, including practical activities such as demonstrations and tutorials, to show how project outputs might be exploited in other institutions. Details of planned dissemination activities are provided below.

Deliverables: Presentations; research paper; online demonstration; training materials; blog; website; user workshop; engagement with JISC programme manager and related projects.

| Timing | Dissemination Activity | Audience | Purpose | Key Message |
|---|---|---|---|---|
| M1-M7 | Participation in JISC programme activities, such as JISC Involve (http://jiscinvolve.org/) | JISC | Raise awareness, Promote results | Benefits and challenges of using LOD |
| M1-M7 | Collaboration with other “Research Tools” projects | JISC development programmes | Inform, engage, and promote | EnviLOD objectives and results |
| M2-M7 | Project website | External stakeholders and research community | Raise awareness, inform, promote results | EnviLOD objectives and results |
| M4-M7 | Peer-reviewed publications at journals, conferences and workshops, including relevant environmental science (e.g. EnviroInfo, Ecological Informatics), as well as technical semantic technology ones (ISWC, ESWC, Journal Web Semantics) | Research community, including environmental science and web science | Inform and promote research results | EnviLOD research methods, open-source tools, and evaluation results |
| M7 | Dissemination workshop hosted at The British Library | Stakeholders | Engage stakeholders with the EnviLOD outputs | Benefits of LOD for environmental scientists |
| M3-M7 | Practical, “hands-on” outreach, through open-source software, user documentation, online demonstrations and tutorials | Research community, end users, JISC, and other stakeholders | Promote project results | Availability of open-source tools for LOD-based semantic enrichment and search |
| M1-M7 | Engagement with interested researchers from other institutions and other disciplines | Stakeholders | Inform and promote results | Lessons learned and results delivered |