Thursday, 18 October 2012

#EnviLOD: User Requirements Survey Results Now Published

The British Library and HR Wallingford, as part of our work on #EnviLOD, carried out a user survey, which identified the following potential use cases for vocabularies to be tested in the semantic enrichment and search workpackages:

  1. Returning results for geographically specific queries. Beyond keyword recognition, this use case includes proximity and the recognition of geographic entities that are implied but not stated in the query. For example, a query for flooding in SW England should also match towns within that region, such as Exeter, even though they are not named explicitly.
  2. Answering non-open-ended queries. Here the user is asking for a specific piece of information, which might pertain to a budget, a specific piece of legislation, flood levels in a particular locality, etc. For example: What is the annual flood defence expenditure in the Netherlands? These are questions that can be answered definitively, and semantic search algorithms can likely be trained to handle them fairly easily.
  3. Answering open-ended queries. Here the user is conducting research with the aim of learning more about a particular topic. There is no definitive ‘answer’ to the query: the question is answered once the user has established that s/he has sufficient information on the topic. For example: What are some examples of community engagement relating to flood risk management? It is likely to be harder for a LOD approach to add value to questions of this kind, but they nevertheless represent an important type of question asked by survey respondents.
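
To make the first use case more concrete, here is a minimal sketch of how a query naming a region could be expanded with places implied by that region. It is only an illustration and not the EnviLOD implementation: the `REGION_MEMBERS` table, its contents, and the `expand_query` function are hypothetical, and a real system would presumably draw containment data from a LOD source such as GeoNames rather than a hand-written dictionary.

```python
from typing import Dict, List

# Hypothetical toy gazetteer (an assumption for illustration only):
# maps a region name, in lower case, to some places located within it.
REGION_MEMBERS: Dict[str, List[str]] = {
    "sw england": ["Exeter", "Plymouth", "Bristol"],
}


def expand_query(query: str) -> List[str]:
    """Return the original query followed by places implied by any region it names."""
    lowered = query.lower()
    terms = [query]
    for region, places in REGION_MEMBERS.items():
        # Simple substring match; a real system would use proper entity recognition.
        if region in lowered:
            terms.extend(places)
    return terms


print(expand_query("flooding in SW England"))
# prints ['flooding in SW England', 'Exeter', 'Plymouth', 'Bristol']
```

A query that names no known region comes back unchanged, so the expansion only ever adds candidate terms to the search.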

In general, users were found to prefer Google-style keyword searches, or searches in which they could pose a question, over other types of search. The amount of subject-specific jargon in their queries depended on the nature of the question asked, as well as the job held by the person asking it. As such, the LOD vocabularies used in this work need to be flexible, enabling generalist queries while also allowing subject-specific queries where possible.

For further details, please see this public deliverable.

Wednesday, 17 October 2012

GATE and EnviLOD at the JISC Research Tools programme meeting

Today I went to Birmingham for the #JISCrestools programme meeting, organised by Christopher Brown and Torsten Reimer.

My presentation covered our latest #EnviLOD research on semantic annotation with Linked Open Data (DBpedia, GeoNames, and GEMET), coupled with a demo of the Mimir-based semantic search over environmental science literature.

It was a really good meeting, especially for seeing related work in more detail, including:

  • text mining and social science tools for analysing social media (COSMOS and the Twitter analysis workbench); 
  • SKOS-HASSET on turning the HASSET thesaurus into SKOS and publishing as Linked Open Data, as well as using it for automated indexing; 
  • the INSPIRES project on finding links between researchers;
  • the Histore project which created training modules on text mining for historians, including GATE;
  • the eHealth GATEWay to the Clouds project, which will soon publish some GATE plugins for anonymisation of electronic patient records;
  • the COMTAX project on community-driven curation of taxonomic databases.

There were many other very interesting projects too. I have no time to write about them all now, but they are listed here.