- Create an instance lexicon for content complexity
- Collect a set of texts from different narrative contexts that the audience may be expected to read, e.g. celebrity news, political news, sports news, medical information leaflets, coursebook fragments.
- Identify the relevant entities in those texts, i.e. persons, locations, organizations, percentages, dates, and technical terms.
- Assess the complexity of each text by using crowdsourcing, e.g. have a sample of UK young adults rate the difficulty of the texts, or apply procedures such as cloze tests.
- Assign a complexity value to each entity in the lexicon based on the complexity values of the texts it appeared in and its relevance to those texts.
- Identify the relevant entities in the new text.
- Use the entity complexity lexicon to compute an estimated complexity value for that text.
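The steps above can be sketched as follows. This is a minimal illustration, not the group's actual implementation: the input format (crowd-sourced complexity ratings paired with entity-to-relevance maps), the relevance-weighted averaging, and the `default` fallback score for out-of-lexicon entities are all assumptions.

```python
from collections import defaultdict


def build_lexicon(rated_texts):
    """Build an entity complexity lexicon from crowd-rated texts.

    `rated_texts` is a list of (complexity, entities) pairs, where
    `complexity` is the crowd-sourced rating of one text and `entities`
    maps each entity found in it to a relevance weight for that text
    (the pair format and weighting scheme are assumptions).
    """
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for complexity, entities in rated_texts:
        for entity, relevance in entities.items():
            # Each text contributes its complexity, weighted by how
            # relevant the entity is to that text.
            weighted_sums[entity] += complexity * relevance
            weight_totals[entity] += relevance
    return {e: weighted_sums[e] / weight_totals[e] for e in weighted_sums}


def estimate_complexity(lexicon, entities, default=5.0):
    """Estimate a new text's complexity by averaging the lexicon scores
    of the entities it contains; unseen entities get a default score."""
    if not entities:
        return default
    return sum(lexicon.get(e, default) for e in entities) / len(entities)
```

For example, an entity appearing mostly in easy texts (a celebrity name) ends up with a low score, while one from medical leaflets (a technical term) ends up high, and a new text is scored by the entities it mentions:

```python
lexicon = build_lexicon([
    (2.0, {"David Beckham": 1.0, "London": 0.5}),
    (8.0, {"myocardial infarction": 1.0, "London": 0.2}),
])
estimate_complexity(lexicon, ["David Beckham", "London"])
```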
|Running the TermRaider plugin to identify the entities in the texts.|
|Employing ANNIC to search for entities linked to organizations, locations, persons, dates, or percentages within the texts.|
|Extract of the results exported in XML format from the TermRaider plugin.|
|Comparison of the scores assigned by the lexicon (1-10) with the baseline complexity scores provided to us (0-1).|
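Comparing the two scales requires mapping them onto a common range. A minimal sketch, assuming a linear min-max rescaling from the lexicon's 1-10 range to the baseline's 0-1 range (the actual comparison method used in the presentation may differ):

```python
def rescale(score, lo=1.0, hi=10.0):
    """Linearly rescale a lexicon score from [lo, hi] to [0, 1],
    so it can be compared against the 0-1 baseline scores."""
    return (score - lo) / (hi - lo)
```

Under this mapping a lexicon score of 1 corresponds to a baseline of 0.0, a score of 10 to 1.0, and 5.5 to 0.5; the absolute difference between the rescaled lexicon score and the baseline then gives a per-text error.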
Slides from the group presentation