We are trying to implement the research paper "Semantic search in the World News domain using automatically extracted metadata files" (available at https://www.sciencedirect.com/science/article/pii/S0950705111002735).

In this paper, the authors use an ontology called the World News Ontology (WNO), whose main purpose is to annotate online news articles. To do that, they loaded the ontology into GATE Developer (General Architecture for Text Engineering) and used the ANNIE pipeline to produce the final annotations. What we cannot work out is how they populated the ontology with instances.

World News Ontology: IPTC (International Press Telecommunications Council) provides a set of terms called NewsCodes. The authors used a subset of NewsCodes to build this ontology. This subset contains the following subjects:

  • crime, law & justice
  • disaster & accident
  • economy, business & finance
  • environmental issue
  • health
  • labour
  • politics
  • science & technology
  • social issue
  • unrest, conflicts & war
  • weather

The NewsCodes taxonomy under each of these subjects has three levels: Subject, Subjectmatter, and Subjectdetail.

  • Subject: a term at the "Subject" level describes the editorial content of a news item at a high level.
  • Subjectmatter: a term at the "Subjectmatter" level describes it at a more precise level.
  • Subjectdetail: a term at the "Subjectdetail" level describes it at a rather specific level.

Example of the taxonomy levels of IPTC NewsCodes:

For science & technology, the first level (Subject) is science & technology, the second level (Subjectmatter) is biomedical science, and the third level (Subjectdetail) is a more specific term under biomedical science.

Since the authors give no explicit description of the ontology's structure, we have assumed that every Subject is a class of the World News Ontology, that the Subjectmatter terms are subclasses of their Subject, and that the Subjectdetail terms are subclasses of their Subjectmatter. We are not sure this assumption is correct.
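To make the assumed structure concrete, here is a minimal Python sketch that turns taxonomy paths into an OWL class hierarchy serialized as Turtle. The namespace URI, the function names, and the third-level term are placeholders we made up for illustration; none of them come from the paper.

```python
import re

# Hypothetical namespace: the paper does not publish the ontology's URI.
WNO = "http://example.org/wno#"

# (Subject, Subjectmatter, Subjectdetail) paths.
# The third-level name is a made-up placeholder, not a real NewsCodes term.
PATHS = [
    ("science & technology", "biomedical science", "example subject detail"),
]

def class_name(term):
    """Turn a taxonomy term into a CamelCase local name."""
    return "".join(w.capitalize() for w in re.findall(r"[A-Za-z0-9]+", term))

def to_turtle(paths):
    """Emit one owl:Class per term, each a subclass of the level above it."""
    lines = [
        "@prefix wno: <%s> ." % WNO,
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    seen = set()
    for path in paths:
        parent = None
        for term in path:
            name = class_name(term)
            if name not in seen:
                seen.add(name)
                parts = ["wno:%s a owl:Class" % name, 'rdfs:label "%s"' % term]
                if parent is not None:
                    parts.append("rdfs:subClassOf wno:%s" % parent)
                lines.append(" ;\n    ".join(parts) + " .")
            parent = name
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_turtle(PATHS))
```

If the assumption is wrong (for example, if the lower levels are meant to be instances rather than subclasses), only the `rdfs:subClassOf` line would need to change.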

Now to annotate texts from news articles, we need to have instances in this ontology. The only thing the authors mention about populating the World News Ontology is as follows:

Onto Gazetteer makes use of one important concept in the ANNIE subsystem, which is gazetteer lists. These are very important for the annotation process because they contain the words that are going to be matched in the document so as to produce annotations. In our case, the WNO is integrated with ANNIE so as to find terms and concepts from the ontology. To achieve this, we created a gazetteer list for every term of the WNO. This list contains synonyms of the term or even phrases that may define it. This process resulted in the creation of 465 gazetteer lists with a total of approximately 24000 records. The Onto Gazetteer component is responsible for finding matches of words or phrases contained in these lists in the articles which are being processed. Finally, it generates an annotation for every match.
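As far as we understand GATE's conventions, this means one plain-text `.lst` file per term (one phrase per line), a `lists.def` index of those files, and a `mapping.def` that ties each list to an ontology class. The sketch below writes that layout in Python. The ontology URL, the function names, and the exact line formats of `lists.def` and `mapping.def` are our assumptions and should be checked against the GATE user guide; where the synonyms come from is exactly what we do not know.

```python
import re
from pathlib import Path

# Hypothetical ontology location: the paper does not give one.
WNO_URL = "http://example.org/wno.owl"

def words(term):
    """Split a term into alphanumeric words, dropping '&', commas, etc."""
    return re.findall(r"[A-Za-z0-9]+", term)

def write_gazetteers(synonyms, out_dir):
    """Write one .lst file per term, plus lists.def and mapping.def.

    `synonyms` maps a WNO term to the phrases that should match it.
    Returns the number of lists written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lists_def, mapping_def = [], []
    for term in sorted(synonyms):
        stem = "_".join(w.lower() for w in words(term))
        cls = "".join(w.capitalize() for w in words(term))
        # The term itself comes first; duplicates are dropped, order is kept.
        entries = list(dict.fromkeys([term] + list(synonyms[term])))
        (out / (stem + ".lst")).write_text("\n".join(entries) + "\n", encoding="utf-8")
        # Assumed formats: "file:majorType" and "file:ontologyURL:Class".
        lists_def.append("%s.lst:%s" % (stem, stem))
        mapping_def.append("%s.lst:%s:%s" % (stem, WNO_URL, cls))
    (out / "lists.def").write_text("\n".join(lists_def) + "\n", encoding="utf-8")
    (out / "mapping.def").write_text("\n".join(mapping_def) + "\n", encoding="utf-8")
    return len(lists_def)
```

Calling `write_gazetteers({"health": ["healthcare"]}, "gazetteer")` would produce `health.lst`, `lists.def`, and `mapping.def` in a `gazetteer` directory. The `synonyms` dictionary is where an automatic source such as a thesaurus lookup would plug in.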

This seems to indicate that they populated the ontology using gazetteer lists, but surely they did not create all 465 lists by hand. Is there a way to create a gazetteer list automatically? How can an ontology be populated from a gazetteer list? And is our assumption about the structure of the World News Ontology correct?
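One guess we have considered for the population step is to turn every gazetteer entry into an individual of the class its list belongs to. The Python sketch below does that for the text of a single `.lst` file and returns Turtle. The namespace, the `inst_` prefix, and the function names are hypothetical, and we do not know whether this is what the authors did.

```python
import re

# Hypothetical namespace: the paper does not publish the ontology's URI.
WNO = "http://example.org/wno#"

def local_name(phrase):
    """Build a safe local name for an individual from a gazetteer phrase."""
    return "inst_" + "_".join(re.findall(r"[a-z0-9]+", phrase.lower()))

def entries_to_individuals(class_name, lst_text):
    """Declare each non-empty line of a gazetteer list as an individual."""
    lines = [
        "@prefix wno: <%s> ." % WNO,
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for raw in lst_text.splitlines():
        phrase = raw.strip()
        if not phrase:
            continue  # skip blank lines in the list file
        # Escape backslashes and double quotes for the Turtle string literal.
        label = phrase.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('wno:%s a wno:%s ;\n    rdfs:label "%s" .'
                     % (local_name(phrase), class_name, label))
    return "\n".join(lines)

if __name__ == "__main__":
    print(entries_to_individuals("Health", "health\nhealthcare\n"))
```

If the Onto Gazetteer only needs the list-to-class mapping, this step may be unnecessary, which is part of what we are asking.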

Masroor