Call for Challenge: Open Knowledge Extraction Challenge

Challenge Website: https://github.com/anuzzolese/oke-challenge  

General Chair

- Fabien Gandon (Inria, Sophia Antipolis, France)

Challenge Coordinators

- Elena Cabrio (Inria, Sophia Antipolis, France)

- Milan Stankovic (SEPAGE, Paris, France)

Challenge Chairs

- Aldo Gangemi, LIPN, University Paris 13 (France)

- Roberto Navigli, University of Rome La Sapienza (Italy)

- Valentina Presutti, CNR STLAB Laboratory (Italy)

- Dario Garigliotti, University of Rome La Sapienza (Italy)

- Anna Lisa Gentile, University of Sheffield (UK)

- Andrea Nuzzolese, CNR STLAB Laboratory (Italy)

Important Dates

  • March 27, 2015, 23:59 CET: Paper submission due
  • April 9, 2015: Notification of acceptance
  • May 15, 2015: System submission deadline
  • May 31 - June 4, 2015: The Challenge takes place at ESWC-15

Motivation and Objectives

The vision of the Semantic Web (SW) is to populate the Web with machine-understandable data so that intelligent agents can automatically interpret its content, just as humans do by inspecting Web content, and assist users in performing a significant number of tasks, relieving them of cognitive overload. The Linked Data movement kicked off this vision by bootstrapping the publication of machine-understandable information, mainly taken from structured data (typically databases) or semi-structured data (e.g. Wikipedia infoboxes). However, most Web content consists of natural language text, e.g. Web sites, news, blogs, micro-posts, etc.; hence a main challenge is to extract as much relevant knowledge as possible from this content and publish it in the form of Semantic Web triples.

There is a large body of work on knowledge extraction (KE) and knowledge discovery that contributes to addressing this problem; however, most evaluations focus on linking extracted facts and entities to concepts that already exist in available Knowledge Bases (KBs).

The Open Knowledge Extraction Challenge focuses on the production of new knowledge aimed at either populating and enriching existing knowledge bases or creating new ones. This means that the defined tasks focus on extracting concepts, individuals, properties, and statements that do not necessarily exist already in a target knowledge base, and on representing them according to Semantic Web standards so that they can be directly injected into linked datasets and their ontologies.

This is in line with existing efforts in the community (e.g. http://aksw.org/Projects/GERBIL.html) to standardise the results of existing KE methods so that they can be reused directly to populate the SW.

In this direction, the proposed tasks will be structured following a common formalisation, the required output will be in a standard SW format (specifically, the NLP Interchange Format (NIF) will be required for all tasks), and the evaluation procedure will be publicly available in a standard evaluation framework.

The OKE challenge has the ambition to advance a reference framework for research on Knowledge Extraction from text for the Semantic Web by redefining a number of tasks (typically from information and knowledge extraction), taking into account specific SW requirements.
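To give a feel for what NIF output looks like, the following minimal sketch serialises one sentence and one mention as NIF-style Turtle. It assumes the NIF 2.0 core vocabulary (nif:Context, nif:isString, nif:anchorOf, nif:beginIndex, nif:endIndex, nif:referenceContext) and offset-based URIs of the form `#char=begin,end`; the document URI and the helper name `nif_mention` are hypothetical, and the exact NIF profile expected for each task is the one published by the organisers, not this sketch.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape(literal):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return literal.replace("\\", "\\\\").replace('"', '\\"')


def nif_mention(doc_uri, text, mention):
    """Return Turtle describing the text and the first occurrence of a mention.

    doc_uri is a hypothetical base URI chosen by the participant.
    """
    begin = text.find(mention)  # first occurrence only
    if begin < 0:
        raise ValueError("mention not found in text")
    end = begin + len(mention)
    context = f"{doc_uri}#char=0,{len(text)}"
    span = f"{doc_uri}#char={begin},{end}"
    return "\n".join([
        f"@prefix nif: <{NIF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"<{context}> a nif:Context ;",
        f'    nif:isString "{escape(text)}" .',
        "",
        f"<{span}> a nif:String ;",
        f"    nif:referenceContext <{context}> ;",
        f'    nif:anchorOf "{escape(mention)}" ;',
        f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger .',
    ])


if __name__ == "__main__":
    sentence = "Skara Cathedral is a church in the Swedish city of Skara."
    print(nif_mention("http://example.org/doc1", sentence, "Skara Cathedral"))
```

Character offsets are computed on the Python string, so they count Unicode code points; whether that matches the offsets used in the official datasets is an assumption to check against the training data.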

Target Audience

The Challenge is open to everyone from industry and academia.

We expect to attract attention from the Knowledge Extraction community and foster its broader integration with the Semantic Web community.

Tasks

The OKE Challenge is defined in terms of three different tasks. Each system can participate in each task individually.

Task 1: Named Entity Resolution, Linking and Typing for Knowledge Base population

This task consists of (i) identifying Named Entities in a sentence and creating an OWL individual (owl:Individual statement) representing each of them, (ii) assigning a type to each such individual (rdf:type statement), selected from a set of given types (a subset of a popular KB, e.g. DBpedia, provided by the organisers), and (iii) linking (owl:sameAs statement) each such individual, when possible, to a reference KB (specified by the organisers, e.g. DBpedia).
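The three kinds of statement in Task 1 can be sketched as follows. This is only an illustration under stated assumptions: the namespace `http://example.org/oke/task1/`, the helper names, and the example mention and DBpedia URIs are hypothetical, and entity recognition itself is taken as already done by some upstream component.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/oke/task1/"  # hypothetical namespace for new individuals


def entity_triples(mention, type_uri, kb_uri=None):
    """Build the Task 1 statements for one recognised mention.

    type_uri is assumed to come from the set of types given by the organisers;
    kb_uri is the reference-KB resource, or None when no link is possible.
    """
    subject = BASE + mention.replace(" ", "_")
    triples = [
        (subject, RDF + "type", OWL + "Individual"),  # (i) individual, as worded in the call
        (subject, RDF + "type", type_uri),            # (ii) typing
    ]
    if kb_uri is not None:
        triples.append((subject, OWL + "sameAs", kb_uri))  # (iii) linking
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(entity_triples(
        "Sydney",
        "http://dbpedia.org/ontology/Place",    # illustrative type
        "http://dbpedia.org/resource/Sydney",   # illustrative link target
    )))
```

An entity with no counterpart in the reference KB simply omits the owl:sameAs statement, which reflects the "when possible" condition in step (iii).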

Task 2: Class Induction and entity typing for Vocabulary and Knowledge Base enrichment

This task consists of producing rdf:type statements, given definition texts. The participants will be given a dataset of sentences, each defining an entity (known a priori), e.g. the entity “dbpedia:Skara_Cathedral” and its definition “Skara Cathedral is a church in the Swedish city of Skara.”

Participants are expected to (i) identify the type(s) of the given entity as they are expressed in the given definition, (ii) create an owl:Class statement defining each of them as a new class in the target knowledge base, (iii) create an rdf:type statement between the given entity and each newly created class, and (iv) align the identified types, if a correct alignment is available, to a set of given types (a subset of a popular KB, e.g. DBpedia, provided by the organisers).

Task 3: Relation extraction and naming, and triple generation for Ontology and Knowledge Base enrichment
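Steps (i) to (iv) can be illustrated with a deliberately naive sketch. The "is a/an X" pattern is a toy heuristic rather than a recommended method, the namespace `http://example.org/oke/task2/` is hypothetical, and using rdfs:subClassOf for the alignment is an assumption, since the alignment property is not fixed in this call.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
BASE = "http://example.org/oke/task2/"  # hypothetical namespace for new classes


def induce_type(definition):
    """(i) Toy heuristic: take the first word after 'is a' / 'is an'."""
    match = re.search(r"\bis an? (\w+)", definition)
    return match.group(1) if match else None


def typing_triples(entity_uri, definition, given_types):
    """Build statements (ii) to (iv) for one entity and its definition.

    given_types maps lower-case type words to URIs of the given types;
    it stands in for the type set provided by the organisers.
    """
    word = induce_type(definition)
    if word is None:
        return []
    new_class = BASE + word.capitalize()
    triples = [
        (new_class, RDF_TYPE, OWL_CLASS),   # (ii) new class
        (entity_uri, RDF_TYPE, new_class),  # (iii) entity typing
    ]
    target = given_types.get(word.lower())
    if target is not None:
        # (iv) alignment; rdfs:subClassOf is an assumption of this sketch
        triples.append((new_class, RDFS_SUBCLASS_OF, target))
    return triples


if __name__ == "__main__":
    for triple in typing_triples(
        "http://dbpedia.org/resource/Skara_Cathedral",
        "Skara Cathedral is a church in the Swedish city of Skara.",
        {"church": "http://example.org/given-types/Church"},  # hypothetical given type
    ):
        print(triple)
```

A real system would need to handle multi-word types, several types per definition, and definitions that do not follow the copula pattern at all.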

The participants will be given as input a sentence and two entities contained in the sentence. The task consists of (i) assessing whether the sentence contains evidence of a relation between the two input entities and, if so, (ii) creating an OWL property representing the relation, including a value for its rdfs:label annotation statement, and (iii) producing a statement for the relation.

The triple must be of the form <entity1> <relation> <entity2>, where:

a. <entity1> and <entity2> are the input URIs, i.e. the given pair of entities, used as subject and object of the statement;

b. <relation> is the learnt OWL property, used as predicate.

The URI for the predicate must be created by the participants. We will not require linking to a reference KB, but we will provide a formalism for producing the URI of the relation, and we will use a string similarity measure to assess the results against a Gold Standard.
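A minimal sketch of the Task 3 output, assuming the relation phrase has already been extracted from the sentence. The camel-case URI scheme, the namespace `http://example.org/oke/task3/`, and the use of `difflib.SequenceMatcher` as the similarity measure are all stand-ins: the official URI formalism and similarity measure are the ones provided by the organisers.

```python
from difflib import SequenceMatcher

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/oke/task3/"  # hypothetical namespace for new properties


def relation_uri(label):
    """Turn a relation phrase into a camel-case property URI (assumed scheme)."""
    words = label.lower().split()
    if not words:
        raise ValueError("empty relation label")
    return BASE + words[0] + "".join(w.capitalize() for w in words[1:])


def relation_triples(entity1, label, entity2):
    """Return the property declaration, its label, and the relation statement."""
    prop = relation_uri(label)
    return [
        (prop, RDF_TYPE, OWL_OBJECT_PROPERTY),
        (prop, RDFS_LABEL, label),  # the object here is a literal, not a URI
        (entity1, prop, entity2),   # <entity1> <relation> <entity2>
    ]


def label_similarity(predicted, gold):
    """Similarity in [0, 1] between a predicted and a gold relation label."""
    return SequenceMatcher(None, predicted.lower(), gold.lower()).ratio()


if __name__ == "__main__":
    for triple in relation_triples(
        "http://example.org/entity/A",  # hypothetical input entities
        "was born in",
        "http://example.org/entity/B",
    ):
        print(triple)
    print(label_similarity("was born in", "born in"))
```

Scoring by string similarity means a predicted label close to the gold label can receive partial credit instead of counting as a plain miss.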

Evaluation Dataset

Systems will be evaluated against a testing dataset for each task, which will be released after a first round of evaluation during the Conference. Participants are encouraged to train and/or test their own systems using the training dataset available on the Challenge website (https://github.com/anuzzolese/oke-challenge) starting from February 16th. Precision, recall, and F1-measure for all tasks will be computed automatically using a state-of-the-art benchmarking tool, such as GERBIL. When necessary (e.g. Task 3), an adapted evaluation will be added to the benchmarking tool to include string similarity in the evaluation.

Subjective Evaluation

A subjective evaluation will be performed by the members of the Advisory Board. For each system, reviewers will assess the methodology, the technical soundness, and the innovativeness of the system.

Objective Evaluation

Systems will be evaluated in terms of standard precision, recall, and F-measure. The evaluation will be performed using a state-of-the-art benchmarking tool, such as GERBIL.
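For participants who want a quick local check before using the benchmarking tool, the standard measures can be sketched as below. The sketch assumes exact matching over sets of annotations (here represented as tuples); it is not the GERBIL implementation, and the matching rules actually applied in the challenge may differ.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over two collections of annotations.

    An annotation counts as correct only if it appears in both collections
    (exact match), which is an assumption of this sketch.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical annotations: (mention, type) pairs.
    system = {("a", "T1"), ("b", "T2"), ("c", "T3")}
    reference = {("a", "T1"), ("b", "T2"), ("d", "T1"), ("e", "T2")}
    print(precision_recall_f1(system, reference))
```

With three predicted annotations of which two are correct, and four gold annotations, precision is 2/3, recall is 1/2, and F1 is 4/7.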

Judging and Prizes

We propose to award systems based on two criteria, judged separately:

    Subjective: the paper describing the system will be assessed by the reviewers.

    Objective: the system with the highest scores in the evaluation benchmark.

All papers passing the subjective evaluation will compete in the objective evaluation and will be published in the challenge proceedings. A number of **finalist** systems, selected on the basis of both the subjective and the objective evaluation, will have to present their work in a dedicated conference session. The exact number of finalists and the presentation style will depend on the Conference policy.

How to Participate

The following information has to be provided:

* Abstract: no more than 200 words.

* Description: It should contain the details of the system, including why the system is innovative, how it uses the Semantic Web, which features or functions the system provides, what design choices were made, and what lessons were learned. The description should also summarise how participants have addressed the evaluation tasks. Papers must be submitted in PDF format, following the style of Springer's Lecture Notes in Computer Science (LNCS) series (http://www.springer.com/computer/lncs/lncs+authors), and must not exceed 12 pages in length.

* Web Access: The application can either be accessible via the web or downloadable. If the application is not publicly accessible, a password must be provided. A short set of instructions on how to use the application should be provided as well.

Papers are submitted in PDF format via the challenge's EasyChair submission page: https://easychair.org/conferences/?conf=oke2015

Mailing List

A mailing list dedicated to the challenge will be available to all participants so that they can share comments and questions, receive the latest news, and benefit from the organisers’ support.

Additional information can be found at https://github.com/anuzzolese/oke-challenge