Every Wikipedia article links to a number of other articles. In DBpedia, we keep track of these links through the dbo:wikiPageWikiLink property. Thanks to them, we know that the :Berlin_Wall entity is semantically connected to 299 base entities.
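As a rough illustration of how these links can be counted, the sketch below builds a SPARQL query over dbo:wikiPageWikiLink. The function name `build_wikilink_count_query` is hypothetical, and the standard DBpedia `dbo:` and `dbr:` namespace prefixes are assumed. The query is only constructed here, not sent to an endpoint, and the number a live endpoint returns may differ from 299 depending on the DBpedia release and on how base entities are filtered.

```python
# Assumed standard DBpedia namespaces.
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"


def build_wikilink_count_query(entity: str) -> str:
    """Return a SPARQL query counting the distinct wiki-link targets of `entity`.

    `entity` is the local name of a DBpedia resource, e.g. "Berlin_Wall".
    (Hypothetical helper, written for illustration only.)
    """
    return (
        f"PREFIX dbo: <{DBO}>\n"
        f"PREFIX dbr: <{DBR}>\n"
        "SELECT (COUNT(DISTINCT ?target) AS ?links)\n"
        "WHERE {\n"
        f"  dbr:{entity} dbo:wikiPageWikiLink ?target .\n"
        "}"
    )


query = build_wikilink_count_query("Berlin_Wall")
print(query)
```

The resulting string can be pasted into any SPARQL client pointed at a DBpedia endpoint.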

This project is funded by Google Summer of Code 2022.


However, only 9 of these 299 base entities are also linked from :Berlin_Wall via another predicate. This suggests that in the large majority of cases, it is not clear what kind of relationship exists between the entities. In other words, DBpedia does not know which specific RDF predicate links the subject (in our case, :Berlin_Wall) to the remaining objects. Currently, such relationships are extracted from tables and the infobox (usually found at the top right of a Wikipedia article) via the Extraction Framework. Instead of extracting RDF triples from semi-structured data only, we want to leverage information found in the entirety of a Wikipedia article, including the page text.
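The distinction between wiki links that are also covered by another predicate and those that are not can be sketched as follows. This is a minimal illustration on made-up toy triples (the entities `:A`, `:B`, `:C`, `:D`, and `:X` are placeholders, not real DBpedia data), and the helper `split_by_resolution` is a hypothetical name, not part of the Extraction Framework.

```python
from collections import defaultdict

WIKI_LINK = "dbo:wikiPageWikiLink"


def split_by_resolution(triples, subject):
    """Split the wiki-link targets of `subject` into two sets.

    resolved:   targets also linked from `subject` via some other predicate
    unresolved: targets linked from `subject` only via dbo:wikiPageWikiLink
    """
    # Collect every predicate that connects `subject` to each object.
    predicates_by_object = defaultdict(set)
    for s, p, o in triples:
        if s == subject:
            predicates_by_object[o].add(p)

    resolved, unresolved = set(), set()
    for obj, preds in predicates_by_object.items():
        if WIKI_LINK not in preds:
            continue  # not a wiki-link target, so out of scope here
        if preds - {WIKI_LINK}:
            resolved.add(obj)
        else:
            unresolved.add(obj)
    return resolved, unresolved


# Toy data for illustration only.
toy_triples = [
    (":A", WIKI_LINK, ":B"),
    (":A", "dbo:effect", ":B"),
    (":A", WIKI_LINK, ":C"),
    (":A", WIKI_LINK, ":D"),
    (":X", WIKI_LINK, ":C"),
]

resolved, unresolved = split_by_resolution(toy_triples, ":A")
print(sorted(resolved), sorted(unresolved))
```

In this framing, the unresolved set is what predicate resolution would try to shrink, by proposing a specific predicate for each remaining subject-object pair.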


The goal of this project is to develop a framework for predicate resolution of wiki links among entities. I choose to focus on a specific kind of relationship:

Expected result: :Peaceful_Revolution –––dbo:effect––> :German_reunification

Expected result: :2010_FIFA_Ballon_d’Or –––dbo:recipient––> :Lionel_Messi


This project will potentially generate millions of new statements. This new information could be released by DBpedia to the public as part of a new dataset. The creation of a neural extraction framework could introduce the use of robust parsers for a more accurate extraction of Wikipedia content.

This project started in 2021 and is currently in its second GSoC iteration.


Tommaso Soru, Diego Moussallem, Ziwei Xu