Neural Extraction Framework

For relation extraction, I’ve modified my strategy. I realised the change was needed after rereading the problem statement several times.

As of now, the concise problem statement in front of me is predicate resolution of wiki links among entities, i.e. deciding which relation (predicate) holds between two entities that are connected by a wiki link.

By the middle of July, I want to implement the following strategy for my MVP:

Things to keep in mind

No direct context

I think that if we only want to extract the relation between 2 entities, it is unnecessary to consider sentences that carry no context about those entities. Hence, we need to scrape the whole text of the Wikipedia article and then find the sentences in the corpus that contain the particular keywords (entity1, entity2). But there may be cases in which a sentence does not refer to the entities directly. The wiki IDs themselves are generally entities. For example:

Indira Gandhi was the first woman Prime Minister of India. She was also known as the Iron Lady.

These sentences are from the Wikipedia article on Indira Gandhi. They contain the following hyperlinks: Prime Minister of India and Iron Lady.

However, the 2nd sentence has no direct reference to the entity Indira Gandhi; instead, a pronoun refers to her. Here the pronoun She can be replaced with the entity Indira Gandhi. Finding such indirect contexts in a text is a coreference resolution task, for which NLP transformers can be used.
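To make the idea concrete, here is a minimal sketch of the two steps: replace the pronoun with the entity, then keep only the sentences that mention both entities. It is not a transformer model. The pronoun rule is a deliberately naive stand-in that assumes every "she"/"he" in the article refers to the article's subject, and the function names `resolve_pronouns` and `sentences_mentioning` are hypothetical.

```python
import re

# Toy assumption: in an article about one person, each of these pronouns
# refers to the article's subject. A trained coreference model would
# replace this rule in practice.
SUBJECT_PRONOUNS = re.compile(r"\b(She|He|she|he)\b")


def resolve_pronouns(text: str, subject: str) -> str:
    """Replace subject pronouns with the article's subject entity."""
    return SUBJECT_PRONOUNS.sub(subject, text)


def sentences_mentioning(text: str, entity1: str, entity2: str) -> list[str]:
    """Keep only the sentences that mention both entities."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if entity1 in s and entity2 in s]


article = (
    "Indira Gandhi was the first woman Prime Minister of India. "
    "She was also known as the Iron Lady."
)
resolved = resolve_pronouns(article, "Indira Gandhi")
matches = sentences_mentioning(resolved, "Indira Gandhi", "Iron Lady")
print(matches)
```

Without the pronoun replacement, the second sentence would be dropped by the keyword filter even though it is the one that links Indira Gandhi to Iron Lady.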

2-step relationships

Another thing I have found is that not all of the links returned by the property dbo:wikiPageWikiLink have a direct relation with the article's entity. For example, this is a sentence from the Berlin Wall article:

After the end of World War II in Europe, what remained of pre-war Germany west of the Oder-Neisse line was divided into four occupation zones (as per the Potsdam Agreement), each one controlled by one of the four occupying Allied powers: United States, the United Kingdom, France and the Soviet Union.

Here the query I mentioned above yielded the hyperlink to United States, and there are some occurrences of United States in the whole article, but none of them expresses a relation with Berlin Wall in any form.
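For reference, a link query of this kind could look like the sketch below. This is my assumption of a plausible shape, not necessarily the exact query used earlier: it simply lists every resource that an article links to through dbo:wikiPageWikiLink. The helper names `wiki_link_query` and `request_url` are hypothetical, and the `format` URL parameter is how the public DBpedia endpoint is commonly asked for JSON results.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def wiki_link_query(resource: str) -> str:
    """Build a SPARQL query listing every page linked from `resource`."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX dbr: <http://dbpedia.org/resource/>\n"
        "SELECT ?linked WHERE {\n"
        f"  dbr:{resource} dbo:wikiPageWikiLink ?linked .\n"
        "}"
    )


def request_url(resource: str) -> str:
    """Build the GET URL that asks the endpoint for JSON results."""
    params = {
        "query": wiki_link_query(resource),
        "format": "application/sparql-results+json",
    }
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


# Only builds and prints the query; no network request is made here.
print(wiki_link_query("Berlin_Wall"))
```

The result of such a query is just a flat list of linked resources, which is why it cannot tell a directly related entity from one that is only mentioned in passing.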

An indirect relation could still exist: the United States was one of the Allied powers in WWII, and the Berlin Wall was built as a consequence of WWII.

So there is no direct relation between United States and Berlin Wall, but there is a 2-step relation between Berlin Wall and USA via a 3rd entity (WWII).
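The 2-step idea can be sketched as a search for an entity that is a neighbour of both endpoints. The triples and predicate names below are hand-written for illustration only (they are not extracted data), and `two_step_links` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written toy triples; real ones would come from the extraction step.
TRIPLES = [
    ("United States", "allied power in", "World War II"),
    ("Berlin Wall", "consequence of", "World War II"),
    ("Berlin Wall", "located in", "Berlin"),
]


def two_step_links(triples, source, target):
    """Return (source_predicate, via, target_predicate) for each bridge entity."""
    neighbours = defaultdict(dict)  # entity -> {neighbour: predicate}
    for subj, pred, obj in triples:
        # Edges are treated as undirected when looking for a bridge.
        neighbours[subj][obj] = pred
        neighbours[obj][subj] = pred
    shared = neighbours[source].keys() & neighbours[target].keys()
    return [
        (neighbours[source][via], via, neighbours[target][via])
        for via in sorted(shared)
    ]


paths = two_step_links(TRIPLES, "United States", "Berlin Wall")
print(paths)
```

Each result names the intermediate entity together with the predicate on either side of it, so the indirect relation can be reported without inventing a direct one.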

Entity and URL mapping

One more thing to keep in mind is that an entity such as Peter Fechter is mapped to the article titled Killing of Peter Fechter, and that is the URL (http://dbpedia.org/resource/Killing_of_Peter_Fechter) extracted when running the above-mentioned query. Most entities are mapped directly to hyperlinks with the same titles as their mentions in the article, i.e. the linked article’s title and the way it is referenced in a particular article are the same.

But here there is some ambiguity: Killing of Peter Fechter is the title of the Wikipedia article, and the entity to which it is mapped is Peter Fechter. Example: Fig2

Here, Berlin Wall is known for killing of Peter Fechter is semantically correct, but the relation Berlin Wall is death place for killing of Peter Fechter is not. Instead it should read: Berlin Wall is death place for Peter Fechter. There was only one instance of the entity Peter Fechter in the whole Berlin Wall article.
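One way to keep this ambiguity visible is to store the in-text mention next to the resource URI and flag the links where the two disagree. This is only a sketch: the `WikiLink` class and its fields are hypothetical names, and the title is recovered from the URI with plain string handling.

```python
from dataclasses import dataclass
from urllib.parse import unquote

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


@dataclass(frozen=True)
class WikiLink:
    surface_form: str  # text as it appears in the article sentence
    uri: str           # DBpedia resource the hyperlink points to

    @property
    def title(self) -> str:
        """Article title recovered from the resource URI."""
        name = unquote(self.uri.removeprefix(RESOURCE_PREFIX))
        return name.replace("_", " ")

    @property
    def title_matches_mention(self) -> bool:
        """True when the article title and the in-text mention agree."""
        return self.title.casefold() == self.surface_form.casefold()


links = [
    WikiLink("Peter Fechter", "http://dbpedia.org/resource/Killing_of_Peter_Fechter"),
    WikiLink("Potsdam Agreement", "http://dbpedia.org/resource/Potsdam_Agreement"),
]
# Links whose mention differs from the linked article's title.
mismatched = [link.surface_form for link in links if not link.title_matches_mention]
print(mismatched)
```

For a flagged link, the mention (Peter Fechter) rather than the article title (Killing of Peter Fechter) could then be used when a relation such as death place is written out.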

Multiple Relationships

Another thing is that multiple relations might exist between 2 entities. For example:

Kate was born in Delhi. She died in Delhi.

Here there are 2 relations between the entities Kate and Delhi, i.e. birth place and death place.

The Berlin Wall article has a similar case with the 2 entities Peter Fechter and Berlin Wall: 3 relations exist between them, as shown in Fig2.
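In terms of data structure, this means an entity pair should map to a set of predicates rather than a single one. A minimal sketch, assuming relations are stored as plain strings; `relations` and `add_relation` are hypothetical names:

```python
from collections import defaultdict

# (subject, object) -> set of predicates found between them.
relations = defaultdict(set)


def add_relation(subject: str, predicate: str, obj: str) -> None:
    """Record a predicate for an entity pair; repeats are ignored by the set."""
    relations[(subject, obj)].add(predicate)


add_relation("Kate", "birth place", "Delhi")
add_relation("Kate", "death place", "Delhi")
add_relation("Kate", "birth place", "Delhi")  # seen again in another sentence

print(sorted(relations[("Kate", "Delhi")]))
```

Using a set keeps every distinct relation for the pair while collapsing the same relation found in several sentences.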