Neural Extraction Framework

This week I didn’t get much done, but I completed the following tasks and made the following observations:

Entity mapping

I mapped the URIs/URLs of articles to their labels (which are similar to the article’s title). There are two candidate properties for this: dbp:name and rdfs:label. Since the dbp:name property was not present on every article’s page in the DBpedia database, I went with rdfs:label instead.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?title
WHERE {
  <http://dbpedia.org/resource/Berlin_Wall> rdfs:label ?title .
  FILTER (lang(?title) = 'en')
}
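The same lookup can be run from Python. The snippet below is only a minimal sketch: it assumes the public DBpedia endpoint at https://dbpedia.org/sparql is reachable and returns the standard SPARQL JSON results format. The helper names (build_label_query, extract_titles, fetch_label) are made up for illustration and are not part of the project code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_label_query(resource_uri, lang="en"):
    """Build a query selecting the rdfs:label of one resource in one language."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?title\n"
        f"WHERE {{ <{resource_uri}> rdfs:label ?title .\n"
        f"        FILTER (lang(?title) = '{lang}') }}"
    )


def extract_titles(sparql_json):
    """Pull the ?title values out of a SPARQL JSON results document."""
    return [b["title"]["value"] for b in sparql_json["results"]["bindings"]]


def fetch_label(resource_uri, lang="en"):
    """Query the endpoint; return the first label, or None if the article has none."""
    params = urllib.parse.urlencode({
        "query": build_label_query(resource_uri, lang),
        "format": "application/sparql-results+json",
    })
    url = f"{DBPEDIA_ENDPOINT}?{params}"
    with urllib.request.urlopen(url, timeout=30) as response:
        titles = extract_titles(json.load(response))
    return titles[0] if titles else None
```

Calling fetch_label("http://dbpedia.org/resource/Berlin_Wall") should give the English label of the article, and None signals an article that has no rdfs:label in that language.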

Fig1: Berlin Wall dbp:name property

Fig2: Berlin Wall rdfs:label property

Fig3: Python Programming language dbp:p property

Fig4: Python Programming language rdfs:label property

However, some articles had very few or no properties.

Fig5: No property about title/label/name of article

Fig6: No property about title/label/name of article

Entity recognition and extraction

In our case, only hyperlinks act as entities, as stated in the goal of the problem statement: predicate resolution of wiki links among entities.

However, after scraping all the text of the article Berlin Wall, I used the spaCy library to find the entities. I had assumed that in most cases a named entity would be a hyperlink in the article, but this was not entirely true. Many of the entities recognised by spaCy in the scraped text were also hyperlinks, which we found by running a SPARQL query using dbo:wikiPageWikiLink, but not all of them.

I extracted entities by two methods: the SPARQL query on dbo:wikiPageWikiLink and spaCy’s named entity recognition.

For example, August was not returned by the SPARQL query, but spaCy recognised it as an entity.

Fig7: SPARQL Query dbo:wikiPageWikiLink hyperlinks present in wikipedia article

Fig8: SpaCy library for entity extraction and recognition
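The comparison between the two methods can be sketched roughly as follows. This is an illustration only and not the exact code behind the figures. The query string is my assumption about the shape of the dbo:wikiPageWikiLink query, the model name en_core_web_sm is an assumed choice, and the helper names are hypothetical. Matching is done by case-insensitive string equality on the resource name, which is a simplification and will miss aliases and redirects.

```python
from urllib.parse import unquote

# Assumed shape of the hyperlink query (the exact query used is in Fig7).
LINKS_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?link
WHERE { <http://dbpedia.org/resource/Berlin_Wall> dbo:wikiPageWikiLink ?link . }
"""

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def linked_entity_names(link_uris):
    """Turn DBpedia resource URIs into plain names (East_Germany -> East Germany)."""
    return {
        unquote(uri[len(RESOURCE_PREFIX):]).replace("_", " ")
        for uri in link_uris
        if uri.startswith(RESOURCE_PREFIX)
    }


def extract_spacy_entities(text, model="en_core_web_sm"):
    """Run spaCy NER on the scraped text and return the entity strings."""
    import spacy  # imported lazily so the comparison helpers work without spaCy

    nlp = spacy.load(model)
    return [ent.text for ent in nlp(text).ents]


def split_entities(entity_texts, link_uris):
    """Split spaCy entities into (also hyperlinks, not hyperlinks), each sorted and deduplicated."""
    linked_names = {name.lower() for name in linked_entity_names(link_uris)}
    linked = sorted({e for e in entity_texts if e.lower() in linked_names})
    unlinked = sorted({e for e in entity_texts if e.lower() not in linked_names})
    return linked, unlinked
```

With the URIs returned by the SPARQL query and the output of extract_spacy_entities on the scraped text, split_entities puts an entity such as August in the second list, since it has no matching hyperlink.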

I think that the entities which are not returned by the SPARQL query hold relatively less significance, since we have to identify the relations between articles (which are the entities in our case).

Today I am looking into coreference resolution.