> NLP
Entity Linking
2024/07/24
624 words
3 mins

 

Entity Linking (EL) is the task of recognizing (cf. Named Entity Recognition) and disambiguating named entities to a knowledge base (e.g. Wikidata, DBpedia, or YAGO). For example, depending on context, the mention "Paris" should be linked to the city of Paris (Wikidata Q90) rather than to, say, Paris Hilton. It is sometimes also referred to as Named Entity Recognition and Disambiguation.

EL can be split into two classes of approaches (both are sketched in the example below):

  • End-to-End: Processing a piece of text to extract entity mentions (i.e., Named Entity Recognition) and then disambiguating them to the correct entry in a knowledge base (e.g. Wikidata, DBpedia, YAGO).
  • Disambiguation-Only: Taking gold-standard named entities as input and disambiguating them to the correct entry in a given knowledge base.
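
To make the distinction concrete, here is a minimal, self-contained sketch of both settings. The mention detector, candidate dictionary, and priors are toy stand-ins invented for illustration (only the Wikidata IDs Q90, Q61, and Q23 refer to real entities); an actual system would use a trained NER model and a full knowledge base.

```python
# Toy sketch contrasting the two EL settings described above.
# The "knowledge base" maps surface forms to candidate entities with invented priors.
CANDIDATES = {
    "Paris": [("Q90", "Paris (city)", 0.95)],
    "Washington": [("Q61", "Washington, D.C.", 0.6), ("Q23", "George Washington", 0.4)],
}

def detect_mentions(text):
    """Stand-in for a real NER model: returns known surface forms found in the text."""
    return [surface for surface in CANDIDATES if surface in text]

def disambiguate(mention):
    """Disambiguation-only setting: gold mention in, highest-prior KB entry out."""
    candidates = CANDIDATES.get(mention, [])
    return max(candidates, key=lambda c: c[2]) if candidates else None

def link_end_to_end(text):
    """End-to-end setting: mention detection followed by disambiguation."""
    return {mention: disambiguate(mention) for mention in detect_mentions(text)}

print(link_end_to_end("Washington met reporters in Paris."))  # end-to-end
print(disambiguate("Washington"))                             # disambiguation-only (gold mention given)
```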

Current State Of The Art (SOTA)

DeepType (Raiman et al., 2018) is the current SOTA in cross-lingual Entity Linking on the WikiDisamb30 and TAC KBP 2010 datasets (note: Mulang’ et al. 2020 is the current SOTA for the CoNLL-AIDA dataset). The authors construct a type system and use it to constrain the outputs of a neural network so that they respect the symbolic structure. Their approach is a two-step algorithm:

  1. Heuristic search or stochastic optimization over discrete variables to define a type system.
  2. Gradient descent to fit classifier parameters (a simplified sketch of both steps follows below).
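
The following is a highly simplified sketch of that two-step structure, not the paper's actual method: a toy objective and a random-restart search stand in for DeepType's type-system optimization, and a small logistic-regression fit stands in for its neural classifier. The type inventory and data are invented for illustration.

```python
import random

import numpy as np

CANDIDATE_TYPES = ["person", "place", "organisation", "work", "taxon", "event"]

def objective(type_subset):
    """Toy stand-in for DeepType's learnability/disambiguation objective:
    scores a candidate type system (deterministically, but arbitrarily)."""
    rng = random.Random(hash(frozenset(type_subset)))
    return rng.random() - 0.05 * len(type_subset)  # invented coverage-vs-size trade-off

def search_type_system(n_iters=200, k=3):
    """Step 1: stochastic search over discrete variables to pick a type system."""
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        subset = frozenset(random.sample(CANDIDATE_TYPES, k))
        score = objective(subset)
        if score > best_score:
            best, best_score = subset, score
    return sorted(best)

def fit_classifier(X, y, lr=0.1, epochs=500):
    """Step 2: gradient descent on a logistic loss to fit classifier parameters."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean logistic loss
    return w

type_system = search_type_system()
X, y = np.random.randn(100, 5), (np.random.rand(100) > 0.5).astype(float)
print("selected type axes:", type_system)
print("classifier weights:", fit_classifier(X, y))
```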

DeepType is applied to three standard datasets (WikiDisamb30, CoNLL-YAGO, and TAC KBP 2010) and outperforms existing solutions, including those that use deep-learning-based entity embeddings.

Evaluation

Metrics

Disambiguation-Only Approach
  • Micro-Precision: Fraction of correctly disambiguated named entities in the full corpus.
  • Macro-Precision: Fraction of correctly disambiguated named entities, averaged by document (both precision metrics are illustrated in the sketch below).
End-to-End Approach
  • Gerbil Micro-F1 (strong matching): Micro InKB F1 score for correctly linked and disambiguated mentions in the full corpus, as computed using the Gerbil platform.
  • Gerbil Macro-F1 (strong matching): Macro InKB F1 score for correctly linked and disambiguated mentions in the full corpus, as computed using the Gerbil platform.
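
The precision metrics above differ only in how correct predictions are aggregated: micro-precision weights every mention equally (so long documents dominate), while macro-precision weights every document equally. A minimal sketch, assuming each document is represented as a list of (predicted entity, gold entity) pairs:

```python
# Minimal sketch of the disambiguation-only metrics listed above.
# Each document is a list of (predicted_entity, gold_entity) pairs; the IDs are illustrative.
def micro_precision(docs):
    """Fraction of correctly disambiguated mentions over the full corpus."""
    correct = sum(sum(pred == gold for pred, gold in doc) for doc in docs)
    total = sum(len(doc) for doc in docs)
    return correct / total

def macro_precision(docs):
    """Per-document precision, averaged over documents."""
    per_doc = [sum(pred == gold for pred, gold in doc) / len(doc) for doc in docs]
    return sum(per_doc) / len(per_doc)

corpus = [
    [("Q90", "Q90"), ("Q61", "Q61"), ("Q23", "Q61")],  # doc 1: 2/3 correct
    [("Q90", "Q90")],                                  # doc 2: 1/1 correct
]
print(micro_precision(corpus))  # 3/4 = 0.75
print(macro_precision(corpus))  # (2/3 + 1) / 2 ≈ 0.83
```

The Gerbil F1 metrics for the end-to-end setting follow the same micro/macro aggregation idea, but combine precision and recall over linked mentions and are computed via the Gerbil platform rather than by hand.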

Datasets

AIDA CoNLL-YAGO Dataset

The AIDA CoNLL-YAGO Dataset by Hoffart et al. (2011) contains entity assignments for the mentions annotated in the CoNLL 2003 NER task. Entities are identified by their YAGO2 entity identifier, Wikipedia URL, or Freebase mid.

Disambiguation-Only Models

| Model | Micro-Precision | Macro-Precision | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Mulang’ et al. (2020) | 94.94 | - | Evaluating the Impact of Knowledge Graph Context | - |
| Raiman et al. (2018) | 94.88 | - | DeepType: Multilingual Entity Linking | Official |
| Sil et al. (2018) | 94.0 | - | Neural Cross-Lingual Entity Linking | - |
| Radhakrishnan et al. (2018) | 93.0 | 93.7 | ELDEN: Improved Entity Linking | - |
| Le et al. (2018) | 93.07 | - | Improving Entity Linking | Official |
| Ganea and Hofmann (2017) | 92.22 | - | Deep Joint Entity Disambiguation | Link |
| Hoffart et al. (2011) | 82.29 | 82.02 | Robust Disambiguation of Named Entities | - |

End-to-End Models

| Model | Micro-F1 (strong) | Macro-F1 (strong) | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| van Hulst et al. (2020) | 83.3 | 81.3 | REL: An Entity Linker | Official |
| Kolitsas et al. (2018) | 82.6 | 82.4 | End-to-End Neural Entity Linking | Official |
| Kannan Ravi et al. (2021) | 83.1 | - | CHOLAN: A Modular Approach | Official |
| Piccinno et al. (2014) | 70.8 | 73.0 | From TagME to WAT | - |
| Hoffart et al. (2011) | 71.9 | 72.8 | Robust Disambiguation of Named Entities | - |

TAC KBP English Entity Linking Comprehensive and Evaluation Data 2010

The Knowledge Base Population (KBP) Track at TAC 2010 explores extracting information about entities with reference to an external knowledge source. You can download the dataset from LDC or here.

Disambiguation-Only Models (TAC KBP 2010)

| Model | Micro-Precision | Macro-Precision | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Raiman et al. (2018) | 90.85 | - | DeepType | Official |
| Sil et al. (2018) | 87.4 | - | Neural Cross-Lingual Entity Linking | - |
| Yamada et al. (2016) | 85.2 | - | Joint Learning of Embedding | - |

Platforms

Evaluating Entity Linking systems can be complex due to the subjective nature of what constitutes a “correct” annotation. For example, annotating “Tom Waits” with a URL that redirects to the canonical entity page can be technically correct but may still be scored inconsistently across evaluations. Additionally, differences in evaluation corpora (e.g., news articles vs. tweets) make comparing EL systems difficult.

GERBIL, developed by AKSW, is a benchmarking framework that standardizes evaluations for EL systems and addresses many of these issues.