> NLP
Entity Linking
2024/07/24
624 words
3 mins

 

Entity Linking (EL) is the task of recognizing (cf. Named Entity Recognition) and disambiguating named entities to a knowledge base (e.g. Wikidata, DBpedia, or YAGO). For example, depending on context, the mention "Paris" should be linked to the city of Paris (Wikidata Q90) rather than to, say, Paris Hilton. It is sometimes also referred to as Named Entity Recognition and Disambiguation.

EL can be split into two classes of approaches (both are sketched in the example below):

  • End-to-End: Processing a piece of text to extract entity mentions (i.e., Named Entity Recognition) and then disambiguating them to the correct entry in a knowledge base (e.g. Wikidata, DBpedia, YAGO).
  • Disambiguation-Only: Taking gold-standard named entities as input and disambiguating them to the correct entry in a given knowledge base.
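
To make the distinction concrete, here is a minimal, self-contained sketch of both settings. The mention detector, candidate dictionary, and priors are toy stand-ins invented for illustration (only the Wikidata IDs Q90, Q61, and Q23 refer to real entities); an actual system would use a trained NER model and a full knowledge base.

```python
# Toy sketch contrasting the two EL settings described above.
# The "knowledge base" maps surface forms to candidate entities with invented priors.
CANDIDATES = {
    "Paris": [("Q90", "Paris (city)", 0.95)],
    "Washington": [("Q61", "Washington, D.C.", 0.6), ("Q23", "George Washington", 0.4)],
}

def detect_mentions(text):
    """Stand-in for a real NER model: returns known surface forms found in the text."""
    return [surface for surface in CANDIDATES if surface in text]

def disambiguate(mention):
    """Disambiguation-only setting: gold mention in, highest-prior KB entry out."""
    candidates = CANDIDATES.get(mention, [])
    return max(candidates, key=lambda c: c[2]) if candidates else None

def link_end_to_end(text):
    """End-to-end setting: mention detection followed by disambiguation."""
    return {mention: disambiguate(mention) for mention in detect_mentions(text)}

print(link_end_to_end("Washington met reporters in Paris."))  # end-to-end
print(disambiguate("Washington"))                             # disambiguation-only (gold mention given)
```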

Current State Of The Art (SOTA)

DeepType (Raiman et al., 2018) is the current SOTA in cross-lingual Entity Linking on the WikiDisamb30 and TAC KBP 2010 datasets (note: Mulang’ et al. 2020 is the current SOTA for the CoNLL-AIDA dataset). The authors construct a type system and use it to constrain the outputs of a neural network so that they respect the symbolic structure. Their approach is a two-step algorithm:

  1. Heuristic search or stochastic optimization over discrete variables to define a type system.
  2. Gradient descent to fit classifier parameters (a simplified sketch of both steps follows below).
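
The following is a highly simplified sketch of that two-step structure, not the paper's actual method: a toy objective and a random-restart search stand in for DeepType's type-system optimization, and a small logistic-regression fit stands in for its neural classifier. The type inventory and data are invented for illustration.

```python
import random

import numpy as np

CANDIDATE_TYPES = ["person", "place", "organisation", "work", "taxon", "event"]

def objective(type_subset):
    """Toy stand-in for DeepType's learnability/disambiguation objective:
    scores a candidate type system (deterministically, but arbitrarily)."""
    rng = random.Random(hash(frozenset(type_subset)))
    return rng.random() - 0.05 * len(type_subset)  # invented coverage-vs-size trade-off

def search_type_system(n_iters=200, k=3):
    """Step 1: stochastic search over discrete variables to pick a type system."""
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        subset = frozenset(random.sample(CANDIDATE_TYPES, k))
        score = objective(subset)
        if score > best_score:
            best, best_score = subset, score
    return sorted(best)

def fit_classifier(X, y, lr=0.1, epochs=500):
    """Step 2: gradient descent on a logistic loss to fit classifier parameters."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean logistic loss
    return w

type_system = search_type_system()
X, y = np.random.randn(100, 5), (np.random.rand(100) > 0.5).astype(float)
print("selected type axes:", type_system)
print("classifier weights:", fit_classifier(X, y))
```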

DeepType is applied to three standard datasets (WikiDisamb30, CoNLL-YAGO, and TAC KBP 2010) and outperforms existing solutions, including those that use deep-learning-based entity embeddings.

Evaluation

Metrics

Disambiguation-Only Approach
  • Micro-Precision: Fraction of correctly disambiguated named entities in the full corpus.
  • Macro-Precision: Fraction of correctly disambiguated named entities, averaged by document (both precision metrics are illustrated in the sketch below).
End-to-End Approach
  • Gerbil Micro-F1 (strong matching): Micro InKB F1 score for correctly linked and disambiguated mentions in the full corpus, as computed using the Gerbil platform.
  • Gerbil Macro-F1 (strong matching): Macro InKB F1 score for correctly linked and disambiguated mentions in the full corpus, as computed using the Gerbil platform.
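
The precision metrics above differ only in how correct predictions are aggregated: micro-precision weights every mention equally (so long documents dominate), while macro-precision weights every document equally. A minimal sketch, assuming each document is represented as a list of (predicted entity, gold entity) pairs:

```python
# Minimal sketch of the disambiguation-only metrics listed above.
# Each document is a list of (predicted_entity, gold_entity) pairs; the IDs are illustrative.
def micro_precision(docs):
    """Fraction of correctly disambiguated mentions over the full corpus."""
    correct = sum(sum(pred == gold for pred, gold in doc) for doc in docs)
    total = sum(len(doc) for doc in docs)
    return correct / total

def macro_precision(docs):
    """Per-document precision, averaged over documents."""
    per_doc = [sum(pred == gold for pred, gold in doc) / len(doc) for doc in docs]
    return sum(per_doc) / len(per_doc)

corpus = [
    [("Q90", "Q90"), ("Q61", "Q61"), ("Q23", "Q61")],  # doc 1: 2/3 correct
    [("Q90", "Q90")],                                  # doc 2: 1/1 correct
]
print(micro_precision(corpus))  # 3/4 = 0.75
print(macro_precision(corpus))  # (2/3 + 1) / 2 ≈ 0.83
```

The Gerbil F1 metrics for the end-to-end setting follow the same micro/macro aggregation idea, but combine precision and recall over linked mentions and are computed via the Gerbil platform rather than by hand.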

Datasets

AIDA CoNLL-YAGO Dataset

The AIDA CoNLL-YAGO Dataset by Hoffart et al. (2011) contains entity assignments for the mentions annotated in the CoNLL 2003 NER task. Entities are identified by their YAGO2 entity identifier, Wikipedia URL, or Freebase mid.

Disambiguation-Only Models

| Model | Micro-Precision | Macro-Precision | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Mulang’ et al. (2020) | 94.94 | - | Evaluating the Impact of Knowledge Graph Context | - |
| Raiman et al. (2018) | 94.88 | - | DeepType: Multilingual Entity Linking | Official |
| Sil et al. (2018) | 94.0 | - | Neural Cross-Lingual Entity Linking | - |
| Radhakrishnan et al. (2018) | 93.0 | 93.7 | ELDEN: Improved Entity Linking | - |
| Le et al. (2018) | 93.07 | - | Improving Entity Linking | Official |
| Ganea and Hofmann (2017) | 92.22 | - | Deep Joint Entity Disambiguation | Link |
| Hoffart et al. (2011) | 82.29 | 82.02 | Robust Disambiguation of Named Entities | - |

End-to-End Models

| Model | Micro-F1 (strong) | Macro-F1 (strong) | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| van Hulst et al. (2020) | 83.3 | 81.3 | REL: An Entity Linker | Official |
| Kolitsas et al. (2018) | 82.6 | 82.4 | End-to-End Neural Entity Linking | Official |
| Kannan Ravi et al. (2021) | 83.1 | - | CHOLAN: A Modular Approach | Official |
| Piccinno et al. (2014) | 70.8 | 73.0 | From TagME to WAT | - |
| Hoffart et al. (2011) | 71.9 | 72.8 | Robust Disambiguation of Named Entities | - |

TAC KBP English Entity Linking Comprehensive and Evaluation Data 2010

The Knowledge Base Population (KBP) Track at TAC 2010 explores extracting information about entities with reference to an external knowledge source. You can download the dataset from LDC or here.

Disambiguation-Only Models (TAC KBP 2010)

| Model | Micro-Precision | Macro-Precision | Paper / Source | Code |
| --- | --- | --- | --- | --- |
| Raiman et al. (2018) | 90.85 | - | DeepType | Official |
| Sil et al. (2018) | 87.4 | - | Neural Cross-Lingual Entity Linking | - |
| Yamada et al. (2016) | 85.2 | - | Joint Learning of Embedding | - |

Platforms

Evaluating Entity Linking systems can be complex due to the subjective nature of what constitutes a “correct” annotation. For example, annotating “Tom Waits” with a URL that redirects to the canonical entity page can be technically correct but may still be scored inconsistently across evaluations. Additionally, differences in evaluation corpora (e.g., news articles vs. tweets) make comparing EL systems difficult.

GERBIL, developed by AKSW, is a benchmarking framework that standardizes evaluations for EL systems and addresses many of these issues.