Entity Linking (EL) is the task of recognizing (cf. Named Entity Recognition) and disambiguating named entities to a knowledge base (e.g. Wikidata, DBpedia, or YAGO). It is sometimes also referred to as Named Entity Recognition and Disambiguation.
EL can be split into two classes of approaches, contrasted in the sketch after this list:

* **End-to-End**: processing a piece of text to extract the mentions (i.e. Named Entity Recognition) and then disambiguating these extracted mentions to the correct entry in a given knowledge base.
* **Disambiguation-Only**: in contrast, this approach takes gold standard named entities as input and only disambiguates them to the correct entry in a given knowledge base.
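To make the two settings concrete, here is a minimal, illustrative sketch (not any of the systems listed below). The alias index, the entity descriptions, and the word-overlap scorer are toy assumptions standing in for a real NER model and entity scorer.

```python
from typing import Dict, List, Tuple

# Toy alias index mapping surface forms to candidate knowledge-base entries.
# In practice the IDs would be Wikidata QIDs, DBpedia URIs, or YAGO identifiers.
ALIAS_INDEX: Dict[str, List[str]] = {
    "Paris": ["Paris_(city)", "Paris_Hilton"],
    "France": ["France"],
}

# Minimal entity descriptions used as disambiguation context.
DESCRIPTIONS: Dict[str, str] = {
    "Paris_(city)": "capital city of France",
    "Paris_Hilton": "American media personality",
    "France": "country in western Europe",
}

def disambiguate(surface: str, context: str) -> str:
    """Disambiguation-only setting: the mention is given (gold standard);
    each candidate is scored by word overlap between its description and the text."""
    context_words = set(context.lower().split())
    def overlap(entity: str) -> int:
        return len(set(DESCRIPTIONS[entity].lower().split()) & context_words)
    return max(ALIAS_INDEX[surface], key=overlap)

def link_end_to_end(text: str) -> List[Tuple[str, str]]:
    """End-to-end setting: first recognize mentions (here via naive dictionary
    lookup), then disambiguate each one against the knowledge base."""
    return [(surface, disambiguate(surface, text))
            for surface in ALIAS_INDEX if surface in text]

print(link_end_to_end("Paris is the capital of France"))
# [('Paris', 'Paris_(city)'), ('France', 'France')]
```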
Raiman et al. (2018) is the current SOTA in cross-lingual entity linking on the WikiDisamb30 and TAC KBP 2010 datasets (note: Mulang’ et al. (2020) is the current SOTA on the AIDA CoNLL-YAGO dataset). They construct a type system and use it to constrain the outputs of a neural network so that they respect the symbolic structure. Their approach involves a 2-step algorithm:

1. heuristic search or stochastic optimization over the discrete variables that define the type system, informed by an Oracle and a Learnability heuristic;
2. gradient descent to fit classifier parameters that predict the behavior of the type system.
DeepType is applied to three standard datasets (WikiDisamb30, CoNLL (YAGO), TAC KBP 2010) and outperforms existing solutions, even those using deep learning-based entity embeddings.
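The core idea of constraining the linking decision with a type system can be illustrated with a small sketch. The type assignments, candidate priors, and the type-probability scores below are invented for illustration; this is not DeepType's actual type system or model.

```python
from typing import Dict, List

# Illustrative coarse types for a few candidate entities (hand-picked example,
# not a learned type system).
ENTITY_TYPES: Dict[str, str] = {
    "Jaguar_(animal)": "animal",
    "Jaguar_Cars": "organization",
    "Jaguar_(band)": "organization",
}

# Hypothetical outputs of a type classifier run on the mention's context,
# e.g. "The jaguar prowled through the rainforest": P(type | context).
TYPE_PROBS: Dict[str, float] = {"animal": 0.92, "organization": 0.06, "place": 0.02}

# Hypothetical context-independent link scores (e.g. popularity priors).
LINK_PRIOR: Dict[str, float] = {
    "Jaguar_(animal)": 0.20,
    "Jaguar_Cars": 0.70,
    "Jaguar_(band)": 0.10,
}

def type_constrained_ranking(candidates: List[str]) -> List[str]:
    """Rank candidates by prior * P(candidate's type | context), so the type
    system down-weights otherwise popular but type-incompatible entities."""
    def score(entity: str) -> float:
        return LINK_PRIOR[entity] * TYPE_PROBS.get(ENTITY_TYPES[entity], 0.0)
    return sorted(candidates, key=score, reverse=True)

print(type_constrained_ranking(["Jaguar_(animal)", "Jaguar_Cars", "Jaguar_(band)"]))
# ['Jaguar_(animal)', 'Jaguar_Cars', 'Jaguar_(band)']
```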
The AIDA CoNLL-YAGO dataset by Hoffart et al. (2011) contains entity assignments for the mentions annotated in the CoNLL 2003 NER task. Entities are identified by their YAGO2 entity identifier, Wikipedia URL, or Freebase mid. The first table below reports disambiguation-only results; the second reports end-to-end results under strong annotation matching.
Model | Micro-Precision | Macro-Precision | Paper / Source | Code |
---|---|---|---|---|
Mulang’ et al. (2020) | 94.94 | - | Evaluating the Impact of Knowledge Graph Context | - |
Raiman et al. (2018) | 94.88 | - | DeepType: Multilingual Entity Linking | Official |
Sil et al. (2018) | 94.0 | - | Neural Cross-Lingual Entity Linking | - |
Radhakrishnan et al. (2018) | 93.0 | 93.7 | ELDEN: Improved Entity Linking | - |
Le et al. (2018) | 93.07 | - | Improving Entity Linking | Official |
Ganea and Hofmann (2017) | 92.22 | - | Deep Joint Entity Disambiguation | Link |
Hoffart et al. (2011) | 82.29 | 82.02 | Robust Disambiguation of Named Entities | - |
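Micro- and macro-averaged precision aggregate the disambiguation decisions differently: micro-precision pools all mentions across documents, while macro-precision averages the per-document precision. A small sketch of the computation, assuming gold and predicted entity assignments grouped by document (the data is invented):

```python
from typing import Dict, List

def micro_macro_precision(gold: Dict[str, List[str]],
                          pred: Dict[str, List[str]]) -> Dict[str, float]:
    """gold/pred map a document ID to the entity assigned to each of its
    (gold) mentions, in the same order. Every mention receives a prediction,
    so precision here is the fraction of correctly linked mentions."""
    correct_total, mention_total, per_doc = 0, 0, []
    for doc_id, gold_entities in gold.items():
        pred_entities = pred[doc_id]
        correct = sum(g == p for g, p in zip(gold_entities, pred_entities))
        correct_total += correct
        mention_total += len(gold_entities)
        per_doc.append(correct / len(gold_entities))
    return {
        "micro": correct_total / mention_total,  # pooled over all mentions
        "macro": sum(per_doc) / len(per_doc),    # averaged over documents
    }

# Invented toy data: two documents with 4 and 1 linked mentions respectively.
gold = {"doc1": ["Q90", "Q142", "Q64", "Q30"], "doc2": ["Q90"]}
pred = {"doc1": ["Q90", "Q142", "Q64", "Q84"], "doc2": ["Q142"]}
print(micro_macro_precision(gold, pred))
# {'micro': 0.6, 'macro': 0.375}
```

The same micro/macro distinction applies to the end-to-end F1 scores in the next table.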
Model | Micro-F1 (strong) | Macro-F1 (strong) | Paper / Source | Code |
---|---|---|---|---|
van Hulst et al. (2020) | 83.3 | 81.3 | REL: An Entity Linker | Official |
Kolitsas et al. (2018) | 82.6 | 82.4 | End-to-End Neural Entity Linking | Official |
Kannan Ravi et al. (2021) | 83.1 | - | CHOLAN: A Modular Approach | Official |
Piccinno et al. (2014) | 70.8 | 73.0 | From TagME to WAT | - |
Hoffart et al. (2011) | 71.9 | 72.8 | Robust Disambiguation of Named Entities | - |
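For end-to-end systems, the "strong" matching convention counts a predicted annotation as correct only if both the mention span and the linked entity exactly match a gold annotation; precision and recall over these matches are then combined into F1. A minimal single-document sketch with invented annotations:

```python
from typing import List, Set, Tuple

# An annotation is (start offset, end offset, entity ID).
Annotation = Tuple[int, int, str]

def strong_match_f1(gold: List[Annotation], pred: List[Annotation]) -> float:
    """Strong matching: a prediction counts only if its span boundaries AND
    its linked entity are identical to a gold annotation."""
    if not pred or not gold:
        return 0.0
    gold_set: Set[Annotation] = set(gold)
    true_pos = sum(1 for a in pred if a in gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

# Invented example: the second prediction links the right entity but with a
# wrong span, so it does not count under strong matching.
gold = [(0, 5, "Paris_(city)"), (24, 30, "France")]
pred = [(0, 5, "Paris_(city)"), (23, 30, "France"), (10, 17, "Capital")]
print(round(strong_match_f1(gold, pred), 3))  # 0.4
```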
The Knowledge Base Population (KBP) Track at TAC 2010 explores extracting information about entities with reference to an external knowledge source. The dataset can be obtained from the LDC.
Model | Micro-Precision | Macro-Precision | Paper / Source | Code |
---|---|---|---|---|
Raiman et al. (2018) | 90.85 | - | DeepType | Official |
Sil et al. (2018) | 87.4 | - | Neural Cross-Lingual Entity Linking | - |
Yamada et al. (2016) | 85.2 | - | Joint Learning of Embedding | - |
Evaluating Entity Linking systems can be complex due to the subjective nature of what constitutes a “correct” annotation. For example, annotating a “Tom Waits” mention with a URL that merely redirects to the canonical page can technically be correct, yet a strict comparison of identifiers would count it as wrong. Additionally, differences in evaluation corpora (e.g., news content vs. tweets) make comparing EL systems difficult.
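One common mitigation is to resolve redirects to canonical identifiers on both the gold and the predicted side before comparing them. A sketch of this normalization, using an invented redirect table (a real evaluation would build it from a Wikipedia or DBpedia dump):

```python
from typing import Dict

# Invented redirect table; in practice this comes from a Wikipedia/DBpedia
# dump mapping redirect pages to canonical article titles.
REDIRECTS: Dict[str, str] = {
    "Tom_Waits_(musician)": "Tom_Waits",
    "Thomas_Alan_Waits": "Tom_Waits",
}

def canonicalize(entity: str) -> str:
    """Follow redirects (transitively) to the canonical identifier."""
    seen = set()
    while entity in REDIRECTS and entity not in seen:
        seen.add(entity)
        entity = REDIRECTS[entity]
    return entity

gold_entity = "Tom_Waits"
predicted_entity = "Thomas_Alan_Waits"
# Without redirect resolution this prediction is counted as wrong;
# after canonicalization it is accepted.
print(canonicalize(predicted_entity) == canonicalize(gold_entity))  # True
```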
GERBIL, developed by AKSW, is a benchmarking framework that standardizes evaluations for EL systems and addresses many of these issues.