Robust Entity Linking and Disambiguation in Noisy, Automatically Extracted Knowledge Graphs
Abstract
This paper explores robust entity linking and disambiguation in automatically extracted, often noisy knowledge graphs, emphasizing strategies that integrate context, linguistic features, and structural graph information. The principal aim is a framework that interprets entity mentions in heterogeneous text corpora and aligns them accurately with canonical entities in large-scale databases. Central to our approach is the reconciliation of divergent surface representations that arise from morphological variation, typographical inconsistency, or incomplete metadata. We propose a multi-step pipeline: candidate generation using approximate lexical matching and local embedding-based retrieval, followed by disambiguation through a probabilistic scoring function that leverages context-specific signals. We further explore the incorporation of adjacency-based constraints and global consistency checks to mitigate error propagation, a common phenomenon in aggregated knowledge graph construction, and demonstrate how graph embeddings obtained through geometric or translational methods provide robust prior knowledge that guides the alignment of ambiguous references to their canonical forms. Extensive evaluation on benchmark data shows performance gains across precision and recall metrics, while ablation studies reveal the importance of combining lexical, semantic, and structural cues. This research presents a cohesive methodological framework, offering insights into the technical nuances and emerging challenges in entity resolution pipelines.
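The candidate-generation and probabilistic-scoring steps outlined in the abstract can be sketched in miniature as follows. This is an illustrative sketch, not the paper's implementation: the toy knowledge base, alias lists, similarity threshold, and context-overlap weighting are all hypothetical assumptions introduced here for clarity.

```python
from difflib import SequenceMatcher
from math import exp

# Hypothetical mini knowledge base: canonical entities with surface aliases
# and a bag-of-words context profile (all entries are illustrative).
KB = {
    "Q1": {"label": "Apple Inc.", "aliases": ["apple", "apple inc"],
           "context": {"iphone", "company", "technology"}},
    "Q2": {"label": "Apple (fruit)", "aliases": ["apple"],
           "context": {"fruit", "tree", "orchard"}},
}

def lexical_sim(mention, alias):
    """Approximate lexical match, tolerant of typos and morphological variants."""
    return SequenceMatcher(None, mention.lower(), alias.lower()).ratio()

def generate_candidates(mention, threshold=0.6):
    """Step 1: candidate generation via approximate lexical matching.
    (A real system would also use embedding-based retrieval here.)"""
    candidates = []
    for eid, ent in KB.items():
        score = max(lexical_sim(mention, a) for a in ent["aliases"])
        if score >= threshold:
            candidates.append((eid, score))
    return candidates

def disambiguate(mention, context_tokens):
    """Step 2: probabilistic rescoring combining the lexical score with a
    context-overlap signal, normalized with a softmax."""
    scored = []
    for eid, lex in generate_candidates(mention):
        overlap = len(KB[eid]["context"] & set(context_tokens))
        scored.append((eid, lex + 0.5 * overlap))  # weight 0.5 is arbitrary
    z = sum(exp(s) for _, s in scored)
    return sorted(((eid, exp(s) / z) for eid, s in scored),
                  key=lambda x: -x[1])

# A misspelled mention still resolves to the sense favored by its context.
ranked = disambiguate("appel", {"iphone", "company"})
print(ranked[0][0])  # the tech-company entity outranks the fruit
```

The adjacency-based constraints and global consistency checks mentioned in the abstract would sit on top of such per-mention scores, re-ranking jointly across all mentions in a document.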