Robust Entity Resolution at Scale via Graph Neural Blocking and Incremental Cluster Consolidation

Aarav Gautam
Prabin Chaulagain

Abstract

Entity resolution links records that refer to the same real-world entity across noisy, heterogeneous, and frequently changing data sources. Contemporary pipelines typically separate candidate generation from downstream clustering, yet the boundary between these steps is increasingly strained by scale, concept drift, and the need to preserve recall under weak supervision. This paper develops a robust, scalable entity resolution framework that couples graph neural blocking with incremental cluster consolidation. The blocking component represents records and attribute evidence as a heterogeneous graph and learns retrieval-oriented embeddings via message passing, enabling adaptive candidate generation that is resilient to missingness, schema variation, and long-tail lexical patterns. The consolidation component maintains clusters incrementally under streaming arrivals and late updates, using calibrated edge evidence and cluster-level sufficient statistics to control error propagation while supporting fast merges. The overall design targets the practical regime where the number of records is large enough that all-pairs (quadratic) comparison is infeasible, labels are sparse or delayed, and operational constraints require bounded latency for updates. Technical contributions include a formalization of blocking as constrained graph retrieval, a consolidation objective connected to correlation clustering with incremental approximations, and a distributed implementation strategy that decouples embedding inference from cluster state while preserving transitivity guarantees. Empirical analysis is presented through stress-test scenarios and scaling considerations that emphasize robustness to adversarial attribute corruption, skewed frequency distributions, and evolving entity profiles, highlighting trade-offs among recall, precision, and computational cost.
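The abstract describes the consolidation step only at the level of its objective. As an illustration, the Python sketch below shows one simple way an incremental, correlation-clustering-style consolidator could behave. It rests on illustrative assumptions that are not taken from the paper: each arriving record brings candidate pairs with calibrated match probabilities (as a blocking stage might produce), each probability is mapped to a signed log-odds edge weight, pairs never proposed as candidates contribute zero weight, and a cluster absorbs a neighbouring cluster whenever their summed cross-cluster weight is positive. Clusters live in a union-find structure, so co-membership is transitive by construction, and each cluster keeps statistics (its member list and its total internal weight) that combine additively on a merge. The names `IncrementalConsolidator`, `add_record`, and `log_odds` are hypothetical.

```python
import math
from collections import defaultdict


def log_odds(p: float, eps: float = 1e-6) -> float:
    """Map a calibrated match probability to a signed weight (> 0 favours a merge)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


class IncrementalConsolidator:
    """Hypothetical greedy incremental correlation clustering (not the paper's code).

    Assumptions: candidate pairs arrive with calibrated match probabilities,
    non-candidate pairs carry zero weight, and a cluster absorbs a neighbouring
    cluster whenever their summed cross-cluster log-odds is positive.
    """

    def __init__(self) -> None:
        self.parent: dict = {}           # union-find parent pointers
        self.members: dict = {}          # root -> record ids in that cluster
        self.internal: dict = {}         # root -> summed within-cluster weight
        self.edges = defaultdict(dict)   # record -> {neighbour: weight}

    def find(self, x):
        # Path-halving find: clusters are disjoint sets, so co-membership is transitive.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def _cross_weights(self, root):
        # Total edge evidence from the cluster at `root` to every other cluster.
        cross = defaultdict(float)
        for m in self.members[root]:
            for n, w in self.edges[m].items():
                rn = self.find(n)
                if rn != root:
                    cross[rn] += w
        return cross

    def _merge(self, a, b, cross_w):
        # Union by size; the per-cluster statistics combine additively.
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a
        self.parent[b] = a
        self.members[a].extend(self.members.pop(b))
        self.internal[a] += self.internal.pop(b) + cross_w
        return a

    def add_record(self, rid, scored_candidates):
        """Insert `rid` with (existing_id, match_prob) pairs, then merge greedily."""
        self.parent[rid] = rid
        self.members[rid] = [rid]
        self.internal[rid] = 0.0
        for other, p in scored_candidates:
            if other not in self.parent or other == rid:
                continue  # ignore candidates that reference unseen records
            w = log_odds(p)
            self.edges[rid][other] = w
            self.edges[other][rid] = w
        root = rid
        while True:
            cross = self._cross_weights(root)
            best = max(cross, key=cross.get, default=None)
            if best is None or cross[best] <= 0.0:
                return root
            root = self._merge(root, best, cross[best])

    def clusters(self):
        return sorted(sorted(ms) for ms in self.members.values())

    def objective(self):
        # Correlation-clustering score: total signed weight kept inside clusters.
        return sum(self.internal.values())


if __name__ == "__main__":
    ec = IncrementalConsolidator()
    ec.add_record("a", [])
    ec.add_record("b", [("a", 0.9)])
    ec.add_record("c", [("a", 0.2)])
    ec.add_record("d", [("a", 0.6), ("c", 0.95)])
    print(ec.clusters(), round(ec.objective(), 3))
```

In the demo, record d has weak positive evidence for a (p = 0.6) and strong evidence for c (p = 0.95). It joins c first, and the merged {c, d} cluster then stays apart from {a, b} because c's negative evidence against a (p = 0.2) outweighs d's weak link. The sketch only ever merges, so it cannot undo an earlier merge when later evidence contradicts it, and it recomputes cross-cluster sums by scanning cluster members, which would become costly for large clusters.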

How to Cite

Robust Entity Resolution at Scale via Graph Neural Blocking and Incremental Cluster Consolidation. (2023). Transactions on Embedded Systems, Real-Time Computing, and Applications, 13(5), 1-22. https://sciencequill.com/index.php/TESRTCA/article/view/2023-05-01