Description
Cross-knowledge-graph (KG) learning is hindered because embeddings trained independently occupy incompatible vector spaces, while pre-merging KGs to enforce consistency is computationally infeasible at web scale. We present WHALE-embeddings, a continuously updated resource derived from Web Data Commons (~98B RDF triples across ~22M domains). By partitioning the corpus by website and training embeddings for each subgraph independently with DECAL on HPC infrastructure, we obtain vectors for ~20.9B IRIs. To make these independently trained spaces interoperable, we introduce NAAS (Neural Adaptive Alignment Space), a model-agnostic, post-hoc alignment framework with two modes: NAAS-EA for entity alignment and NAAS-FT for iterative fine-tuning with KvsAll scoring. NAAS aligns local subgraphs and external KGs into a unified space without requiring graph fusion or retraining from scratch. Experiments on entity alignment and link prediction show that NAAS preserves strong downstream performance while enabling cross-KG nearest-neighbor search, disambiguation, and class expression learning. Together, WHALE-embeddings and NAAS provide a scalable path toward web-scale, cross-domain representation learning and make the largest public KGE resource immediately usable across graphs.
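To illustrate the core idea of post-hoc alignment between independently trained embedding spaces, the sketch below uses orthogonal Procrustes on a set of shared anchor entities. This is a minimal stand-in, not the NAAS method itself: the function name, the simulated data, and the choice of a linear orthogonal map are all assumptions for illustration.

```python
# Sketch: align two independently trained embedding spaces post hoc,
# without retraining, using orthogonal Procrustes on anchor entities.
# This is an illustrative baseline, NOT the NAAS algorithm.
import numpy as np

def procrustes_align(src_anchors, tgt_anchors):
    """Learn an orthogonal map W so that src_anchors @ W ~ tgt_anchors."""
    u, _, vt = np.linalg.svd(src_anchors.T @ tgt_anchors)
    return u @ vt

rng = np.random.default_rng(0)
d = 32
# Simulate two spaces: the target space is the source space under an
# unknown random rotation (the "incompatible vector spaces" problem).
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
src = rng.normal(size=(200, d))
tgt = src @ rotation

# Fit the map on 50 anchor pairs, then check held-out entities.
W = procrustes_align(src[:50], tgt[:50])
err = np.linalg.norm(src[50:] @ W - tgt[50:])
print(err < 1e-6)  # held-out entities land on their counterparts
```

Once such a map is fitted, cross-KG nearest-neighbor search reduces to ordinary nearest-neighbor search in the shared target space; the appeal of any post-hoc scheme like this is that neither embedding model has to be retrained from scratch.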