LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL

Wang, Yihan; Liu, Peiyu

arXiv:2503.18596 (cs)

[Submitted on 24 Mar 2025 (v1), last revised 14 Jun 2025 (this version, v3)]

Abstract: Schema linking is a critical bottleneck in applying existing Text-to-SQL models to real-world, large-scale, multi-database environments. Through error analysis, we identify two major challenges in schema linking: (1) Database Retrieval: accurately selecting the target database from a large schema pool while effectively filtering out irrelevant ones; and (2) Schema Item Grounding: precisely identifying the relevant tables and columns within complex and often redundant schemas for SQL generation. To address these challenges, we introduce LinkAlign, a novel framework tailored for large-scale databases with thousands of fields. LinkAlign comprises three key steps: multi-round semantic enhanced retrieval and irrelevant information isolation for Challenge 1, and schema extraction enhancement for Challenge 2. Each stage supports both Agent and Pipeline execution modes, so the modular design allows efficiency to be balanced against performance. To enable more realistic evaluation, we construct AmbiDB, a synthetic dataset designed to reflect the ambiguity of real-world schema linking. Experiments on widely used Text-to-SQL benchmarks demonstrate that LinkAlign consistently outperforms existing baselines on all schema linking metrics. Notably, it improves the overall Text-to-SQL pipeline and achieves a new state-of-the-art score of 33.09% on the Spider 2.0-Lite benchmark using only open-source LLMs, ranking first on the leaderboard at the time of submission. The code is available at this https URL
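To make the two challenges named in the abstract concrete, the following is a minimal toy sketch, not the LinkAlign method itself. It assumes a plain token-overlap score in place of the paper's retrieval and extraction stages, and every name in it (`Database`, `retrieve_database`, `ground_schema_items`, the example schemas) is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Database:
    """Hypothetical container: a database name plus table -> column names."""
    name: str
    tables: dict[str, list[str]]


def tokens(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; underscores act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def schema_tokens(db: Database) -> set[str]:
    # All tokens appearing in the table and column names of one database.
    found: set[str] = set()
    for table, columns in db.tables.items():
        found |= tokens(table)
        for column in columns:
            found |= tokens(column)
    return found


def retrieve_database(question: str, pool: list[Database]) -> Database:
    # Challenge 1 (toy version): pick the database whose schema shares
    # the most tokens with the question. Token overlap is an assumption
    # standing in for a real retriever.
    q = tokens(question)
    return max(pool, key=lambda db: len(q & schema_tokens(db)))


def ground_schema_items(question: str, db: Database) -> dict[str, list[str]]:
    # Challenge 2 (toy version): keep only tables and columns whose names
    # share at least one token with the question.
    q = tokens(question)
    linked: dict[str, list[str]] = {}
    for table, columns in db.tables.items():
        hits = [c for c in columns if tokens(c) & q]
        if hits or tokens(table) & q:
            linked[table] = hits
    return linked


pool = [
    Database("hr", {"employee": ["name", "salary", "dept_id"],
                    "department": ["dept_id", "title"]}),
    Database("sales", {"orders": ["order_id", "amount"],
                       "customer": ["customer_id", "name"]}),
]
question = "List the name and salary of each employee"

target = retrieve_database(question, pool)
linked = ground_schema_items(question, target)
print(target.name, linked)
```

Lexical overlap of this kind breaks down on exactly the ambiguous and redundant schemas the abstract describes, which is the gap the paper's retrieval and extraction stages are meant to address.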

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2503.18596 [cs.CL]
	(or arXiv:2503.18596v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.18596

Submission history

From: Yihan Wang
[v1] Mon, 24 Mar 2025 11:53:06 UTC (978 KB)
[v2] Tue, 25 Mar 2025 11:04:18 UTC (978 KB)
[v3] Sat, 14 Jun 2025 14:19:19 UTC (855 KB)