{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:57:37Z","timestamp":1781326657636,"version":"3.54.1"},"reference-count":89,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2026,9,23]],"date-time":"2026-09-23T00:00:00Z","timestamp":1790121600000},"content-version":"vor","delay-in-days":366,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62025206 and U23A20296"],"award-info":[{"award-number":["62025206 and U23A20296"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Department of Science and Technology of Zhejiang Province, China","award":["2024C01259"],"award-info":[{"award-number":["2024C01259"]}]},{"DOI":"10.13039\/501100007928","name":"Ningbo Municipal Bureau of Science and Technology","doi-asserted-by":"publisher","award":["2022A-237-G"],"award-info":[{"award-number":["2022A-237-G"]}],"id":[{"id":"10.13039\/501100007928","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Foundation","doi-asserted-by":"crossref","award":["NNF22OC0072415"],"award-info":[{"award-number":["NNF22OC0072415"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1952247, 2245372, 2420846, 2133391, 2008993, 2008993, 1952247, and 2032525"],"award-info":[{"award-number":["1952247, 2245372, 2420846, 2133391, 2008993, 2008993, 1952247, and 2032525"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. 
ACM Manag. Data"],"published-print":{"date-parts":[[2025,9,22]]},"abstract":"<jats:p>\n                    Entity Resolution (ER) is a fundamental data quality improvement task that identifies and links records referring to the same real-world entity. Traditional ER approaches often rely on pairwise comparisons, which can be costly regarding both time and monetary resources, especially when large datasets are involved. Recently, Large Language Models (LLMs) have demonstrated promising results in ER tasks. Still, existing methods typically focus on pairwise matching, missing the potential of LLMs to directly perform clustering in a more cost-effective and scalable manner. In this paper, we propose a novel\n                    <jats:italic toggle=\"yes\">in-context clustering<\/jats:italic>\n                    approach for ER, where LLMs are used to cluster records directly, reducing both time complexity and monetary costs. We systematically investigate the design space for in-context clustering, analyzing the impact of factors such as set size, diversity, variation, and ordering of records on clustering performance. Based on these insights, we develop LLM-CER (LLM-powered Clustering-based ER) that obtains high-quality ER results while minimizing LLM API calls. Our approach addresses key challenges, including efficient cluster merging and LLM's hallucination, providing a scalable and effective solution for ER. 
Extensive experiments on nine real-world datasets demonstrate that our method significantly improves result quality, achieving up to 150% higher accuracy, 10% increase in the FP-measure, and reducing API calls by up to 5X, while maintaining a comparable monetary cost to the most cost-effective baseline.\n                  <\/jats:p>","DOI":"10.1145\/3749170","type":"journal-article","created":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T17:17:03Z","timestamp":1758647823000},"page":"1-28","source":"Crossref","is-referenced-by-count":4,"title":["In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-5930-4606","authenticated-orcid":false,"given":"Jiajie","family":"Fu","sequence":"first","affiliation":[{"name":"Zhejiang University, Ningbo, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-0675-4317","authenticated-orcid":false,"given":"Haitong","family":"Tang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Ningbo, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7312-6312","authenticated-orcid":false,"given":"Arijit","family":"Khan","sequence":"additional","affiliation":[{"name":"Bowling Green State University, Bowling Green, USA and Aalborg University, Denmark, Denmark"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1667-5435","authenticated-orcid":false,"given":"Sharad","family":"Mehrotra","sequence":"additional","affiliation":[{"name":"University of California, Irvine, Irvine, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8082-7398","authenticated-orcid":false,"given":"Xiangyu","family":"Ke","sequence":"additional","affiliation":[{"name":"Zhejiang University, Ningbo &amp; Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3816-8450","authenticated-orcid":false,"given":"Yunjun","family":"Gao","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9,23]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Retrieved","year":"2024","unstructured":"2024. API Pricing of Open-AI. Retrieved December 16, 2024 from https:\/\/openai.com\/api\/pricing\/"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TrustCom.2012.83"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3631504.3631518"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-008-0098-x"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 39-48","author":"Bilenko Mikhail","unstructured":"Mikhail Bilenko and Raymond J. Mooney. 2003. Adaptive duplicate detection using learnable string similarity measures. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 39-48."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2001.914854"},{"key":"e_1_2_1_7_1","first-page":"21","article-title":"On the resemblance and containment of documents","author":"Broder A.","year":"1997","unstructured":"A. Broder. 1997. On the resemblance and containment of documents. In Compression and Complexity of (SEQUENCES). 
21-29.","journal-title":"Compression and Complexity of (SEQUENCES)."},{"key":"e_1_2_1_8_1","first-page":"1877","article-title":"Language models are few-shot learners","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems. 1877-1901.","journal-title":"Advances in Neural Information Processing Systems."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915252"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1077501.1077512"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1255175.1255215"},{"key":"e_1_2_1_12_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Chen Zhikai","year":"2024","unstructured":"Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, and Jiliang Tang. 2024. Label-free node classification on graphs with large language models (llms). In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.127"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3418896"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3418896"},{"key":"e_1_2_1_16_1","volume-title":"Maximum 3-dimensional matching. 
A Compendium of NP Optimization Problems","author":"Crescenzi Pierluigi","year":"2000","unstructured":"Pierluigi Crescenzi, Viggo Kann, Magnus Halldorsson, Marek Karpinski, and Gerhard Woeginger. 2000. Maximum 3-dimensional matching. A Compendium of NP Optimization Problems (2000)."},{"key":"e_1_2_1_17_1","volume-title":"Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, and Divesh Srivastava.","author":"Crescenzi Valter","year":"2021","unstructured":"Valter Crescenzi, Andrea De Angelis, Donatella Firmani, Maurizio Mazzei, Paolo Merialdo, Federico Piai, and Divesh Srivastava. 2021. Alaska: A flexible benchmark for data integration tasks. arXiv Preprint arXiv:2101.11259 (2021)."},{"key":"e_1_2_1_18_1","first-page":"5","article-title":"Introduction to the k-means clustering algorithm based on the elbow method","volume":"1","author":"Cui Mengyao","year":"2020","unstructured":"Mengyao Cui. 2020. Introduction to the k-means clustering algorithm based on the elbow method. Accounting, Auditing and Finance, Vol. 1, 1 (2020), 5-8.","journal-title":"Accounting, Auditing and Finance"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035960"},{"key":"e_1_2_1_20_1","volume-title":"SIGMOD 2020 programming contest official website. http:\/\/www.inf.uniroma3.it\/db\/sigmod2020contest Retrieved","author":"Research Database","year":"2024","unstructured":"Database Research Group of the Roma Tre University. 2020. SIGMOD 2020 programming contest official website. http:\/\/www.inf.uniroma3.it\/db\/sigmod2020contest Retrieved September 22, 2024 from"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187900"},{"key":"e_1_2_1_22_1","volume-title":"Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 
4171-4186","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). 4171-4186."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3269461"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/1191547.1191739"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.36934\/TR2024_197"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE60146.2024.00284"},{"key":"e_1_2_1_27_1","volume-title":"In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration. arXiv preprint arXiv:2506.02509","author":"Fu Jiajie","year":"2025","unstructured":"Jiajie Fu, Haitong Tang, Arijit Khan, Sharad Mehrotra, Xiangyu Ke, and Yunjun Gao. 2025a. In-context Clustering-based Entity Resolution with Large Language Models: A Design Space Exploration. arXiv preprint arXiv:2506.02509 (2025)."},{"key":"e_1_2_1_28_1","unstructured":"Jiajie Fu Haitong Tang Arijit Khan Sharad Mehrotra Xiangyu Ke and Yunjun Gao. 2025b. Source code and datasets. https:\/\/github.com\/ZJU-DAILY\/LLMCER"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00656-7"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2020.101565"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367564"},{"key":"e_1_2_1_32_1","first-page":"518","volume-title":"Proceedings of the 25th International Conference on Very Large Data Bases","volume":"99","author":"Gionis Aristides","year":"1999","unstructured":"Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al., 1999. Similarity search in high dimensions via hashing. 
In Proceedings of the 25th International Conference on Very Large Data Bases, Vol. 99. 518-529."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588576"},{"key":"e_1_2_1_34_1","volume-title":"Bull. EATCS","volume":"111","author":"Gruenheid Anja","year":"2013","unstructured":"Anja Gruenheid, Donald Kossmann, and Besmira Nushi. 2013. When is A=B? Bull. EATCS, Vol. 111 (2013)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00615"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687771"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3314121"},{"key":"e_1_2_1_38_1","volume-title":"Entity resolution with small-scale LLMs: A study on prompting strategies and hardware limitations. Master's thesis","author":"Ioannis Arvanitis Kasinikos","unstructured":"Arvanitis Kasinikos Ioannis. 2024. Entity resolution with small-scale LLMs: A study on prompting strategies and hardware limitations. Master's thesis. National and Kapodistrian University of Athens."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 847-860","author":"Jeffery Shawn R.","unstructured":"Shawn R. Jeffery, Michael J. Franklin, and Alon Y. Halevy. 2008. Pay-as-you-go user feedback for dataspace systems. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 847-860."},{"key":"e_1_2_1_40_1","first-page":"22199","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","volume":"35","author":"Kojima Takeshi","year":"2022","unstructured":"Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. Proceedings of the 36th International Conference on Neural Information Processing Systems, Vol. 
35 (2022), 22199-22213."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994535"},{"key":"e_1_2_1_42_1","first-page":"34","volume-title":"Nature","volume":"234","author":"Levandowsky Michael","year":"1971","unstructured":"Michael Levandowsky and David Winter. 1971. Distance between sets. Nature, Vol. 234, 5323 (1971), 34-35."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i15.17562"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i15.17562"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589335.3651245"},{"key":"e_1_2_1_46_1","volume-title":"Xiang Yue, and Wenhu Chen.","author":"Li Tianle","year":"2024","unstructured":"Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, and Wenhu Chen. 2024b. Long-context llms struggle with long in-context learning. CoRR (2024)."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-023-00779-z"},{"key":"e_1_2_1_48_1","first-page":"857","article-title":"Self-supervised learning: Generative or contrastive","volume":"35","author":"Liu Xiao","year":"2021","unstructured":"Xiao Liu, Fanjin Zhang, Zhenyu Hou, Li Mian, Zhaoyu Wang, Jing Zhang, and Jie Tang. 2021. Self-supervised learning: Generative or contrastive. IEEE Transactions on Knowledge and Data Engineering, Vol. 35, 1 (2021), 857-876.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_49_1","volume-title":"RoBERTa: A robustly optimized BERT pretraining approach. CoRR","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, Vol. 
abs\/1907.11692 (2019)."},{"key":"e_1_2_1_50_1","first-page":"34","article-title":"Improving RAG systems via sentence clustering and reordering","volume":"3784","author":"Alessio","year":"2024","unstructured":"Alessio M., Faggioli G., Ferro N., Nardini F. M., and Perego R., 2024. Improving RAG systems via sentence clustering and reordering. In CEUR WORKSHOP PROCEEDINGS, vol. 3784, pp. 34-43. Washington DC, USA, 07\/07\/2024. CEUR-WS, 34-43.","journal-title":"CEUR WORKSHOP PROCEEDINGS"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/347090.347123"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380597"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/s44163-024-00159-8"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574258"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/1841211"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3358018"},{"key":"e_1_2_1_58_1","unstructured":"Konstantinos Nikoletos George Papadakis and Manolis Koubarakis. 2022. pyJedAI: a Lightsaber for Link Discovery. In ISWC (Posters\/Demos\/Industry)."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-01878-7"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377455"},{"key":"e_1_2_1_61_1","volume-title":"ACM Comput. Surv.","volume":"53","author":"Papadakis George","year":"2020","unstructured":"George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, and Themis Palpanas. 2020b. Blocking and filtering techniques for entity resolution: A survey. ACM Comput. Surv., Vol. 
53, 2, Article 31 (2020), 42 pages."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947624"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/3467861.3467878"},{"key":"e_1_2_1_64_1","first-page":"221","article-title":"Using ChatGPT for entity matching","author":"Peeters Ralph","year":"2023","unstructured":"Ralph Peeters and Christian Bizer. 2023. Using ChatGPT for entity matching. In New Trends in Database and Information Systems. 221-230.","journal-title":"New Trends in Database and Information Systems."},{"key":"e_1_2_1_65_1","volume-title":"International Conference on Extending Database Technology (EDBT). 529-541","author":"Peeters Ralph","year":"2025","unstructured":"Ralph Peeters, Aaron Steiner, and Christian Bizer. 2025. Entity matching using large language models. In International Conference on Extending Database Technology (EDBT). 529-541."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132949"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612012"},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3786-3790","author":"Rastaghi Mehdi Akbarian","year":"2022","unstructured":"Mehdi Akbarian Rastaghi, Ehsan Kamalloo, and Davood Rafiei. 2022. Probing the robustness of pre-trained language models for entity matching. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3786-3790."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-66917-5_19"},{"key":"e_1_2_1_71_1","volume-title":"Workshop on Energy Efficient Machine Learning and Cognitive Computing@NeurIPS.","author":"Sanh Victor","year":"2019","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. 
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In Workshop on Energy Efficient Machine Learning and Cognitive Computing@NeurIPS."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.603"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476294"},{"key":"e_1_2_1_74_1","volume-title":"Vassilis N. Ioannidis, Changhe Yuan, and Chandan K. Reddy.","author":"Tipirneni Sindhu","year":"2024","unstructured":"Sindhu Tipirneni, Ravinarayana Adkathimar, Nurendra Choudhary, Gaurush Hiranandani, Rana Ali Amjad, Vassilis N. Ioannidis, Changhe Yuan, and Chandan K. Reddy. 2024. Context-Aware Clustering using Large Language Models. arXiv:2405.00988 [cs.CL] https:\/\/arxiv.org\/abs\/2405.00988"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113286"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732982"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00648"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350263"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465280"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723739"},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the 31st International Conference on Computational Linguistics. 96-109","author":"Wang Tianshu","year":"2025","unstructured":"Tianshu Wang, Hongyu Lin, Xiaoyang Chen, Xianpei Han, Hao Wang, Zhenyu Zeng, and Le Sun. 2025. Match, compare, or select? An investigation of large language models for entity matching. In Proceedings of the 31st International Conference on Computational Linguistics. 
96-109."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-024-02066-y"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536337"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132876"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/17.9.763"},{"key":"e_1_2_1_87_1","volume-title":"Workshops at the International Conference on Very Large Data Bases.","author":"Zhang Haochen","year":"2023","unstructured":"Haochen Zhang, Yuyang Dong, Chuan Xiao, and Masafumi Oyamada. 2023a. Large language models as data preprocessors. In Workshops at the International Conference on Very Large Data Bases."},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.858"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW61823.2024.00044"}],"container-title":["Proceedings of the ACM on Management of 
Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749170","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749170","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:34:31Z","timestamp":1781325271000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3749170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,22]]},"references-count":89,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,9,22]]}},"alternative-id":["10.1145\/3749170"],"URL":"https:\/\/doi.org\/10.1145\/3749170","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,22]]}}}