EmbeddingGemma: Powerful and Lightweight Text Representations

Vera, Henrique Schechter; Dua, Sahil; Zhang, Biao; Salz, Daniel; Mullins, Ryan; Panyam, Sindhu Raghuram; Smoot, Sara; Naim, Iftekhar; Zou, Joe; Chen, Feiyang; Cer, Daniel; Lisak, Alice; Choi, Min; Gonzalez, Lucas; Sanseviero, Omar; Cameron, Glenn; Ballantyne, Ian; Black, Kat; Chen, Kaifeng; Wang, Weiyi; Li, Zhe; Martins, Gus; Lee, Jinhyuk; Sherwood, Mark; Ji, Juyeong; Wu, Renjie; Zheng, Jingxiao; Singh, Jyotinder; Sharma, Abheesht; Sreepat, Divya; Jain, Aashi; Elarabawy, Adham; Co, AJ; Doumanoglou, Andreas; Samari, Babak; Hora, Ben; Potetz, Brian; Kim, Dahun; Alfonseca, Enrique; Moiseev, Fedor; Han, Feng; Gomez, Frank Palma; Ábrego, Gustavo Hernández; Zhang, Hesen; Hui, Hui; Han, Jay; Gill, Karan; Chen, Ke; Chen, Koert; Shanbhogue, Madhuri; Boratko, Michael; Suganthan, Paul; Duddu, Sai Meher Karthik; Mariserla, Sandeep; Ariafar, Setareh; Zhang, Shanfeng; Zhang, Shijie; Baumgartner, Simon; Goenka, Sonam; Qiu, Steve; Dabral, Tanmaya; Walker, Trevor; Rao, Vikram; Khawaja, Waleed; Zhou, Wenlei; Ren, Xiaoqi; Xia, Ye; Chen, Yichang; Chen, Yi-Ting; Dong, Zhe; Ding, Zhongli; Visin, Francesco; Liu, Gaël; Zhang, Jiageng; Kenealy, Kathleen; Casbon, Michelle; Kumar, Ravin; Mesnard, Thomas; Gleicher, Zach; Brick, Cormac; Lacombe, Olivier; Roberts, Adam; Sung, Yunhsuan; Hoffmann, Raphael; Warkentin, Tris; Joulin, Armand; Duerig, Tom; Seyedhosseini, Mojtaba

arXiv:2509.20354v1 (cs)

[Submitted on 24 Sep 2025 (this version), latest version 1 Nov 2025 (v3)]

Abstract: We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative training recipe strategically captures knowledge from larger models via encoder-decoder initialization and geometric embedding distillation. We improve model robustness and expressiveness with a spread-out regularizer, and ensure generalizability by merging checkpoints from varied, optimized mixtures. Evaluated on the Massive Text Embedding Benchmark (MTEB) across multilingual, English, and code domains, EmbeddingGemma (300M) achieves state-of-the-art results. Notably, it outperforms prior top models, both proprietary and open, with fewer than 500M parameters, and provides performance comparable to models double its size, offering an exceptional performance-to-cost ratio. Remarkably, this lead persists when quantizing model weights or truncating embedding outputs. This makes EmbeddingGemma particularly well-suited for low-latency and high-throughput use cases such as on-device applications. We provide ablation studies exploring our key design choices. We release EmbeddingGemma to the community to promote further research.
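The abstract notes that the model's lead persists when embedding outputs are truncated. The snippet below is a minimal sketch of what using a truncated embedding could look like. It assumes truncation means keeping the first k dimensions and re-normalizing to unit length before computing cosine similarity; the abstract does not specify the procedure. The function names and the toy vectors are hypothetical and do not come from the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_embedding(vec, k):
    """Keep the first k dimensions, then re-normalize (assumed procedure)."""
    if not 0 < k <= len(vec):
        raise ValueError("k must be between 1 and len(vec)")
    return l2_normalize(vec[:k])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional vectors standing in for real embeddings (hypothetical values).
query = [0.9, 0.1, 0.3, -0.2]
doc = [0.8, 0.2, -0.1, 0.4]

full_score = cosine_similarity(query, doc)
short_score = cosine_similarity(truncate_embedding(query, 2),
                                truncate_embedding(doc, 2))
print(f"full: {full_score:.3f}  truncated to 2 dims: {short_score:.3f}")
```

Shorter vectors reduce storage and similarity-search cost, which is why robustness to truncation matters for the on-device use cases the abstract mentions. How much quality is retained at a given dimension is an empirical question that the paper's evaluation addresses.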

Comments:	18 pages. Models are available on HuggingFace, Kaggle, and Vertex AI
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2509.20354 [cs.CL]
	(or arXiv:2509.20354v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2509.20354

Submission history

From: Henrique Schechter Vera
[v1] Wed, 24 Sep 2025 17:56:51 UTC (649 KB)
[v2] Sun, 28 Sep 2025 23:00:34 UTC (649 KB)
[v3] Sat, 1 Nov 2025 23:38:27 UTC (649 KB)
