{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T12:37:40Z","timestamp":1780403860571,"version":"3.54.1"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61832016, 62102162, U20B2070"],"award-info":[{"award-number":["61832016, 62102162, U20B2070"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["413891298"],"award-info":[{"award-number":["413891298"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Natural Science Foundation","award":["L221013"],"award-info":[{"award-number":["L221013"]}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2020AAA0106200"],"award-info":[{"award-number":["2020AAA0106200"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100020595","name":"National Science and Technology Council","doi-asserted-by":"publisher","award":["111-2221-E-006-112-MY3"],"award-info":[{"award-number":["111-2221-E-006-112-MY3"]}],"id":[{"id":"10.13039\/100020595","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. 
Graph."],"published-print":{"date-parts":[[2023,12,5]]},"abstract":"<jats:p>\n            Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low to high frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called\n            <jats:italic toggle=\"yes\">ProSpect. ProSpect<\/jats:italic>\n            represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and\n            <jats:italic toggle=\"yes\">ProSpect<\/jats:italic>\n            offer better disentanglement and controllability compared to existing methods. We apply\n            <jats:italic toggle=\"yes\">ProSpect<\/jats:italic>\n            in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. 
Our source code is available at https:\/\/github.com\/zyxElsa\/ProSpect.\n          <\/jats:p>","DOI":"10.1145\/3618342","type":"journal-article","created":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T10:20:48Z","timestamp":1701771648000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":93,"title":["ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6433-2678","authenticated-orcid":false,"given":"Yuxin","family":"Zhang","sequence":"first","affiliation":[{"name":"MAIS, Institute of Automation, CAS, China and School of Artificial Intelligence, UCAS, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6502-145X","authenticated-orcid":false,"given":"Weiming","family":"Dong","sequence":"additional","affiliation":[{"name":"MAIS, Institute of Automation, CAS, China and School of Artificial Intelligence, UCAS, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3975-2483","authenticated-orcid":false,"given":"Fan","family":"Tang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, CAS, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nisha","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, UCAS, China and MAIS, Institute of Automation, CAS, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7787-6428","authenticated-orcid":false,"given":"Haibin","family":"Huang","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8243-9513","authenticated-orcid":false,"given":"Chongyang","family":"Ma","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6699-2944","authenticated-orcid":false,"given":"Tong-Yee","family":"Lee","sequence":"additional","affiliation":[{"name":"National Cheng-Kung University, Taiwan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5803-2185","authenticated-orcid":false,"given":"Oliver","family":"Deussen","sequence":"additional","affiliation":[{"name":"University of Konstanz, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8343-9665","authenticated-orcid":false,"given":"Changsheng","family":"Xu","sequence":"additional","affiliation":[{"name":"MAIS, Institute of Automation, CAS, China and School of Artificial Intelligence, UCAS, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,12,5]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Art Institute of Chicago. 2023. https:\/\/www.artic.edu\/ Last accessed on 2023-09-12."},{"key":"e_1_2_2_2_1","volume-title":"Blended Diffusion for Text-Driven Editing of Natural Images. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 18208--18218","author":"Avrahami Omri","year":"2022","unstructured":"Omri Avrahami, Dani Lischinski, and Ohad Fried. 2022. Blended Diffusion for Text-Driven Editing of Natural Images. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
18208--18218."},{"key":"e_1_2_2_3_1","volume-title":"eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers. arXiv preprint arXiv:2211.01324","author":"Balaji Yogesh","year":"2022","unstructured":"Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, and Ming-Yu Liu. 2022. eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers. arXiv preprint arXiv:2211.01324 (2022)."},{"key":"e_1_2_2_4_1","volume-title":"Paint by word. arXiv preprint arXiv:2103.10951","author":"Bau David","year":"2021","unstructured":"David Bau, Alex Andonian, Audrey Cui, YeonHwan Park, Ali Jahanian, Aude Oliva, and Antonio Torralba. 2021. Paint by word. arXiv preprint arXiv:2103.10951 (2021)."},{"key":"e_1_2_2_5_1","volume-title":"Large Scale GAN Training for High Fidelity Natural Image Synthesis. In International Conference on Learning Representations (ICLR).","author":"Brock Andrew","year":"2019","unstructured":"Andrew Brock, Jeff Donahue, and Karen Simonyan. 2019. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01764"},{"key":"e_1_2_2_7_1","volume-title":"International Conference on Machine Learning (ICML).","author":"Chang Huiwen","year":"2023","unstructured":"Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, and Dilip Krishnan. 2023. Muse: Text-To-Image Generation via Masked Generative Transformers. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_2_8_1","volume-title":"JoJoGAN: One Shot Face Stylization. In European Conference on Computer Vision (ECCV)","author":"Chong Min Jin","year":"2022","unstructured":"Min Jin Chong and David Forsyth. 2022. 
JoJoGAN: One Shot Face Stylization. In European Conference on Computer Vision (ECCV) (Tel Aviv, Israel). Springer-Verlag, Berlin, Heidelberg, 128--152."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19836-6_6"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01104"},{"key":"e_1_2_2_11_1","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems (NeurIPS). 8780--8794."},{"key":"e_1_2_2_12_1","volume-title":"Taming Transformers for High-Resolution Image Synthesis. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12873--12883","author":"Esser Patrick","year":"2021","unstructured":"Patrick Esser, Robin Rombach, and Bjorn Ommer. 2021. Taming Transformers for High-Resolution Image Synthesis. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12873--12883."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_6"},{"key":"e_1_2_2_14_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Gal Rinon","year":"2023","unstructured":"Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. 2023a. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592133"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530164"},{"key":"e_1_2_2_17_1","volume-title":"Advances in Neural Information Processing Systems (NIPS). Curran Associates","author":"Goodfellow Ian","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. 
In Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc."},{"key":"e_1_2_2_18_1","volume-title":"Prompt-to-Prompt Image Editing with Cross Attention Control. In International Conference on Learning Representations (ICLR).","author":"Hertz Amir","year":"2023","unstructured":"Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2023. Prompt-to-Prompt Image Editing with Cross Attention Control. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_2_19_1","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_2_2_20_1","volume-title":"Composer: Creative and Controllable Image Synthesis with Composable Conditions. In International Conference on Machine Learning (ICML).","author":"Huang Lianghua","year":"2023","unstructured":"Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, and Jingren Zhou. 2023a. Composer: Creative and Controllable Image Synthesis with Composable Conditions. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_2_21_1","volume-title":"Region-Aware Diffusion for Zero-shot Text-driven Image Editing. arXiv preprint arXiv:2302.11797","author":"Huang Nisha","year":"2023","unstructured":"Nisha Huang, Fan Tang, Weiming Dong, Tong-Yee Lee, and Changsheng Xu. 2023b. Region-Aware Diffusion for Zero-shot Text-driven Image Editing. arXiv preprint arXiv:2302.11797 (2023)."},{"key":"e_1_2_2_22_1","volume-title":"Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion. In ACM International Conference on Multimedia","author":"Huang Nisha","year":"2022","unstructured":"Nisha Huang, Fan Tang, Weiming Dong, and Changsheng Xu. 2022a. Draw Your Art Dream: Diverse Digital Art Synthesis with Multimodal Guided Diffusion. 
In ACM International Conference on Multimedia (Lisboa, Portugal). 1085--1094."},{"key":"e_1_2_2_23_1","volume-title":"Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer. arXiv preprint arXiv:2305.05464","author":"Huang Nisha","year":"2023","unstructured":"Nisha Huang, Yuxin Zhang, and Weiming Dong. 2023d. Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer. arXiv preprint arXiv:2305.05464 (2023)."},{"key":"e_1_2_2_24_1","volume-title":"DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization. arXiv preprint arXiv:2211.10682","author":"Huang Nisha","year":"2022","unstructured":"Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Yong Zhang, Weiming Dong, and Changsheng Xu. 2022b. DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization. arXiv preprint arXiv:2211.10682 (2022)."},{"key":"e_1_2_2_25_1","volume-title":"Multimodal Unsupervised Image-to-Image Translation. In European Conference on Computer Vision (ECCV). 172--189","author":"Huang Xun","year":"2018","unstructured":"Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. 2018. Multimodal Unsupervised Image-to-Image Translation. In European Conference on Computer Vision (ECCV). 172--189."},{"key":"e_1_2_2_26_1","volume-title":"Kelvin CK Chan, and Ziwei Liu","author":"Huang Ziqi","year":"2023","unstructured":"Ziqi Huang, Tianxing Wu, Yuming Jiang, Kelvin CK Chan, and Ziwei Liu. 2023c. ReVersion: Diffusion-Based Relation Inversion from Images. arXiv preprint arXiv:2303.13495 (2023)."},{"key":"e_1_2_2_27_1","volume-title":"Training-free Style Transfer Emerges from h-space in Diffusion models. arXiv preprint arXiv:2303.15403","author":"Jeong Jaeseok","year":"2023","unstructured":"Jaeseok Jeong, Mingi Kwon, and Youngjung Uh. 2023. Training-free Style Transfer Emerges from h-space in Diffusion models. 
arXiv preprint arXiv:2303.15403 (2023)."},{"key":"e_1_2_2_28_1","unstructured":"Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. 2020. Training Generative Adversarial Networks with Limited Data. In Advances in Neural Information Processing Systems (NeurIPS). 12104--12114."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_2_2_30_1","volume-title":"Imagic: Text-Based Real Image Editing with Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6007--6017","author":"Kawar Bahjat","year":"2023","unstructured":"Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. 2023. Imagic: Text-Based Real Image Editing with Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6007--6017."},{"key":"e_1_2_2_31_1","volume-title":"Multi-Concept Customization of Text-to-Image Diffusion. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Kumari Nupur","year":"2023","unstructured":"Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. 2023a. Multi-Concept Customization of Text-to-Image Diffusion. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_32_1","volume-title":"Multi-Concept Customization of Text-to-Image Diffusion. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 1931--1941","author":"Kumari Nupur","year":"2023","unstructured":"Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. 2023b. Multi-Concept Customization of Text-to-Image Diffusion. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
1931--1941."},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01753"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01284-z"},{"key":"e_1_2_2_35_1","volume-title":"Qibin Hou, Yaxing Wang, and Jian Yang.","author":"Li Senmao","year":"2023","unstructured":"Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, and Jian Yang. 2023. StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing. arXiv preprint arXiv:2303.15649 (2023)."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01765"},{"key":"e_1_2_2_37_1","volume-title":"RePaint: Inpainting Using Denoising Diffusion Probabilistic Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11461--11471","author":"Lugmayr Andreas","year":"2022","unstructured":"Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. 2022. RePaint: Inpainting Using Denoising Diffusion Probabilistic Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 11461--11471."},{"key":"e_1_2_2_38_1","volume-title":"Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073","author":"Meng Chenlin","year":"2021","unstructured":"Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2021. Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021)."},{"key":"e_1_2_2_39_1","unstructured":"Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual associations in GPT. In Advances in Neural Information Processing Systems (NeurIPS). 
17359--17372."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00585"},{"key":"e_1_2_2_41_1","volume-title":"International Conference on Machine Learning (ICML).","author":"Nichol Alex","year":"2022","unstructured":"Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2022. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. In International Conference on Machine Learning (ICML)."},{"key":"e_1_2_2_42_1","volume-title":"International Conference on Machine Learning (ICML). 8162--8171","author":"Nichol Alexander Quinn","year":"2021","unstructured":"Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (ICML). 8162--8171."},{"key":"e_1_2_2_43_1","first-page":"7198","article-title":"Swapping autoencoder for deep image manipulation","volume":"33","author":"Park Taesung","year":"2020","unstructured":"Taesung Park, Jun-Yan Zhu, Oliver Wang, Jingwan Lu, Eli Shechtman, Alexei Efros, and Richard Zhang. 2020. Swapping autoencoder for deep image manipulation. Advances in Neural Information Processing Systems 33 (2020), 7198--7211.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_44_1","volume-title":"StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. In IEEE\/CVF International Conference on Computer Vision (ICCV). 2085--2094","author":"Patashnik Or","year":"2021","unstructured":"Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. 2021. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. In IEEE\/CVF International Conference on Computer Vision (ICCV). 2085--2094."},{"key":"e_1_2_2_45_1","unstructured":"Pexels. 2023. 
https:\/\/www.pexels.com Last accessed on 2023-09-12."},{"key":"e_1_2_2_46_1","volume-title":"International Conference on Machine Learning (ICML). 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML). 8748--8763."},{"key":"e_1_2_2_47_1","volume-title":"Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv preprint arXiv:2204.06125","author":"Ramesh Aditya","year":"2022","unstructured":"Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv preprint arXiv:2204.06125 (2022)."},{"key":"e_1_2_2_48_1","volume-title":"International Conference on Machine Learning (ICML). PMLR, 8821--8831","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In International Conference on Machine Learning (ICML). PMLR, 8821--8831."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_2_2_51_1","volume-title":"Burcu Karagol Ayan, Tim Salimans, et al.","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems (NeurIPS). 
36479--36494."},{"key":"e_1_2_2_52_1","volume-title":"StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation. In International Joint Conference on Artificial Intelligence (IJCAI). 4966--4972","author":"Schaldenbrand Peter","year":"2022","unstructured":"Peter Schaldenbrand, Zhixuan Liu, and Jean Oh. 2022. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation. In International Joint Conference on Artificial Intelligence (IJCAI). 4966--4972."},{"key":"e_1_2_2_53_1","volume-title":"FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6490--6499","author":"Singh Krishna Kumar","year":"2019","unstructured":"Krishna Kumar Singh, Utkarsh Ojha, and Yong Jae Lee. 2019. FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6490--6499."},{"key":"e_1_2_2_54_1","volume-title":"DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 16494--16504","author":"Tao Ming","year":"2022","unstructured":"Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bing-Kun Bao, and Changsheng Xu. 2022. DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 16494--16504."},{"key":"e_1_2_2_55_1","volume-title":"Key-Locked Rank One Editing for Text-to-Image Personalization. In ACM SIGGRAPH 2023 Conference Proceedings","author":"Tewel Yoad","year":"2023","unstructured":"Yoad Tewel, Rinon Gal, Gal Chechik, and Yuval Atzmon. 2023. Key-Locked Rank One Editing for Text-to-Image Personalization. In ACM SIGGRAPH 2023 Conference Proceedings (Los Angeles, CA, USA) (SIGGRAPH '23). 
Association for Computing Machinery, New York, NY, USA, Article 12, 11 pages."},{"key":"e_1_2_2_56_1","unstructured":"The Barnes Foundation. 2023. https:\/\/www.barnesfoundation.org\/ Last accessed on 2023-09-12."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592451"},{"key":"e_1_2_2_58_1","volume-title":"Extended Textual Conditioning in Text-to-Image Generation. arXiv preprint arXiv:2303.09522","author":"Voynov Andrey","year":"2023","unstructured":"Andrey Voynov, Qinghao Chu, Daniel Cohen-Or, and Kfir Aberman. 2023. P+: Extended Textual Conditioning in Text-to-Image Generation. arXiv preprint arXiv:2303.09522 (2023)."},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41095-022-0284-6"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41095-022-0294-4"},{"key":"e_1_2_2_61_1","volume-title":"Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. arXiv preprint arXiv:2302.03668","author":"Wen Yuxin","year":"2023","unstructured":"Yuxin Wen, Neel Jain, John Kirchenbauer, Micah Goldblum, Jonas Geiping, and Tom Goldstein. 2023. Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. arXiv preprint arXiv:2302.03668 (2023)."},{"key":"e_1_2_2_62_1","volume-title":"Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 1900--1910","author":"Wu Qiucheng","year":"2023","unstructured":"Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, and Shiyu Chang. 2023. Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
1900--1910."},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01763"},{"key":"e_1_2_2_65_1","volume-title":"Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer. arXiv preprint arXiv:2303.08622","author":"Yang Serin","year":"2023","unstructured":"Serin Yang, Hyunmin Hwang, and Jong Chul Ye. 2023b. Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer. arXiv preprint arXiv:2303.08622 (2023)."},{"key":"e_1_2_2_66_1","volume-title":"Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu.","author":"Yu Jiahui","year":"2023","unstructured":"Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, Zirui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, and Yonghui Wu. 2023. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. Transactions on Machine Learning Research (2023)."},{"key":"e_1_2_2_67_1","volume-title":"Jason Baldridge, Honglak Lee, and Yinfei Yang.","author":"Zhang Han","year":"2021","unstructured":"Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, and Yinfei Yang. 2021. Cross-Modal Contrastive Learning for Text-to-Image Generation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 833--842."},{"key":"e_1_2_2_68_1","volume-title":"Inversion-Based Style Transfer with Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10146--10156","author":"Zhang Yuxin","year":"2023","unstructured":"Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, and Changsheng Xu. 2023b. Inversion-Based Style Transfer with Diffusion Models. 
In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10146--10156."},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530736"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3605548"},{"key":"e_1_2_2_71_1","volume-title":"SINE: SINgle Image Editing with Text-to-Image Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6027--6037","author":"Zhang Zhixing","year":"2023","unstructured":"Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, and Jian Ren. 2023a. SINE: SINgle Image Editing with Text-to-Image Diffusion Models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 6027--6037."},{"key":"e_1_2_2_72_1","volume-title":"DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5802--5810","author":"Zhu Minfeng","year":"2019","unstructured":"Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 
5802--5810."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3618342","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3618342","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T10:52:57Z","timestamp":1755773577000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3618342"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,5]]},"references-count":72,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12,5]]}},"alternative-id":["10.1145\/3618342"],"URL":"https:\/\/doi.org\/10.1145\/3618342","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,5]]},"assertion":[{"value":"2023-12-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}