LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation

Song, Wenhui; Li, Hanhui; Huang, Jiehui; Hu, Panwen; Cheng, Yuhao; Chen, Long; Yan, Yiqiang; Liang, Xiaodan

arXiv:2508.07603 (cs)

[Submitted on 11 Aug 2025]

Abstract: In this paper, we present LaVieID, a novel local autoregressive video diffusion framework designed to tackle the challenging identity-preserving text-to-video task. The key idea of LaVieID is to mitigate the loss of identity information inherent in the stochastic global generation process of diffusion transformers (DiTs) from both spatial and temporal perspectives. Specifically, unlike the global and unstructured modeling of facial latent states in existing DiTs, LaVieID introduces a local router to explicitly represent latent states by weighted combinations of fine-grained local facial structures. This alleviates undesirable feature interference and encourages DiTs to capture distinctive facial characteristics. Furthermore, a temporal autoregressive module is integrated into LaVieID to refine denoised latent tokens before video decoding. This module divides latent tokens temporally into chunks, exploiting their long-range temporal dependencies to predict biases for rectifying tokens, thereby significantly enhancing inter-frame identity consistency. Consequently, LaVieID can generate high-fidelity personalized videos and achieve state-of-the-art performance. Our code and models are available at this https URL.
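
The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`route_local`, `rectify_chunks`), the dot-product scoring, and the hand-written bias rule are all assumptions made for illustration, whereas the paper describes learned modules operating on DiT latent tokens. The sketch only shows the general shape of the ideas: a token expressed as a softmax-weighted combination of local feature vectors, and a chunk-by-chunk pass in which later chunks receive a bias computed from earlier ones.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route_local(token, local_feats):
    """Hypothetical router: re-express one latent token as a weighted
    combination of local facial feature vectors.

    Weights come from dot-product similarity here (an assumption);
    a real router would use learned projections.
    """
    weights = softmax([dot(token, feat) for feat in local_feats])
    dim = len(token)
    mixed = [sum(w * feat[d] for w, feat in zip(weights, local_feats))
             for d in range(dim)]
    return mixed, weights


def rectify_chunks(frames, chunk_size, strength=0.5):
    """Hypothetical temporal pass: split per-frame tokens into chunks and
    add a bias to each chunk based on all previously rectified frames.

    The bias is a fixed pull toward the running mean (an assumption);
    the paper predicts it with a trained autoregressive module.
    """
    rectified = []
    for start in range(0, len(frames), chunk_size):
        block = frames[start:start + chunk_size]
        if rectified:
            dim = len(rectified[0])
            context = [sum(f[d] for f in rectified) / len(rectified)
                       for d in range(dim)]
            block = [[x + strength * (c - x) for x, c in zip(frame, context)]
                     for frame in block]
        rectified.extend(block)
    return rectified


# Usage: one token routed over two local features, then four
# one-dimensional frame tokens rectified in chunks of two.
mixed, weights = route_local([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
smoothed = rectify_chunks([[0.0], [0.0], [4.0], [4.0]], chunk_size=2)
```

In this toy run the first chunk passes through unchanged and the second is pulled halfway toward the mean of the first, which mirrors, in a very simplified way, how earlier chunks can stabilize later ones.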

Comments:	Accepted to ACM MM 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.07603 [cs.CV]
	(or arXiv:2508.07603v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.07603

Submission history

From: Hanhui Li Dr.
[v1] Mon, 11 Aug 2025 04:13:32 UTC (28,103 KB)
