FLatten Transformer: Vision Transformer using Focused Linear Attention

Han, Dongchen; Pan, Xuran; Han, Yizeng; Song, Shiji; Huang, Gao

arXiv:2308.00442 (cs)

[Submitted on 1 Aug 2023 (v1), last revised 1 Sep 2023 (this version, v2)]

Abstract: The quadratic computational complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear complexity by approximating the Softmax operation through carefully designed mapping functions. However, current linear attention approaches either suffer from significant performance degradation or introduce additional computational overhead from the mapping functions. In this paper, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness. Specifically, we first analyze the factors contributing to the performance degradation of linear attention from two perspectives: focus ability and feature diversity. To overcome these limitations, we introduce a simple yet effective mapping function and an efficient rank restoration module to enhance the expressiveness of self-attention while maintaining low computational complexity. Extensive experiments show that our linear attention module is applicable to a variety of advanced vision Transformers and achieves consistently improved performance on multiple benchmarks. Code is available at this https URL.
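
To make the complexity argument concrete, here is a minimal sketch of generic kernel-based linear attention in plain Python. It is not the Focused Linear Attention module from the paper: the mapping function `phi` (ReLU plus a small constant), the function names, and the toy inputs are all assumptions chosen for illustration. The sketch shows that once Softmax is replaced by a product of mapped features, the key-value product can be aggregated once and reused for every query, so the cost grows linearly with the number of tokens N instead of quadratically.

```python
# Generic kernel-based linear attention (illustrative; NOT the paper's module).
# Assumption: phi(x) = ReLU(x) + eps stands in for the mapping function.

def phi(row, eps=1e-6):
    """Hypothetical non-negative feature map, applied element-wise."""
    return [max(x, 0.0) + eps for x in row]

def quadratic_form(Q, K, V):
    """O(N^2 d): compute all N x N similarities, then normalize each row."""
    Qp, Kp = [phi(q) for q in Q], [phi(k) for k in K]
    out = []
    for q in Qp:
        sims = [sum(a * b for a, b in zip(q, k)) for k in Kp]
        total = sum(sims)
        out.append([sum(s * v[j] for s, v in zip(sims, V)) / total
                    for j in range(len(V[0]))])
    return out

def linear_form(Q, K, V):
    """O(N d^2): aggregate phi(K)^T V once, then reuse it for every query."""
    Qp, Kp = [phi(q) for q in Q], [phi(k) for k in K]
    d, dv = len(Kp[0]), len(V[0])
    # d x dv summary of all keys and values, independent of the query.
    kv = [[sum(k[i] * v[j] for k, v in zip(Kp, V)) for j in range(dv)]
          for i in range(d)]
    # Column sums of phi(K), used for the per-query normalizer.
    k_sum = [sum(k[i] for k in Kp) for i in range(d)]
    out = []
    for q in Qp:
        denom = sum(a * b for a, b in zip(q, k_sum))
        out.append([sum(q[i] * kv[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out

# Toy inputs: N = 3 tokens, feature dimension d = 2 (made-up values).
Q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
K = [[1.0, 0.1], [-0.4, 0.9], [0.7, 0.7]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
```

Both functions return the same values up to floating-point rounding, because the two orderings differ only in how the matrix products are associated. The linear form never materializes the N x N attention map, which is where the savings come from when N is much larger than d.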

Comments:	ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.00442 [cs.CV]
	(or arXiv:2308.00442v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.00442

Submission history

From: Dongchen Han
[v1] Tue, 1 Aug 2023 10:37:12 UTC (2,472 KB)
[v2] Fri, 1 Sep 2023 08:01:36 UTC (2,474 KB)
