CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition

Li, Hanting; Niu, Hongjing; Zhu, Zhaoqing; Zhao, Feng

arXiv:2303.00193 (cs)

[Submitted on 1 Mar 2023]

Abstract: Facial expression recognition (FER) is an essential task for understanding human behaviors. As one of the most informative behaviors of humans, facial expressions are often compound and variable, which is manifested by the fact that different people may express the same expression in very different ways. However, most FER methods still use one-hot or soft labels as the supervision, which lack sufficient semantic descriptions of facial expressions and are less interpretable. Recently, contrastive vision-language pre-training (VLP) models (e.g., CLIP) use text as supervision and have injected new vitality into various computer vision tasks, benefiting from the rich semantics in text. Therefore, in this work, we propose CLIPER, a unified framework for both static and dynamic facial Expression Recognition based on CLIP. Besides, we introduce multiple expression text descriptors (METD) to learn fine-grained expression representations that make CLIPER more interpretable. We conduct extensive experiments on several popular FER benchmarks and achieve state-of-the-art performance, which demonstrates the effectiveness of CLIPER.
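
The abstract does not say how the multiple expression text descriptors are combined, so the following is only an illustrative sketch of the general CLIP-style idea: score an image embedding against several text-descriptor embeddings per expression class, then aggregate per class. Averaging the cosine similarities, the temperature value, the function names, and the three-dimensional toy vectors are all assumptions made for this example. They are not taken from the paper, and no real CLIP encoder is called.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=0.01):
    """Numerically stable softmax; the temperature is an assumed value."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_emb, descriptor_embs):
    """Score one image embedding against several descriptors per class.

    descriptor_embs maps a class label to a list of text-descriptor
    embeddings. Averaging over descriptors is an assumption of this sketch.
    """
    labels = list(descriptor_embs)
    # Keep per-descriptor similarities so the prediction can be inspected.
    evidence = {
        label: [cosine(image_emb, d) for d in descriptor_embs[label]]
        for label in labels
    }
    class_scores = [sum(evidence[l]) / len(evidence[l]) for l in labels]
    probs = softmax(class_scores)
    best = labels[max(range(len(labels)), key=probs.__getitem__)]
    return best, dict(zip(labels, probs)), evidence

# Hypothetical toy embeddings standing in for encoder outputs.
toy_image = [0.9, 0.1, 0.2]
toy_descriptors = {
    "happy": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "sad": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.3]],
}
label, probs, evidence = classify(toy_image, toy_descriptors)
```

Returning the per-descriptor similarities alongside the class probabilities shows, in a small way, why several text descriptors can be easier to interpret than a single one-hot label: each descriptor contributes a separately readable score.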

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2303.00193 [cs.CV]
	(or arXiv:2303.00193v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.00193

Submission history

From: Hanting Li [view email]
[v1] Wed, 1 Mar 2023 02:59:55 UTC (856 KB)
