PVG: Progressive Vision Graph for Vision Recognition

Wu, Jiafu; Li, Jian; Zhang, Jiangning; Zhang, Boshen; Chi, Mingmin; Wang, Yabiao; Wang, Chengjie

doi:10.1145/3581783.3612122

arXiv:2308.00574 (cs)

[Submitted on 1 Aug 2023 (v1), last revised 10 Dec 2024 (this version, v2)]

Abstract: Convolution-based and Transformer-based vision backbone networks process images as grid or sequence structures, respectively, which are inflexible for capturing irregular objects. Although Vision GNN (ViG) adopts graph-level features for complex images, it suffers from inaccurate neighbor node selection, expensive node information aggregation, and over-smoothing in deep layers. To address these problems, we propose a Progressive Vision Graph (PVG) architecture for vision recognition tasks. Compared with previous works, PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), which introduces second-order similarity by gradually increasing the channels of the global graph branch and decreasing the channels of the local branch as the layers deepen; 2) a neighbor node information aggregation and update module that uses Max pooling and mathematical Expectation (MaxE) to aggregate rich neighbor information; 3) Graph error Linear Unit (GraphLU), which enhances low-value information in a relaxed form to reduce the compression of image detail information and thereby alleviate over-smoothing. Extensive experiments on mainstream benchmarks demonstrate the superiority of PVG over state-of-the-art methods, e.g., our PVG-S obtains 83.0% Top-1 accuracy on ImageNet-1K, surpassing the GNN-based ViG-S by +0.9 with 18.5% fewer parameters, while the largest PVG-B obtains 84.2%, a +0.5 improvement over ViG-B. Furthermore, our PVG-S obtains gains of +1.3 box AP and +0.4 mask AP over ViG-S on the COCO dataset.
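
To make the aggregation idea in component 2 more concrete, here is a minimal sketch of combining a max-pooled and an averaged (expectation) summary of each node's neighbors. The abstract does not give the exact MaxE formulation, so everything here is an assumption for illustration: the function names `knn_neighbors` and `max_and_mean_aggregate`, the Euclidean k-nearest-neighbor graph, and the plain concatenation are hypothetical, and any learned projection used in the paper is omitted.

```python
from math import dist


def knn_neighbors(features, k):
    """For each node, return the indices of its k nearest other nodes (Euclidean distance)."""
    neighbors = []
    for i, fi in enumerate(features):
        candidates = [j for j in range(len(features)) if j != i]
        candidates.sort(key=lambda j: dist(fi, features[j]))
        neighbors.append(candidates[:k])
    return neighbors


def max_and_mean_aggregate(features, neighbors):
    """Concatenate each node's own feature with the element-wise max and mean
    of its neighbors' features (an assumed, simplified reading of MaxE)."""
    aggregated = []
    for i, fi in enumerate(features):
        nbr_feats = [features[j] for j in neighbors[i]]
        # zip(*nbr_feats) iterates over feature channels across all neighbors.
        max_part = [max(channel) for channel in zip(*nbr_feats)]
        mean_part = [sum(channel) / len(channel) for channel in zip(*nbr_feats)]
        aggregated.append(list(fi) + max_part + mean_part)
    return aggregated


# Four toy "patch" nodes with 2-dimensional features (made-up values).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
neighbors = knn_neighbors(features, k=2)
aggregated = max_and_mean_aggregate(features, neighbors)
```

In this toy setup each node ends up with a feature three times its original width: its own values, the per-channel maximum over its neighbors, and the per-channel average over its neighbors. The max term keeps the strongest neighbor response while the mean term summarizes the neighborhood as a whole.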

Comments:	Accepted by ACM MM 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.00574 [cs.CV]
	(or arXiv:2308.00574v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.00574
Journal reference:	ACM International Conference on Multimedia 2023
Related DOI:	https://doi.org/10.1145/3581783.3612122

Submission history

From: Jiafu Wu
[v1] Tue, 1 Aug 2023 14:35:29 UTC (2,713 KB)
[v2] Tue, 10 Dec 2024 07:33:14 UTC (2,714 KB)
