TranSalNet: Visual saliency prediction using transformers
arXiv preprint arXiv:2110.03593, 2021
Abstract
Convolutional neural networks (CNNs) have significantly advanced computational modeling for saliency prediction. However, the inherent inductive biases of convolutional architectures limit long-range contextual encoding capacity, which potentially makes a saliency model less humanlike. Transformers have shown great potential in encoding long-range information by leveraging the self-attention mechanism. In this paper, we propose a novel saliency model that integrates transformer components into CNNs to capture long-range contextual information. Experimental results show that the transformer components yield improvements, and the proposed model achieves promising results in predicting saliency.
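To make the architectural idea concrete, the sketch below shows one way a CNN feature extractor can be combined with a transformer encoder for saliency prediction: convolutional features are flattened into spatial tokens, mixed by self-attention to inject long-range context, and decoded into a single-channel saliency map. This is a minimal illustration, not the authors' TranSalNet implementation; the module structure, layer sizes, and hyperparameters are assumptions made for the example.

```python
# Minimal sketch (not the authors' TranSalNet code): CNN stem -> transformer
# encoder over flattened spatial tokens -> upsampled saliency map.
# All layer sizes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNTransformerSaliency(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, num_layers=2):
        super().__init__()
        # CNN stem: downsamples the image and produces embed_dim feature channels.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, embed_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Transformer encoder: self-attention over the flattened spatial grid
        # supplies the long-range context that plain convolutions lack.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Decoder head: project to one channel, then upsample to input size.
        self.head = nn.Conv2d(embed_dim, 1, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        feats = self.cnn(x)                           # (B, C, H/8, W/8)
        _, c, fh, fw = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)     # (B, H/8*W/8, C)
        tokens = self.transformer(tokens)             # long-range mixing
        feats = tokens.transpose(1, 2).reshape(b, c, fh, fw)
        sal = self.head(feats)                        # (B, 1, H/8, W/8)
        sal = F.interpolate(sal, size=(h, w),
                            mode="bilinear", align_corners=False)
        return torch.sigmoid(sal)                     # saliency map in [0, 1]

# Example usage with a dummy batch.
model = CNNTransformerSaliency()
saliency = model(torch.randn(2, 3, 224, 224))
print(saliency.shape)  # torch.Size([2, 1, 224, 224])
```

The key design point the paper's abstract motivates is visible here: attention operates over the whole spatial token grid, so any two image locations can influence each other in a single layer, whereas a convolution only mixes a local neighborhood.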