
Colloquium

COC3800

Vision Transformers:
Principles, Challenges, and
Emerging Trends

Group No.: 2

Group Details

1. Mohammad Faiz Umar (22COB107 / GM1536)
2. Eizad Hamdan (22COB154 / GL5628)
3. Aamina Siddiqui (22COB186 / GL8004)
4. Ayra Riaz Khan (22COB675 / GL4004)
Vision Transformers:
Principles, Challenges, and Emerging Trends

Abstract
Vision Transformers (ViTs) have emerged as a groundbreaking technology in computer vision, revolutionizing how complex vision tasks are addressed by leveraging the self-attention mechanism. These tasks, traditionally dominated by convolutional neural networks (CNNs), have benefited significantly from ViTs’ ability to divide images into patches and process them as a sequence of tokens, inheriting the scalability and adaptability of the transformers used in natural language processing. We will explore the foundational principles that underpin ViTs, including their distinctive architecture, the role of the self-attention mechanism, and the use of positional embeddings to capture spatial relationships in images.
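As a concrete illustration of the patch-and-attend pipeline described above, the sketch below splits an image into flattened patches, adds positional information, and applies one head of scaled dot-product self-attention. The projection matrices and the positional-embedding table here are randomly initialized stand-ins for learned parameters; this is a minimal illustration, not any particular ViT implementation.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches:
    the token sequence a ViT feeds to its transformer encoder."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    ph, pw = H // patch_size, W // patch_size
    patches = image.reshape(ph, patch_size, pw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)  # group the two grid axes first
    return patches.reshape(ph * pw, patch_size * patch_size * C)

def self_attention(x, Wq, Wk, Wv):
    """One head of scaled dot-product self-attention over the token sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])       # pairwise patch affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # each token mixes all patches

rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
tokens = patchify(image, 16)                 # (196, 768): 14x14 patches of 16*16*3
d = tokens.shape[1]
pos_emb = rng.standard_normal(tokens.shape)  # stand-in for a learned positional table
Wq, Wk, Wv = (rng.standard_normal((d, d)) * d**-0.5 for _ in range(3))
out = self_attention(tokens + pos_emb, Wq, Wk, Wv)
print(out.shape)  # (196, 768)
```

Real ViTs additionally project each flattened patch with a learned linear layer, prepend a class token, and stack many multi-head attention blocks; the shapes and the flow of information, however, are exactly as above.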
We will review recent advancements in ViT models, focusing on innovations in their design and training methodologies, including self-supervised learning, hybrid architectures, and hierarchical approaches. Furthermore, we will examine their applications across diverse domains such as image classification, object detection, and semantic segmentation, as well as their performance on widely used benchmark datasets. Despite their success, ViTs face challenges, including high data requirements and computational demands. We will discuss how these limitations are addressed through techniques such as locality-enhancing mechanisms, efficient token processing, and integration with CNN-inspired features.
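To make "efficient token processing" concrete, here is a minimal, hypothetical token-pruning step: given per-token importance scores (which published methods typically learn or derive from attention), it keeps only the highest-scoring fraction of the patch sequence, shrinking the quadratic cost of later attention layers. The function name and scoring scheme are illustrative, not taken from a specific model.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top-scoring fraction of tokens, preserving their order.

    `scores` stands in for a learned importance estimate (e.g. how much
    attention each token receives); pruning a 196-token sequence to 49
    tokens cuts the cost of subsequent self-attention by roughly 16x.
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]  # indices of the k largest scores
    return tokens[np.sort(keep)]    # restore original patch order

rng = np.random.default_rng(1)
tokens = rng.standard_normal((196, 768))
scores = rng.random(196)
kept = prune_tokens(tokens, scores, keep_ratio=0.25)
print(kept.shape)  # (49, 768)
```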
Lastly, we will delve into specific use cases, such as medical imaging and 3D object analysis, to
highlight ViTs’ practical impact and potential for further development. This comprehensive overview
aims to provide a deeper understanding of ViTs, their transformative capabilities, and their role in
shaping the future of computer vision research.

