
Transformer Architectures

Simona Rumao, Jonathan Dabre, Madhav Jha, Raj Saner, Prathamesh Doifode

Department of Computer Engineering, Fr. Conceicao Rodrigues College of Engineering, Mumbai, India

[email protected], [email protected], [email protected], [email protected], [email protected]

Abstract— Transformer architectures have revolutionized the fields of natural language processing (NLP), computer vision, medical imaging, and bioinformatics by addressing long-range dependencies, scalability, and contextual understanding challenges. This collection of studies explores enhancements in transformer models, including efficient attention mechanisms, adaptive architectures, and hybrid approaches. Applications span from fine-grained classification and optical flow forecasting to graph data analysis, medical image segmentation, and genome data analysis. Innovations such as hierarchical attention, adaptive attention, and localized focus reduce computational complexity while improving accuracy and interpretability. Challenges like high computational demands, memory constraints, and training biases persist, prompting future research into more efficient, domain-specific, and universally adaptable models. These advancements underline transformers' transformative role in addressing complex tasks across various scientific and practical domains.

Keywords - Transformer Architectures, Self-Attention Mechanism, Efficient Attention Mechanisms, Natural Language Processing (NLP), Vision Transformers (ViTs), Fine-Grained Classification, Medical Image Segmentation, Genome Data Analysis, Optical Flow Forecasting, Graph Neural Networks (GNNs), Adaptive Attention, Positional Encoding, Hybrid Architectures, Computational Efficiency, Contextual Understanding, Long-Range Dependencies, Machine Learning, Model Scalability, Interpretability, Domain-Specific Applications

INTRODUCTION

The rapid evolution of artificial intelligence and machine learning has been significantly influenced by transformer architectures, which have demonstrated unparalleled success in addressing complex challenges across diverse domains. Initially introduced in the seminal work "Attention Is All You Need," transformers replaced traditional recurrent and convolutional neural networks by leveraging self-attention mechanisms to model global dependencies efficiently. This paradigm shift has enabled advancements in natural language processing (NLP), computer vision, bioinformatics, and time series forecasting, redefining benchmarks and expanding the applicability of machine learning models.

The growing demand for scalable and adaptable models has driven researchers to refine transformers further. These advancements include the development of efficient attention mechanisms to mitigate the quadratic complexity of traditional transformers, novel positional encoding strategies to enhance sequence comprehension, and adaptive architectures tailored for specific tasks. Key innovations have also emerged in hybrid models, such as integrating transformers with convolutional neural networks (CNNs) or graph neural networks (GNNs), combining local feature extraction with global contextual modeling.

In NLP, transformer-based models like BERT, GPT, and RoBERTa have achieved state-of-the-art performance in tasks such as text classification, paraphrase generation, and politeness prediction. Meanwhile, vision transformers (ViTs) have revolutionized computer vision by excelling in fine-grained classification, optical flow estimation, and medical image segmentation. In bioinformatics, transformers have proven instrumental in genome data analysis, enabling breakthroughs in sequence prediction and gene expression modeling. Furthermore, applications in time series forecasting and graph data analysis underscore transformers' versatility in handling complex, structured datasets.

Despite their remarkable achievements, transformer architectures face several challenges. High computational and memory requirements, sensitivity to data quality, and difficulty in managing long-context scenarios limit their widespread adoption. Addressing these issues has led to innovative solutions, including attention mechanisms optimized for local dependencies, dynamic feature selection methods, and evolutionary architecture searches. These advancements ensure transformers remain at the forefront of machine learning research, capable of tackling increasingly sophisticated problems.

This paper aims to synthesize the latest advancements in transformer architectures across various domains. It explores the core methodologies, identifies key challenges, and highlights transformative applications, providing a comprehensive overview of the current state of the art. By examining these innovations, this study not only underscores the transformative impact of transformers but also identifies future research directions to overcome existing limitations and broaden their applicability.

MOTIVATION

The motivation behind this research lies in the transformative impact of transformer architectures across a wide range of domains and the potential they hold to address some of the most pressing challenges in machine learning and artificial intelligence. As data grows in complexity and scale, traditional methods such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) struggle to capture long-range dependencies, model global contexts, and efficiently process high-dimensional data. Transformers, with their self-attention mechanisms and scalability, have emerged as a groundbreaking solution, redefining state-of-the-art performance across tasks in natural language processing (NLP), computer vision, bioinformatics, and more.

Despite their success, transformers are not without limitations. High computational costs, memory constraints, sensitivity to data quality, and difficulties in handling long-context scenarios create bottlenecks for broader adoption. These challenges motivate researchers to innovate and refine transformer architectures, exploring efficient attention mechanisms, adaptive and hybrid models, and domain-specific adaptations. Moreover, the universal applicability of transformers, from paraphrase generation in low-resource languages to genome sequence prediction and time series forecasting, underscores the need to continuously expand their capabilities while addressing their inherent constraints.

This study is driven by the need to bridge gaps between current transformer models and the evolving demands of real-world applications. By synthesizing advancements across diverse fields, this research seeks to identify best practices, common challenges, and opportunities for future innovation. The ultimate goal is to contribute to the development of more efficient, interpretable, and adaptable transformer models that can unlock new possibilities in machine learning, pushing the boundaries of what is achievable in artificial intelligence.
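To make the self-attention mechanism described above concrete, the sketch below implements plain single-head scaled dot-product self-attention in NumPy. It is a minimal illustration rather than code from any of the surveyed papers; the sequence length, embedding sizes, and randomly initialized projection matrices are assumptions chosen only for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) projection matrices.
    Returns the (seq_len, d_k) context vectors and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights               # weighted sum of value vectors

# Toy usage: 6 tokens, 16-dim embeddings, 8-dim projections (illustrative sizes only).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
context, attn = self_attention(X, Wq, Wk, Wv)
print(context.shape, attn.shape)  # (6, 8) (6, 6)
```

The (seq_len × seq_len) score matrix in this sketch is exactly where the quadratic cost discussed throughout this paper comes from.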
REVIEW OF LITERATURE

1. Authors: Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang. Title: Advancing Transformer Architecture in Long-Context Large Language Models. Gaps: Transformer-based LLMs face drawbacks like high computational costs, limited memory capacity, fixed input-length constraints, efficiency-performance trade-offs, and challenges in scaling for long-context scenarios. Suggestions: The study suggests improving LLMs with efficient attention, better memory models, scalable encoding, and optimized context handling.
2. Authors: Jihad R'Baiti, Rdouan Faizi, Youssef Hmamouche, Amal El Fallah Seghrouchni. Title: A Transformer-Based Architecture for the Automatic Detection of Clickbait for Arabic Headlines. Gaps: Limited exploration of diverse transformer architectures for clickbait detection in Arabic. Suggestions: Experiment with newer transformer models like AraELECTRA for potential performance improvements.
3. Authors: Changli Cai, Tiankui Zhang, Zhewei Weng, Chunyan Feng, Yapeng Wang. Title: A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification. Gaps: Limited evaluation of TransAA on diverse real-world FGVC scenarios. Suggestions: Test TransAA on broader, domain-specific datasets to assess its generalizability.
4. Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, et al. Title: Attention Is All You Need. Gaps: Limited exploration of the Transformer's adaptability to non-text modalities like audio or video; domain-specific AI applications, ethical and deployment challenges, interoperability, and feasibility issues. Suggestions: Extend the Transformer architecture to handle multimodal inputs for broader applications.
5. Authors: Kleesha P, Lavanya U, Namratha U, Meenakshi S J. Title: Evolution of Transformer Architectures in NLP. Gaps: Limited exploration of FlowFormer's performance on real-time optical flow applications. Suggestions: Optimize FlowFormer for real-time processing to enhance its practical utility in dynamic environments.
6. Authors: Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang. Title: FlowFormer: A Transformer Architecture for Optical Flow. Gaps: Limited evaluation of Local Attention on diverse real-world time series datasets. Suggestions: Test Local Attention across various industries for broader validation.
7. Authors: Ignacio Aguilera-Martos, Andres Herrera-Poyatos, Julian Luengo, Francisco Herrera. Title: Local Attention: Enhancing the Transformer Architecture for Efficient Time Series Forecasting. Gaps: TNT lacks exploration in multi-modal tasks combining vision and text. Suggestions: Extend TNT to multi-modal applications for enhanced adaptability.
8. Authors: Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang. Title: Transformer in Transformer. Gaps: ENAS-KT's performance on smaller KT datasets remains unexplored. Suggestions: Assess ENAS-KT on diverse dataset scales to verify generalizability.
9. Authors: Shangshang Yang, Xiaoshan Yu, Ye Tian, Xueming Yan, Haiping Ma. Title: Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing. Gaps: Pre-LN's effect on extremely large-scale models is underexplored. Suggestions: Investigate Pre-LN behavior in multi-billion parameter Transformer models.
10. Authors: Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang. Title: On Layer Normalization in the Transformer Architecture. Gaps: Transformers struggle with complex compositional tasks. Suggestions: Develop new attention mechanisms to improve compositional reasoning.
11. Authors: Binghui Peng, Srini Narayanan, Christos Papadimitriou. Title: On Limitations of the Transformer Architecture. Gaps: Transformers for NLP lack universally adaptive systems. Suggestions: Focus on lightweight, compact, adaptable models with improved cross-domain applicability.
12. Authors: Jacky Casas, Elena Mugellini, Omar Abou Khaled. Title: Overview of the Transformer-based Models for NLP Tasks. Gaps: Limited datasets for paraphrase generation in low-resource languages. Suggestions: Expand datasets and explore multilingual transformer models for paraphrasing, along with user acceptance considerations.
13. Authors: Mosima Anna Masethe, Hlaudi Daniel Masethe, Sunday Olusegun Ojo, Pius A. Owolawi. Title: Paraphrase Generation Model Using Transformer-Based Architecture. Gaps: Politeness prediction lacks multilingual and context-specific datasets. Suggestions: Extend the model to multilingual datasets and incorporate user context.
14. Authors: Shakir Khan, Mohd Fazil, Agbotiname Lucky Imoize, Bayan Ibrahim Alabduallah, Bader M. Albahlal, Saad Abdullah Alajlan, Abrar Almjally, Tamanna Siddiqui. Title: Transformer Architecture-Based Transfer Learning for Politeness Prediction in Conversation. Gaps: TransFG's utility in multi-modal tasks is unexplored. Suggestions: Expand TransFG to multi-modal classification to test versatility.
15. Authors: Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang. Title: TransFG: A Transformer Architecture for Fine-Grained Recognition. Gaps: Transformer-based genome analysis depends on high-quality datasets. Suggestions: Address data scarcity with augmentation techniques and semi-supervised learning.
16. Authors: Sanghyuk Roy Choi, Minhyeok Lee. Title: Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Gaps: TNT lacks thorough evaluation of computational efficiency at scale. Suggestions: Optimize TNT for large-scale deployments to validate real-world performance.
17. Authors: Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang. Title: Transformer in Transformer. Gaps: TNT lacks evaluation in tasks beyond image recognition, such as video analysis or multi-modal learning. Suggestions: Extend TNT to video and multi-modal tasks to explore its adaptability and broader applicability.
18. Authors: Erxue Min, Runfa Chen, Yatao Bian, Tingyang Xu, Kangfei Zhao, Wenbing Huang, Peilin Zhao, Junzhou Huang, Sophia Ananiadou, Yu Rong. Title: Transformer for Graphs: An Overview from Architecture Perspective. Gaps: Graph Transformers face scalability issues with very large graphs. Suggestions: Develop more scalable attention mechanisms for processing large graphs.
19. Authors: Yunhe Gao, Mu Zhou, Dimitris Metaxas. Title: UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation. Gaps: UTNet's performance on diverse imaging modalities is underexplored. Suggestions: Test UTNet on broader medical imaging tasks to validate cross-modal robustness.
20. Authors: Mohammed Lahraichi, Khalid Housni, Abdelhafid Berroukham. Title: Vision Transformers: A Review of Architecture, Applications, and Future Directions. Gaps: ViTs face challenges with interpretability in complex tasks. Suggestions: Incorporate explainability techniques to improve transparency and usability.
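Several of the entries above (notably the long-context survey and the Local Attention work) revolve around cutting the quadratic cost of full self-attention. The sketch below shows one generic way to do that, banded (windowed) attention, where each position attends only to neighbours within a fixed radius. It is a simplified illustration under assumed shapes and a made-up window size, not the exact mechanism of any paper listed above.

```python
import numpy as np

def windowed_attention(Q, K, V, window=4):
    """Banded self-attention: position i attends only to j with |i - j| <= window.

    Q, K, V: (seq_len, d) arrays. Returns (seq_len, d) context vectors.
    Cost is O(seq_len * window * d) instead of O(seq_len^2 * d).
    """
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)   # only 2*window+1 scores per token
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]
    return out

# Toy usage with assumed sizes: 12 tokens, 8-dim vectors, window radius 2.
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(12, 8)) for _ in range(3))
print(windowed_attention(Q, K, V, window=2).shape)  # (12, 8)
```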
PROPOSED METHODOLOGY

This research synthesizes advancements in transformer architectures and presents a comprehensive framework to address their limitations and enhance their applicability across diverse domains.

Problem Identification and Scope Definition

● Analyze the limitations of existing transformer architectures, such as computational inefficiency, memory constraints, and challenges in handling long-range dependencies.
● Define the scope of transformer applications across NLP, computer vision, bioinformatics, and time-series forecasting.

Architectural Innovation

● Efficient Attention Mechanisms: Explore lightweight and scalable attention mechanisms, such as local attention and hierarchical attention, to reduce computational complexity.
● Hybrid Models: Integrate transformers with other architectures like CNNs and GNNs for tasks requiring local-global feature extraction, such as medical image segmentation and graph-based tasks.
● Adaptive Modules: Incorporate adaptive attention modules to balance focus on critical and non-critical features dynamically, improving robustness and interpretability.

Dataset Preparation and Preprocessing

● Curate diverse and high-quality datasets, ensuring coverage of multilingual, multimodal, and domain-specific applications.
● Perform preprocessing tasks like normalization, augmentation, and embedding generation to enhance input quality for transformer models.

Model Training and Optimization

● Employ advanced training techniques, including pre-training on large datasets followed by fine-tuning for domain-specific tasks.
● Utilize efficient normalization strategies, such as Pre-Layer Normalization, to stabilize training and reduce convergence time.

Evaluation Metrics and Benchmarks

● Evaluate models on standard datasets using metrics such as BLEU, ROUGE, AUC, and ACC, depending on the task.
● Benchmark the proposed methods against state-of-the-art models across tasks like paraphrase generation, clickbait detection, and fine-grained classification.

Application-Specific Enhancements

● For NLP: Introduce transformer-based transfer learning for multilingual and low-resource language tasks.
● For Computer Vision: Implement part selection modules and overlapping patch processing for fine-grained classification.
● For Bioinformatics: Develop transformer models optimized for genome data analysis, incorporating multi-head attention and positional encoding.
● For Time-Series Forecasting: Use tensor-based attention mechanisms to handle large-scale datasets efficiently.

Addressing Limitations

● Mitigate issues like over-smoothing in graph-based transformers and hallucinations in language models by refining attention matrices and integrating external memory mechanisms.
● Reduce computational overhead by optimizing transformer layers and leveraging hardware accelerators.

Future Directions and Scalability

● Extend transformer applications to multi-modal datasets, integrating text, image, and graph data for richer contextual understanding.
● Explore unsupervised and semi-supervised pre-training strategies to handle data scarcity in specific domains.
● Develop universally adaptive transformer architectures capable of efficient generalization across tasks with minimal fine-tuning.
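As one concrete reference point for the training and optimization step above, the sketch below shows a Pre-Layer-Normalization (Pre-LN) encoder block in PyTorch, where LayerNorm is applied before the attention and feed-forward sublayers rather than after, the arrangement reported to stabilize training. This is a minimal sketch; the hidden sizes, head count, and dropout rate are illustrative assumptions, not values taken from the surveyed papers.

```python
import torch
import torch.nn as nn

class PreLNEncoderBlock(nn.Module):
    """Transformer encoder block with Pre-Layer Normalization (norm -> sublayer -> residual)."""

    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Pre-LN: normalize first, then attend, then add the residual connection.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.drop(attn_out)
        # Same pattern for the position-wise feed-forward sublayer.
        x = x + self.drop(self.ff(self.norm2(x)))
        return x

# Toy usage: batch of 2 sequences, 10 tokens each, 256-dim embeddings.
block = PreLNEncoderBlock()
tokens = torch.randn(2, 10, 256)
print(block(tokens).shape)  # torch.Size([2, 10, 256])
```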
RESULTS AND DISCUSSIONS

The collective insights from the 20 research papers provide a broad understanding of the transformative impact of transformer architectures across various domains. By synthesizing the findings, several key results and implications emerge:

1. Performance Improvements Across Domains

Natural Language Processing (NLP): Transformer-based models like BERT and GPT have set benchmarks for tasks like machine translation, sentiment analysis, and paraphrase generation. For example, BERT demonstrated enhanced contextual understanding, achieving state-of-the-art results in question-answering and language inference tasks. Fine-tuning these models for specific languages (e.g., Arabic headlines) showcased their adaptability.

Discussion: Despite these successes, challenges such as handling low-resource languages and managing biases in training data persist. Future work must focus on increasing the diversity of datasets and improving multilingual adaptability.

Computer Vision: Vision transformers (ViTs) and models like TransFG and TNT have outperformed CNNs in fine-grained classification and visual recognition tasks, achieving remarkable accuracy on benchmark datasets like ImageNet and CUB-200-2011. The inclusion of adaptive modules and hierarchical attention has further enhanced their capabilities.

Discussion: While ViTs excel in global context modeling, their computational demands remain high. Optimization strategies such as patch embedding and lightweight transformer designs are crucial for broader adoption.

Bioinformatics: Transformers have been successfully applied to genome sequence analysis, CRISPR prediction, and multi-omics integration. Attention mechanisms have allowed dynamic prioritization of genomic features, significantly improving prediction accuracy.

Discussion: The reliance on high-quality data and the computational intensity of these models pose significant challenges, necessitating the development of efficient training workflows and noise-resilient architectures.

2. Advances in Architectural Design

Efficient attention mechanisms such as local attention and adaptive attention modules have reduced computational complexity, enabling scalability to long-context scenarios. Hybrid architectures (e.g., UTNet) combining transformers with CNNs or GNNs have demonstrated superior performance in tasks like medical image segmentation and graph-based analyses.

Discussion: These innovations address key limitations of traditional transformer designs. However, the trade-off between efficiency and accuracy needs careful consideration, particularly in resource-constrained environments.

3. Task-Specific Enhancements

For fine-grained visual tasks, models like TransFG have introduced part selection modules to focus on discriminative regions, achieving state-of-the-art results on datasets like Stanford Dogs. In time-series forecasting, local attention mechanisms have excelled in reducing memory usage while maintaining predictive accuracy.

Discussion: These task-specific improvements validate the adaptability of transformers but highlight the need for further exploration into general-purpose architectures that can handle diverse tasks without extensive reconfiguration.

4. Challenges and Limitations

Theoretical analyses have revealed fundamental constraints in transformers, such as difficulties in function composition and deep compositional reasoning. These issues limit their ability to generalize to tasks requiring high-level abstraction.

Discussion: Addressing these challenges requires rethinking core architectural components, such as attention mechanisms and positional encodings, to better handle complex reasoning tasks.

5. Future Implications

The results demonstrate that transformers are not only advancing performance across domains but also inspiring innovative methodologies for addressing long-standing challenges. However, the computational cost, data dependency, and interpretability concerns remain open areas of research. Expanding transformers' applicability to low-resource and real-time tasks will be pivotal in unlocking their full potential.
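To illustrate the hybrid CNN-plus-transformer pattern highlighted under "Advances in Architectural Design" above, the sketch below bolts a small transformer encoder onto convolutional feature maps: the CNN extracts local features, and the flattened feature map is treated as a token sequence for global self-attention. This is a schematic of the general idea only, not the actual UTNet design; the channel counts, layer sizes, and input resolution are assumptions.

```python
import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    """Toy hybrid block: convolutional local feature extraction followed by
    transformer self-attention over the flattened spatial positions."""

    def __init__(self, in_ch=3, feat_ch=64, n_heads=4, n_layers=2):
        super().__init__()
        # Local feature extractor: downsamples the image and builds feat_ch channels.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Global context: each spatial position becomes a token.
        layer = nn.TransformerEncoderLayer(
            d_model=feat_ch, nhead=n_heads, dim_feedforward=4 * feat_ch,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):
        f = self.cnn(x)                        # (B, C, H', W') local features
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H'*W', C) token sequence
        tokens = self.encoder(tokens)          # global self-attention over positions
        return tokens.transpose(1, 2).reshape(b, c, h, w)  # back to a feature map

# Toy usage on a 64x64 RGB image (illustrative size only).
model = HybridCNNTransformer()
print(model(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 64, 16, 16])
```

The design choice to attend over downsampled feature maps rather than raw pixels is what keeps the quadratic attention cost manageable in such hybrids.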
CONCLUSION & FUTURE WORK

This paper presents a comprehensive review of the advancements in transformer architectures and their
applications across diverse domains, including natural
language processing (NLP), computer vision,
bioinformatics, and time-series forecasting. Through the
synthesis of findings from 20 significant research papers,
we have identified key innovations, challenges, and
opportunities for further development.

Transformers have proven to be a powerful solution for handling long-range dependencies and contextual
relationships, enabling significant improvements in task
performance. Notably, models such as BERT, GPT, ViTs,
and hybrid architectures combining transformers with
CNNs and GNNs have set new benchmarks in tasks
ranging from text classification and paraphrase generation
to image recognition and medical image segmentation.
The efficiency of attention mechanisms, including local
and adaptive attention, has helped address the
computational challenges that typically hinder the
scalability of transformers.

However, despite their remarkable success, transformers still face critical limitations. The need for high-quality,
large-scale datasets, the computational demands of large
transformer models, and the difficulties in handling
complex reasoning tasks remain key obstacles.
Furthermore, issues such as data bias, lack of multilingual
and cross-domain capabilities, and the interpretability of
models must be addressed to ensure broader and more
equitable adoption across industries. Future research should
focus on refining transformer architectures to improve
their efficiency, enhance their generalization capabilities,
and expand their applicability to underrepresented
domains and low-resource languages. Exploring hybrid
models, optimizing attention mechanisms, and developing
more interpretable models will be essential steps toward
overcoming the challenges identified in this review.

In conclusion, transformers have revolutionized the landscape of machine learning, and the ongoing efforts to
optimize and extend their capabilities will continue to
drive significant breakthroughs in AI across various fields.
By addressing current limitations, transformers hold the
potential to unlock new possibilities and redefine how we
approach complex problem-solving tasks in diverse
applications.
REFERENCES
[1] Huang, Yunpeng, et al. "Advancing transformer architecture in long-context large language models: A comprehensive survey." arXiv preprint arXiv:2311.12351 (2023).
[2] J. R'Baiti, R. Faizi, Y. Hmamouche and A. E. F. Seghrouchni, "A transformer-based architecture for the automatic detection of clickbait for Arabic headlines," 2023 5th International Conference on Natural Language Processing (ICNLP), Guangzhou, China, 2023, pp. 248-252, doi: 10.1109/ICNLP58431.2023.00052.
[3] C. Cai, T. Zhang, Z. Weng, C. Feng and Y. Wang, "A Transformer Architecture with Adaptive Attention for Fine-Grained Visual Classification," 2021 7th International Conference on Computer and Communications (ICCC), Chengdu, China, 2021, pp. 863-867, doi: 10.1109/ICCC54389.2021.9674560.
[4] Vaswani, Ashish & Shazeer, Noam & Parmar, Niki & Uszkoreit, Jakob & Jones, Llion & Gomez, Aidan & Kaiser, Lukasz & Polosukhin, Illia. (2017). Attention Is All You Need. 10.48550/arXiv.1706.03762.
[5] P, Kleesha and Upase, Lavanya and Upadhya, Namratha R, The Evolution of Transformers Architecture in Natural Language Processing (March 04, 2024). Available at SSRN: https://ssrn.com/abstract=4915691 or http://dx.doi.org/10.2139/ssrn.4915691
[6] Huang, Zhaoyang, et al. "Flowformer: A transformer architecture for optical flow." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.
[7] I. Aguilera-Martos, A. Herrera-Poyatos, J. Luengo and F. Herrera, "Local Attention: Enhancing the Transformer Architecture for Efficient Time Series Forecasting," 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/IJCNN60899.2024.10650762.
[8] Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, and Yunhe Wang. Transformer in transformer. In Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS '21). Curran Associates Inc., Red Hook, NY, USA, Article 1217, 15908-15919.
[9] Yang, Shangshang & Yu, Xiaoshan & Tian, Ye & Yan, Xueming & Ma, Haiping & Zhang, Xingyi. (2023). Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing.
[10] Xiong, Ruibin & Yang, Yunchang & He, Di & Zheng, Kai & Zheng, Shuxin & Xing, Chen & Zhang, Huishuai & Lan, Yanyan & Wang, Liwei & Liu, Tie-Yan. (2020). On Layer Normalization in the Transformer Architecture. 10.48550/arXiv.2002.04745.
[11] Sanford, Clayton & Hsu, Daniel & Telgarsky, Matus. (2024). One-layer transformers fail to solve the induction heads task. 10.48550/arXiv.2408.14332.
[12] Gillioz, Anthony & Casas, Jacky & Mugellini, Elena & Abou Khaled, Omar. (2020). Overview of the Transformer-based Models for NLP Tasks. 179-183. 10.15439/2020F20.
[13] Masethe, Mosima & Masethe, Dan & Ojo, Sunday & Owolawi, Pius. (2024). Paraphrase Generation Model Using Transformer Based Architecture. SSRN Electronic Journal. 10.2139/ssrn.4683780.
[14] Khan, Shakir & Fazil, Mohd & Imoize, Agbotiname & Alabdullah, Bayan & Albahlal, Bader & Alajlan, Saad & Almjally, Abrar & Siddiqui, Tamanna. (2023). Transformer Architecture-Based Transfer Learning for Politeness Prediction. Sustainability. 15. 10828. 10.3390/su151410828.
[15] He, Ju & Chen, Jieneng & Liu, Shuai & Kortylewski, Adam & Yang, Cheng & Bai, Yutong & Wang, Changhu & Yuille, Alan. (2021). TransFG: A Transformer Architecture for Fine-grained Recognition. 10.48550/arXiv.2103.07976.
[16] Choi, S.R. & Lee, M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology (Basel). 2023 Jul 22;12(7):1033. doi: 10.3390/biology12071033. PMID: 37508462; PMCID: PMC10376273.
[17] Choi, S.R.; Lee, M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. Biology 2023, 12, 1033. https://doi.org/10.3390/biology12071033
[18] Min, Erxue & Chen, Runfa & Bian, Yatao & Xu, Tingyang & Zhao, Kangfei & Huang, Wenbing & Zhao, Peilin & Huang, Junzhou & Ananiadou, Sophia & Rong, Yu. (2022). Transformer for Graphs: An Overview from Architecture Perspective. 10.48550/arXiv.2202.08455.
[19] Gao, Yunhe & Zhou, Mu & Metaxas, Dimitris. (2021). UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation. 10.48550/arXiv.2107.00781.
[20] Berroukham, Abdelhafid & Housni, Khalid & Lahraichi, Mohammed. (2023). Vision Transformers: A Review of Architecture, Applications, and Future Directions. 205-210. 10.1109/CiSt56084.2023.10410015.
