Triple Attention Network architecture for MovieQA

Shah, Ankit; Lin, Tzu-Hsiang; Wu, Shijie

arXiv:2111.09531 (cs)

[Submitted on 18 Nov 2021]

Abstract: Movie question answering (MovieQA) is a multimedia task in which one is given a video, its subtitles, a question, and a set of candidate answers. The task is to predict the correct answer using the components of the multimedia, namely video/images, audio, and text. Traditionally, MovieQA has been approached using only the image and text components. In this paper, we propose a novel network with a triple-attention architecture that incorporates audio into the MovieQA task. This architecture is modeled on a traditional dual attention network that focuses only on video and text. Experiments show that including audio through the triple-attention network provides complementary information for the MovieQA task that is not captured by the visual or textual components of the data. Experiments with a wide range of audio features show that such a network can indeed improve MovieQA performance by about 7% relative to using visual features alone.
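
To make the idea of adding an audio branch concrete, here is a minimal illustrative sketch, not the authors' implementation. It assumes each modality (video frames, subtitle sentences, audio segments) has already been encoded into fixed-length vectors with the same dimension as the question and candidate-answer vectors. The function names, the scaled dot-product attention, and the averaging fusion are hypothetical choices made for illustration only. Each modality gets its own question-guided attention branch, the three attended context vectors are averaged and added to the question vector, and each candidate answer is scored by a dot product.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, features: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a sequence of feature vectors."""
    d = len(query)
    weights = softmax([dot(query, f) / math.sqrt(d) for f in features])
    # Weighted sum of the feature vectors, one output coordinate at a time.
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]


def triple_attention_answer(question: Vector,
                            video: List[Vector],
                            subtitles: List[Vector],
                            audio: List[Vector],
                            answers: List[Vector]) -> int:
    """Return the index of the highest-scoring candidate answer (hypothetical scheme)."""
    # One question-guided attention branch per modality: video, text, audio.
    contexts = [attend(question, m) for m in (video, subtitles, audio)]
    # Hypothetical fusion: average the three contexts, then add the question vector.
    fused = [q + sum(c[i] for c in contexts) / len(contexts)
             for i, q in enumerate(question)]
    scores = [dot(fused, a) for a in answers]
    return max(range(len(answers)), key=lambda k: scores[k])


# Toy usage with 2-dimensional vectors (made-up values, not MovieQA data).
question = [1.0, 0.0]
video = [[1.0, 0.0], [0.0, 1.0]]
subtitles = [[0.5, 0.5]]
audio = [[1.0, 0.0]]
answers = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(triple_attention_answer(question, video, subtitles, audio, answers))  # prints 0
```

Dropping the audio argument and averaging over only two contexts gives a two-branch (video and text) variant, analogous to the dual attention setup the abstract describes; keeping a separate attention branch per modality lets each stream weight its own segments by relevance to the question before fusion.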

Subjects:	Multimedia (cs.MM)
Cite as:	arXiv:2111.09531 [cs.MM]
	(or arXiv:2111.09531v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2111.09531

Submission history

From: Ankit Parag Shah
[v1] Thu, 18 Nov 2021 05:45:23 UTC (832 KB)
