MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering

Pan, Haiwei; He, Shuning; Zhang, Kejia; Qu, Bo; Chen, Chunling; Shi, Kun

Abstract: Medical visual question answering (VQA) is a challenging multi-modal task that has drawn wide attention from the computer vision and natural language processing communities. Because most current medical VQA models focus on visual content and neglect the importance of text, this paper proposes a multi-view attention-based model (MuVAM) for medical VQA that integrates the high-level semantics of medical images on the basis of the text description. First, separate methods are used to extract image features and question features for the vision and text modalities. Second, the paper proposes a multi-view attention mechanism that includes Image-to-Question (I2Q) attention and Word-to-Text (W2T) attention. Multi-view attention correlates the question with both the image and the words, so that the question is analyzed more thoroughly and the answer is more accurate. Third, a composite loss, consisting of a classification loss and an image-question complementary (IQC) loss, is presented to predict the answer accurately after multi-modal feature fusion and to improve the similarity between visual and textual cross-modal features. Finally, to address data errors and missing labels in the VQA-RAD dataset, we collaborate with medical experts to correct and complete the dataset, constructing an enhanced dataset, VQA-RADPh. Experiments on these two datasets show that MuVAM outperforms the state-of-the-art method.
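
The composite loss described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: the classification term is taken to be softmax cross-entropy, the IQC term is taken to be one minus the cosine similarity between the image and question feature vectors, and the names `composite_loss` and `iqc_weight` are hypothetical. The paper's actual definitions may differ.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example (assumed classification loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def composite_loss(logits, target_index, image_feat, question_feat, iqc_weight=1.0):
    """Classification loss plus a weighted similarity penalty.

    ASSUMPTION: the IQC term is modeled as 1 - cosine similarity, which is
    zero when the two feature vectors point in the same direction. This is
    an illustrative stand-in, not the formula from the paper.
    """
    classification = cross_entropy(logits, target_index)
    iqc = 1.0 - cosine_similarity(image_feat, question_feat)
    return classification + iqc_weight * iqc
```

Under these assumptions, aligned image and question features leave only the classification term, while orthogonal features add a penalty of `iqc_weight`, which is one way to push the two modalities toward similar representations during training.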

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2107.03216 [cs.CV]
	(or arXiv:2107.03216v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2107.03216
