Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images | ||
| AUT Journal of Electrical Engineering | ||
| Article 4, Volume 56, Issue 2, 2024, Pages 191-202 | ||
| Article type: Research Article | ||
| Digital Object Identifier (DOI): 10.22060/eej.2023.22605.5552 | ||
| Authors | ||
| Erfan Zolghadriha 1; Kazim Fouladi-Ghaleh* 2; Pouya Ardehkhani 1 | ||
| 1 Deep Learning Research Lab, Department of Computer Engineering, Faculty of Engineering, College of Farabi, University of Tehran, Iran | ||
| 2 Department of Computer Engineering, Faculty of Engineering, College of Farabi, University of Tehran, Iran | ||
| Abstract | ||
| Accurate visual question answering (VQA) is crucial for practical applications in specialized fields. This study improves a VQA model for artistic images using a dataset that contains both visual and knowledge-based questions. A pre-trained BERT model first determines the nature of each question. Visual questions are then handled by the iQAN model with MLB and MUTAN mechanisms, and knowledge-based questions by an XLNet-based model. The approach reaches 78.92% accuracy on visual questions, 47.71% on knowledge-based questions, and 55.88% overall when both branches are combined. The study also examines how parameters such as the number of glances and the choice of activation function affect model performance. | ||
| Keywords | ||
| Art Pictures; Visual Question Answering (VQA); Natural Language Processing (NLP); Computer Vision; Attention | ||
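
The two-branch design described in the abstract can be illustrated with a minimal routing sketch. This is not the authors' implementation: `classify_question` is a hypothetical keyword-based stand-in for the pre-trained BERT question-type model, and `visual_branch` and `knowledge_branch` are placeholder stubs standing in for the iQAN-based and XLNet-based models. All function names, the cue list, and the `Answer` type are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Answer:
    branch: str  # which branch produced the answer: "visual" or "knowledge"
    text: str    # the answer text (a placeholder in this sketch)


def classify_question(question: str) -> str:
    """Hypothetical stand-in for the BERT question-type classifier.

    A real system would run a fine-tuned sequence classifier here; this
    keyword rule only exists to make the routing logic runnable.
    """
    visual_cues = ("color", "colour", "how many", "wearing", "shown", "depicted")
    q = question.lower()
    return "visual" if any(cue in q for cue in visual_cues) else "knowledge"


def visual_branch(question: str, image_id: str) -> Answer:
    # Placeholder for the visual model (iQAN with MLB/MUTAN in the paper).
    return Answer("visual", f"<visual answer for {image_id}>")


def knowledge_branch(question: str, image_id: str) -> Answer:
    # Placeholder for the knowledge model (XLNet-based in the paper).
    return Answer("knowledge", f"<knowledge answer for {image_id}>")


# Dispatch table mapping the predicted question type to its branch.
BRANCHES: Dict[str, Callable[[str, str], Answer]] = {
    "visual": visual_branch,
    "knowledge": knowledge_branch,
}


def answer_question(question: str, image_id: str) -> Answer:
    """Route a question to the branch chosen by the question-type classifier."""
    kind = classify_question(question)
    return BRANCHES[kind](question, image_id)
```

Under these assumptions, a question about appearance such as "What color is the dress?" is sent to the visual branch, while "Who painted this work?" falls through to the knowledge branch. Because the router decides which branch answers, its mistakes would also affect the combined accuracy.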