Information Development, 01/01/2025
The power of multimodality: Improving speech act classification through visual cues, audio intonation, and gaze tracking
Abstract
Multimodal discourse analysis enhances precision and contextual understanding in speech acts by integrating modalities such as visual cues, text, non-verbal signals, and gaze tracking. This study explores the effectiveness of multimodal discourse in improving speech act classification through combined visual, auditory, and non-verbal data. A mixed-method approach was employed, involving quantitative data from 370 communication professionals analyzed using the Statistical Package for the Social Sciences (SPSS), alongside qualitative insights from interviews and focus groups. Findings indicate that visual cues significantly enhance speech act classification performance, while audio intonation improves accuracy under noisy conditions. The integration of text and non-verbal data further supports deeper contextual understanding, particularly benefiting indirect speech act recognition and overall multimodal fusion effectiveness. This study's holistic approach is distinctive in combining multiple modalities (visual, audio, text, and gaze tracking), going beyond previous research focused on isolated factors of speech interpretation. Multimodality significantly improves accuracy and contextual comprehension in speech act classification, demonstrating that communication analysis should extend beyond textual content to include audio and non-verbal traits for a fuller understanding.
Document Type
Article
Source Type
Journal
Keywords
audio annotation; communication; gaze tracking; multimodal analysis; non-verbal; speech act; visual cues
ASJC Subject Area
Social Sciences : Library and Information Sciences