Information Development, 01/01/2025
The power of multimodality: Improving speech act classification through visual cues, audio intonation, and gaze tracking
Abstract
Multimodal discourse analysis enhances precision and contextual understanding in speech acts by integrating modalities such as visual cues, text, non-verbal signals, and gaze tracking. This study explores the effectiveness of multimodal discourse in improving speech act classification through combined visual, auditory, and non-verbal data. A mixed-method approach was employed, involving quantitative data from 370 communication professionals analyzed using the Statistical Package for the Social Sciences (SPSS), alongside qualitative insights from interviews and focus groups. Findings indicate that visual cues significantly enhance speech act classification performance, while audio intonation improves accuracy under noisy conditions. The integration of text and non-verbal data further supports deeper contextual understanding, particularly benefiting indirect speech act recognition and overall multimodal fusion effectiveness. This study's holistic approach is distinctive in combining multiple modalities (visual, audio, text, and gaze tracking), going beyond previous research focused on isolated factors of speech interpretation. Multimodality significantly improves accuracy and contextual comprehension in speech act classification, demonstrating that communication analysis should extend beyond textual content to include audio and non-verbal traits for a fuller understanding.
Document Type
Article
Source Type
Journal
Keywords
audio annotation; communication; gaze tracking; multimodal analysis; non-verbal; speech act; visual cues
ASJC Subject Area
Social Sciences : Library and Information Sciences