Please use this identifier to cite or link to this item: doi:10.22028/D291-28109
Title: Augmenting Situated Spoken Language Interaction with Listener Gaze
Author(s): Mitev, Nikolina
Language: English
Year of Publication: 2019
Free key words: Spoken Interaction
DDC notations: 400 Language, linguistics
Publication type: Dissertation
Abstract: Collaborative task solving in a shared environment requires referential success. Human speakers follow the listener’s behavior in order to monitor language comprehension (Clark, 1996). Furthermore, a natural language generation (NLG) system can exploit listener gaze to realize an effective interaction strategy by responding to it with verbal feedback in virtual environments (Garoufi, Staudte, Koller, & Crocker, 2016). We augment situated spoken language interaction with listener gaze and investigate its role in human-human and human-machine interactions. Firstly, we evaluate its impact on the prediction of reference resolution using a multimodal corpus collected in virtual environments. Secondly, we explore if and how a human speaker uses listener gaze in an indoor guidance task while spontaneously referring to real-world objects in a real environment. Thirdly, we consider an object identification task for assembly under system instruction. We developed a multimodal interactive system and two NLG systems that integrate listener gaze into the generation mechanism. The NLG system “Feedback” reacts to gaze with verbal feedback, either underspecified or contrastive. The NLG system “Installments” uses gaze to incrementally refer to an object in the form of installments. Our results showed that gaze features improved the accuracy of automatic prediction of reference resolution. Further, we found that human speakers are very good at producing referring expressions, and showing listener gaze did not improve performance but elicited more negative feedback. In contrast, we showed that an NLG system that exploits listener gaze benefits the listener’s understanding. Specifically, combining a short, ambiguous instruction with contrastive feedback resulted in faster interactions than underspecified feedback, and even outperformed following long, unambiguous instructions. Moreover, alternating underspecified and contrastive responses in an interleaved manner led to better engagement with the system and more efficient information uptake, and resulted in equally good performance. Somewhat surprisingly, when gaze was incorporated more indirectly into the generation procedure and used to trigger installments, the non-interactive approach that outputs an instruction all at once was more effective. However, if the spatial expression was mentioned first, referring in gaze-driven installments was as efficient as following an exhaustive instruction. In sum, we provide a proof of concept that listener gaze can be used effectively in situated human-machine interaction. An assistance system that uses gaze cues is more attentive and adapts to listener behavior to ensure communicative success.
Link to this record: urn:nbn:de:bsz:291--ds-281093
hdl:20.500.11880/27468
http://dx.doi.org/10.22028/D291-28109
Advisor: Staudte, Maria
Date of oral examination: 6-Jun-2019
Date of registration: 5-Jul-2019
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File: mitev_nikolina_phd_thesis_veröffentlich_compressed.pdf (10,01 MB, Adobe PDF)


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.