Please use this identifier to cite or link to this item: doi:10.22028/D291-31301
Title: Script Knowledge for Natural Language Understanding
Author(s): Ostermann, Simon
Language: English
Year of Publication: 2020
DDC notations: 004 Computer science, internet
400 Language, linguistics
Publikation type: Doctoral Thesis
Abstract: While people process text, they make frequent use of information that is assumed to be common ground and left implicit in the text. One important type of such commonsense knowledge is script knowledge, which is the knowledge about the events and participants in everyday activities such as visiting a restaurant. Due to its implicitness, it is hard for machines to exploit such script knowledge for natural language processing (NLP). This dissertation addresses the role of script knowledge in a central field of NLP, natural language understanding (NLU). In the first part of this thesis, we address script parsing. The idea of script parsing is to align event and participant mentions in a text with an underlying script representation. This makes it possible for a system to leverage script knowledge for downstream tasks. We develop the first script parsing model for events that can be trained on a large scale on crowdsourced script data. The model is implemented as a linear-chain conditional random field and trained on sequences of short event descriptions, implicitly exploiting the inherent event ordering information. We show that this ordering information plays a crucial role for script parsing. Our model provides an important first step towards facilitating the use of script knowledge for NLU. In the second part of the thesis, we move our focus to an actual application in the area of NLU, i.e. machine comprehension. For the first time, we provide data sets for the systematic evaluation of the contribution of script knowledge for machine comprehension. We create MCScript, a corpus of narrations about everyday activities and questions on the texts. By collecting questions based on a scenario rather than a text, we aimed at creating challenging questions which require script knowledge for finding the correct answer. Based on the findings of a shared task carried out with the data set, which indicated that script knowledge is not relevant for good performance on our corpus, we revised the data collection process and created a second version of the data set.
Link to this record: urn:nbn:de:bsz:291--ds-313016
hdl:20.500.11880/29334
http://dx.doi.org/10.22028/D291-31301
Advisor: Pinkal, Manfred
Date of oral examination: 19-Dec-2019
Date of registration: 29-Jun-2020
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Professorship: P - Keiner Professur zugeordnet
Collections:SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File Description SizeFormat 
thesis.pdfPhD Thesis Simon Ostermann5,06 MBAdobe PDFView/Open


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.