Please use this identifier to cite or link to this resource: doi:10.22028/D291-31301
Title: Script Knowledge for Natural Language Understanding
Author: Ostermann, Simon
Language: English
Year of publication: 2020
DDC subject group: 004 Computer science
400 Language, linguistics
Document type: Dissertation
Abstract: While people process text, they make frequent use of information that is assumed to be common ground and left implicit in the text. One important type of such commonsense knowledge is script knowledge, which is the knowledge about the events and participants in everyday activities such as visiting a restaurant. Due to its implicitness, it is hard for machines to exploit such script knowledge for natural language processing (NLP). This dissertation addresses the role of script knowledge in a central field of NLP, natural language understanding (NLU). In the first part of this thesis, we address script parsing. The idea of script parsing is to align event and participant mentions in a text with an underlying script representation. This makes it possible for a system to leverage script knowledge for downstream tasks. We develop the first script parsing model for events that can be trained on a large scale on crowdsourced script data. The model is implemented as a linear-chain conditional random field and trained on sequences of short event descriptions, implicitly exploiting the inherent event ordering information. We show that this ordering information plays a crucial role for script parsing. Our model provides an important first step towards facilitating the use of script knowledge for NLU. In the second part of the thesis, we move our focus to an actual application in the area of NLU, i.e. machine comprehension. For the first time, we provide data sets for the systematic evaluation of the contribution of script knowledge for machine comprehension. We create MCScript, a corpus of narrations about everyday activities and questions on the texts. By collecting questions based on a scenario rather than a text, we aimed at creating challenging questions which require script knowledge for finding the correct answer. 
Based on the findings of a shared task carried out with the data set, which indicated that script knowledge is not relevant for good performance on our corpus, we revised the data collection process and created a second version of the data set.
Link to this record: urn:nbn:de:bsz:291--ds-313016
hdl:20.500.11880/29334
http://dx.doi.org/10.22028/D291-31301
First reviewer: Pinkal, Manfred
Date of oral examination: 19-Dec-2019
Date of entry: 29-Jun-2020
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Professorship: P - Not assigned to a professorship
Collection: SciDok - The Scientific Server of Saarland University

Files in this record:
File: thesis.pdf
Description: PhD Thesis Simon Ostermann
Size: 5.06 MB
Format: Adobe PDF


All resources in this repository are protected by copyright.