Bitte benutzen Sie diese Referenz, um auf diese Ressource zu verweisen:
doi:10.22028/D291-31301
Titel: | Script Knowledge for Natural Language Understanding |
VerfasserIn: | Ostermann, Simon |
Sprache: | Englisch |
Erscheinungsjahr: | 2020 |
DDC-Sachgruppe: | 004 Informatik 400 Sprache, Linguistik |
Dokumenttyp: | Dissertation |
Abstract: | While people process text, they make frequent use of information that is assumed to be common ground and left implicit in the text. One important type of such commonsense knowledge is script knowledge, which is the knowledge about the events and participants in everyday activities such as visiting a restaurant. Due to its implicitness, it is hard for machines to exploit such script knowledge for natural language processing (NLP). This dissertation addresses the role of script knowledge in a central field of NLP, natural language understanding (NLU). In the first part of this thesis, we address script parsing. The idea of script parsing is to align event and participant mentions in a text with an underlying script representation. This makes it possible for a system to leverage script knowledge for downstream tasks. We develop the first script parsing model for events that can be trained on a large scale on crowdsourced script data. The model is implemented as a linear-chain conditional random field and trained on sequences of short event descriptions, implicitly exploiting the inherent event ordering information. We show that this ordering information plays a crucial role for script parsing. Our model provides an important first step towards facilitating the use of script knowledge for NLU. In the second part of the thesis, we move our focus to an actual application in the area of NLU, i.e. machine comprehension. For the first time, we provide data sets for the systematic evaluation of the contribution of script knowledge for machine comprehension. We create MCScript, a corpus of narrations about everyday activities and questions on the texts. By collecting questions based on a scenario rather than a text, we aimed at creating challenging questions which require script knowledge for finding the correct answer. Based on the findings of a shared task carried out with the data set, which indicated that script knowledge is not relevant for good performance on our corpus, we revised the data collection process and created a second version of the data set. |
Link zu diesem Datensatz: | urn:nbn:de:bsz:291--ds-313016 hdl:20.500.11880/29334 http://dx.doi.org/10.22028/D291-31301 |
Erstgutachter: | Pinkal, Manfred |
Tag der mündlichen Prüfung: | 19-Dez-2019 |
Datum des Eintrags: | 29-Jun-2020 |
Fakultät: | P - Philosophische Fakultät |
Fachrichtung: | P - Sprachwissenschaft und Sprachtechnologie |
Professur: | P - Keiner Professur zugeordnet |
Sammlung: | SciDok - Der Wissenschaftsserver der Universität des Saarlandes |
Dateien zu diesem Datensatz:
Datei | Beschreibung | Größe | Format | |
---|---|---|---|---|
thesis.pdf | PhD Thesis Simon Ostermann | 5,06 MB | Adobe PDF | Öffnen/Anzeigen |
Alle Ressourcen in diesem Repository sind urheberrechtlich geschützt.