Towards the extraction of cross-sentence relations through event extraction and entity coreference

Simova, Iliana

Please use this identifier to cite or link to this resource: doi:10.22028/D291-35277

Title:	Towards the extraction of cross-sentence relations through event extraction and entity coreference
Author:	Simova, Iliana
Language:	English
Year of publication:	2021
DDC subject group:	004 Computer science; 400 Language, linguistics
Document type:	Doctoral thesis
Abstract:	Cross-sentence relation extraction deals with the extraction of relations beyond the sentence boundary. This thesis focuses on two of the NLP tasks which are of importance to the successful extraction of cross-sentence relation mentions: event extraction and coreference resolution. The first part of the thesis focuses on addressing data sparsity issues in event extraction. We propose a self-training approach for obtaining additional labeled examples for the task. The process starts off with a Bi-LSTM event tagger trained on a small labeled data set which is used to discover new event instances in a large collection of unstructured text. The high confidence model predictions are selected to construct a data set of automatically-labeled training examples. We present several ways in which the resulting data set can be used for re-training the event tagger in conjunction with the initial labeled data. The best configuration achieves statistically significant improvement over the baseline on the ACE 2005 test set (macro-F1), as well as in a 10-fold cross validation (micro- and macro-F1) evaluation. Our error analysis reveals that the augmentation approach is especially beneficial for the classification of the most under-represented event types in the original data set. The second part of the thesis focuses on the problem of coreference resolution. While a certain level of precision can be reached by modeling surface information about entity mentions, their successful resolution often depends on semantic or world knowledge. This thesis investigates an unsupervised source of such knowledge, namely distributed word representations. We present several ways in which word embeddings can be utilized to extract features for a supervised coreference resolver. 
Our evaluation results and error analysis show that each of these features helps improve over the baseline coreference system’s performance, with a statistically significant improvement (CoNLL F1) achieved when the proposed features are used jointly. Moreover, all features reduce the number of precision errors in resolving references between common nouns, demonstrating that they successfully incorporate semantic information into the process.
Link to this record:	urn:nbn:de:bsz:291--ds-352772 hdl:20.500.11880/32255 http://dx.doi.org/10.22028/D291-35277
First reviewer:	Koller, Alexander
Date of oral examination:	29-Nov-2021
Date of entry:	27-Jan-2022
Faculty:	P - Philosophische Fakultät
Department:	P - Sprachwissenschaft und Sprachtechnologie
Chair:	P - Prof. Dr. Alexander Koller
Collection:	SciDok - The scholarly publication server of Saarland University
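
The self-training procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a toy word-count tagger stands in for the Bi-LSTM event tagger, and the names `train_toy_tagger`, `predict`, `self_train`, as well as the 0.9 confidence threshold, are assumptions made purely for illustration. The abstract mentions several ways of combining the automatically labeled data with the initial labeled data; the sketch assumes the simplest one, plain concatenation.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a labeled training instance: (trigger word, event type).
Example = Tuple[str, str]
Model = Dict[str, Counter]


def train_toy_tagger(examples: List[Example]) -> Model:
    """Hypothetical tagger: count how often each word carries each label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, label in examples:
        counts[word][label] += 1
    return dict(counts)


def predict(model: Model, word: str) -> Tuple[str, float]:
    """Return (label, confidence); unseen words get ("O", 0.0)."""
    if word not in model:
        return "O", 0.0
    label, freq = model[word].most_common(1)[0]
    return label, freq / sum(model[word].values())


def self_train(
    labeled: List[Example],
    unlabeled: List[str],
    train: Callable[[List[Example]], Model],
    predict_fn: Callable[[Model, str], Tuple[str, float]],
    threshold: float = 0.9,  # assumed value, not taken from the thesis
) -> Tuple[Model, List[Example]]:
    """One round of self-training with a confidence filter."""
    # 1. Train the initial tagger on the small labeled set.
    model = train(labeled)
    # 2. Tag the unlabeled text and keep only high-confidence predictions.
    auto_labeled: List[Example] = []
    for word in unlabeled:
        label, confidence = predict_fn(model, word)
        if confidence >= threshold:
            auto_labeled.append((word, label))
    # 3. Re-train on the initial data plus the automatically labeled data.
    return train(labeled + auto_labeled), auto_labeled


if __name__ == "__main__":
    seed = [("attack", "Conflict"), ("attack", "Conflict"),
            ("fire", "Conflict"), ("fire", "Personnel")]
    retrained, added = self_train(seed, ["attack", "fire", "walk"],
                                  train_toy_tagger, predict)
    print(added)
```

In this toy run only the unambiguous word passes the filter; the ambiguous and the unseen word are discarded, which is the role the confidence threshold plays in keeping noisy predictions out of the augmented training set.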

Files in this record:

File	Description	Size	Format
Thesis_Simova.pdf		1.98 MB	Adobe PDF


All resources in this repository are protected by copyright.