Towards the extraction of cross-sentence relations through event extraction and entity coreference

Simova, Iliana

Please use this identifier to cite or link to this item: doi:10.22028/D291-35277

Title:	Towards the extraction of cross-sentence relations through event extraction and entity coreference
Author(s):	Simova, Iliana
Language:	English
Year of Publication:	2021
DDC notations:	004 Computer science, internet; 400 Language, linguistics
Publication type:	Dissertation
Abstract:	Cross-sentence relation extraction deals with the extraction of relations beyond the sentence boundary. This thesis focuses on two NLP tasks that are important for the successful extraction of cross-sentence relation mentions: event extraction and coreference resolution. The first part of the thesis addresses data sparsity in event extraction. We propose a self-training approach for obtaining additional labeled examples for the task. The process starts with a Bi-LSTM event tagger trained on a small labeled data set, which is used to discover new event instances in a large collection of unstructured text. The high-confidence model predictions are selected to construct a data set of automatically labeled training examples. We present several ways in which the resulting data set can be used for re-training the event tagger in conjunction with the initial labeled data. The best configuration achieves a statistically significant improvement over the baseline on the ACE 2005 test set (macro-F1), as well as in a 10-fold cross-validation evaluation (micro- and macro-F1). Our error analysis reveals that the augmentation approach is especially beneficial for the classification of the most under-represented event types in the original data set. The second part of the thesis focuses on the problem of coreference resolution. While a certain level of precision can be reached by modeling surface information about entity mentions, their successful resolution often depends on semantic or world knowledge. This thesis investigates an unsupervised source of such knowledge, namely distributed word representations. We present several ways in which word embeddings can be utilized to extract features for a supervised coreference resolver.
Our evaluation results and error analysis show that each of these features helps improve over the baseline coreference system’s performance, with a statistically significant improvement (CoNLL F1) achieved when the proposed features are used jointly. Moreover, all features lead to a reduction in the number of precision errors in resolving references between common nouns, demonstrating that they successfully incorporate semantic information into the process.
Link to this record:	urn:nbn:de:bsz:291--ds-352772 hdl:20.500.11880/32255 http://dx.doi.org/10.22028/D291-35277
Advisor:	Koller, Alexander
Date of oral examination:	29-Nov-2021
Date of registration:	27-Jan-2022
Faculty:	P - Philosophische Fakultät
Department:	P - Sprachwissenschaft und Sprachtechnologie
Professorship:	P - Prof. Dr. Alexander Koller
Collections:	SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:

File	Description	Size	Format
Thesis_Simova.pdf		1.98 MB	Adobe PDF
