Please use this identifier to cite or link to this item:
doi:10.22028/D291-38852
Title: | DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations |
Author(s): | Scholman, Merel Cleo Johanna; Dong, Tianai; Yung, Frances Pikyu; Demberg, Vera |
Editor(s): | Calzolari, Nicoletta |
Language: | English |
Title of the proceedings: | Language Resources and Evaluation Conference, LREC 2022, 20-25 June 2022 : Palais du Pharo, Marseille, France : conference proceedings |
Pages: | 3281-3290 |
Publisher/Platform: | European Language Resources Association |
Year of Publication: | 2022 |
Place of publication: | Paris |
Place of the conference: | Marseille, France |
Free key words: | discourse annotations; implicit relations; genre; crowdsourcing; label aggregation |
DDC notations: | 004 Computer science, internet; 400 Language, linguistics |
Publication type: | Conference Paper |
Abstract: | We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses. Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into non-connective signals of discourse relations. |
URL of the first publication: | https://aclanthology.org/2022.lrec-1.351/ |
Link to this record: | urn:nbn:de:bsz:291--ds-388529; hdl:20.500.11880/35059; http://dx.doi.org/10.22028/D291-38852 |
ISBN: | 979-10-95546-72-6 |
Date of registration: | 31-Jan-2023 |
Faculty: | MI - Faculty of Mathematics and Computer Science |
Department: | MI - Computer Science |
Professorship: | MI - Prof. Dr. Vera Demberg |
Collections: | SciDok - The Scientific Server of Saarland University |