Please use this identifier to cite or link to this item: doi:10.22028/D291-38852
Title: DiscoGeM: A Crowdsourced Corpus of Genre-Mixed Implicit Discourse Relations
Author(s): Scholman, Merel Cleo Johanna
Dong, Tianai
Yung, Frances Pikyu
Demberg, Vera
Editor(s): Calzolari, Nicoletta
Language: English
Title of the proceedings: Language Resources and Evaluation Conference, LREC 2022, 20-25 June 2022 : Palais du Pharo, Marseille, France : conference proceedings
Pages: 3281-3290
Publisher/Platform: European Language Resources Association
Year of Publication: 2022
Place of publication: Paris
Place of the conference: Marseille, France
Free key words: discourse annotations
implicit relations
genre
crowdsourcing
label aggregation
DDC notations: 004 Computer science, internet
400 Language, linguistics
Publication type: Conference Paper
Abstract: We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic texts. Each instance was annotated by 10 crowd workers. Various label aggregation methods were explored to evaluate how to obtain a label that best captures the meaning inferred by the crowd annotators. The results show that a significant proportion of discourse relations in DiscoGeM are ambiguous and can express multiple relation senses. Probability distribution labels better capture these interpretations than single labels. Further, the results emphasize that text genre crucially affects the distribution of discourse relations, suggesting that genre should be included as a factor in automatic relation classification. We make available the newly created DiscoGeM corpus, as well as the dataset with all annotator-level labels. Both the corpus and the dataset can facilitate a multitude of applications and research purposes, for example to function as training data to improve the performance of automatic discourse relation parsers, as well as facilitate research into non-connective signals of discourse relations.
URL of the first publication: https://aclanthology.org/2022.lrec-1.351/
Link to this record: urn:nbn:de:bsz:291--ds-388529
hdl:20.500.11880/35059
http://dx.doi.org/10.22028/D291-38852
ISBN: 979-10-95546-72-6
Date of registration: 31-Jan-2023
Faculty: MI - Fakultät für Mathematik und Informatik
Department: MI - Informatik
Professorship: MI - Prof. Dr. Vera Demberg
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
There are no files associated with this item.


Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.