Please use this identifier to cite or link to this item:
doi:10.22028/D291-38844
Title: | Programmable Annotation with Diversed Heuristics and Data Denoising |
Author(s): | Chang, Ernie; Marin, Alex; Demberg, Vera |
Editor(s): | Scherrer, Yves |
Language: | English |
Proceedings title: | Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022) - the 29th International Conference on Computational Linguistics : October 12-17, 2022, Gyeongju, Republic of Korea |
Pages: | 2681-2691 |
Publisher/Platform: | ACL |
Year of Publication: | 2022 |
Place of publication: | [Stroudsburg, PA] |
Place of the conference: | Gyeongju, Republic of Korea |
DDC notations: | 004 Computer science, internet |
Publication type: | Conference Paper |
Abstract: | Neural natural language generation (NLG) and understanding (NLU) models are costly and require massive amounts of annotated data to be competitive. Recent data programming frameworks address this bottleneck by allowing human supervision to be provided as a set of labeling functions to construct generative models that synthesize weak labels at scale. However, these labeling functions are difficult to build from scratch for NLG/NLU models, as they often require complex rule sets to be specified. To this end, we propose a novel data programming framework that can jointly construct labeled data for language generation and understanding tasks – by allowing the annotators to modify an automatically-inferred alignment rule set between sequence labels and text, instead of writing rules from scratch. Further, to mitigate the effect of poor quality labels, we propose a dually-regularized denoising mechanism for optimizing the NLU and NLG models. On two benchmarks we show that the framework can generate high-quality data that comes within a 1.48 BLEU and 6.42 slot F1 of the 100% human-labeled data (42k instances) with just 100 labeled data samples – outperforming benchmark annotation frameworks and other semi-supervised approaches. |
URL of the first publication: | https://aclanthology.org/2022.coling-1.237/ |
Link to this record: | urn:nbn:de:bsz:291--ds-388443 hdl:20.500.11880/35023 http://dx.doi.org/10.22028/D291-38844 |
Date of registration: | 30-Jan-2023 |
Faculty: | MI - Faculty of Mathematics and Computer Science |
Department: | MI - Computer Science |
Professorship: | MI - Prof. Dr. Vera Demberg |
Collections: | SciDok - The Scientific Repository of Saarland University |
Files for this record:
There are no files associated with this item.
Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.