Please use this identifier to cite or link to this resource:
doi:10.22028/D291-38844
Title: | Programmable Annotation with Diversed Heuristics and Data Denoising |
Author(s): | Chang, Ernie; Marin, Alex; Demberg, Vera |
Editor(s): | Scherrer, Yves |
Language: | English |
Title: | Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022) - the 29th International Conference on Computational Linguistics : October 12-17, 2022, Gyeongju, Republic of Korea |
Pages: | 2681-2691 |
Publisher/Platform: | ACL |
Year of publication: | 2022 |
Place of publication: | [Stroudsburg, PA] |
Conference location: | Gyeongju, Republic of Korea |
DDC subject group: | 004 Computer science |
Document type: | Conference paper (contribution published in conference proceedings / InProceedings) |
Abstract: | Neural natural language generation (NLG) and understanding (NLU) models are costly and require massive amounts of annotated data to be competitive. Recent data programming frameworks address this bottleneck by allowing human supervision to be provided as a set of labeling functions to construct generative models that synthesize weak labels at scale. However, these labeling functions are difficult to build from scratch for NLG/NLU models, as they often require complex rule sets to be specified. To this end, we propose a novel data programming framework that can jointly construct labeled data for language generation and understanding tasks – by allowing the annotators to modify an automatically-inferred alignment rule set between sequence labels and text, instead of writing rules from scratch. Further, to mitigate the effect of poor quality labels, we propose a dually-regularized denoising mechanism for optimizing the NLU and NLG models. On two benchmarks we show that the framework can generate high-quality data that comes within a 1.48 BLEU and 6.42 slot F1 of the 100% human-labeled data (42k instances) with just 100 labeled data samples – outperforming benchmark annotation frameworks and other semi-supervised approaches. |
URL of the first publication: | https://aclanthology.org/2022.coling-1.237/ |
Link to this record: | urn:nbn:de:bsz:291--ds-388443 hdl:20.500.11880/35023 http://dx.doi.org/10.22028/D291-38844 |
Date of entry: | 30-Jan-2023 |
Faculty: | MI - Faculty of Mathematics and Computer Science |
Department: | MI - Computer Science |
Professorship: | MI - Prof. Dr. Vera Demberg |
Collection: | SciDok - Der Wissenschaftsserver der Universität des Saarlandes |
Files in this record:
There are no files associated with this resource.
All resources in this repository are protected by copyright.