Please use this identifier to cite or link to this item: doi:10.22028/D291-38844
Title: Programmable Annotation with Diversed Heuristics and Data Denoising
Author(s): Chang, Ernie
Marin, Alex
Demberg, Vera
Editor: Scherrer, Yves
Language: English
Published in: Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022) - the 29th International Conference on Computational Linguistics : October 12-17, 2022, Gyeongju, Republic of Korea
Pages: 2681-2691
Publisher/Platform: ACL
Year of publication: 2022
Place of publication: [Stroudsburg, PA]
Conference location: Gyeongju, Republic of Korea
DDC subject group: 004 Computer science
Document type: Conference paper (contribution published in conference proceedings / InProceedings)
Abstract: Neural natural language generation (NLG) and understanding (NLU) models are costly and require massive amounts of annotated data to be competitive. Recent data programming frameworks address this bottleneck by allowing human supervision to be provided as a set of labeling functions to construct generative models that synthesize weak labels at scale. However, these labeling functions are difficult to build from scratch for NLG/NLU models, as they often require complex rule sets to be specified. To this end, we propose a novel data programming framework that can jointly construct labeled data for language generation and understanding tasks by allowing the annotators to modify an automatically inferred alignment rule set between sequence labels and text, instead of writing rules from scratch. Further, to mitigate the effect of poor-quality labels, we propose a dually-regularized denoising mechanism for optimizing the NLU and NLG models. On two benchmarks, we show that the framework can generate high-quality data that comes within 1.48 BLEU and 6.42 slot F1 points of the 100% human-labeled data (42k instances) with just 100 labeled data samples, outperforming benchmark annotation frameworks and other semi-supervised approaches.
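The abstract describes the general data programming setup, in which human supervision takes the form of labeling functions whose noisy votes are combined into weak labels. The sketch below is a minimal illustration of that general idea. It is not the authors' framework, and it does not implement their inferred alignment rules or their dually-regularized denoising.

The sketch makes several assumptions of its own:
- The task is deciding whether a text realizes a given slot value.
- The functions `lf_exact_match`, `lf_synonym` and `lf_negation`, the `SYNONYMS` lexicon and the `weak_label` helper are all hypothetical.
- A plain majority vote stands in for the learned generative label model that data programming frameworks actually use.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label space: does the text realize the given slot value?
ALIGNED, NOT_ALIGNED, ABSTAIN = 1, 0, -1

LabelingFunction = Callable[[str, str], int]

# Hypothetical synonym lexicon used by one labeling function.
SYNONYMS = {"cheap": ["inexpensive", "low-cost", "affordable"]}


def lf_exact_match(text: str, value: str) -> int:
    # Vote ALIGNED if the slot value appears verbatim; otherwise abstain.
    return ALIGNED if value.lower() in text.lower() else ABSTAIN


def lf_synonym(text: str, value: str) -> int:
    # Vote ALIGNED if any listed synonym of the value appears; otherwise abstain.
    lowered = text.lower()
    synonyms = SYNONYMS.get(value.lower(), [])
    return ALIGNED if any(s in lowered for s in synonyms) else ABSTAIN


def lf_negation(text: str, value: str) -> int:
    # Vote NOT_ALIGNED if the value is directly negated ("not cheap").
    return NOT_ALIGNED if f"not {value.lower()}" in text.lower() else ABSTAIN


LFS: List[LabelingFunction] = [lf_exact_match, lf_synonym, lf_negation]


def weak_label(text: str, value: str, lfs: List[LabelingFunction]) -> int:
    """Majority vote over non-abstaining labeling functions.

    Returns ABSTAIN when every function abstains or when the vote is tied,
    so such examples stay unlabeled instead of receiving a guessed label.
    """
    votes = [v for v in (lf(text, value) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    examples = [
        "A cheap Italian restaurant",
        "An affordable place in the centre",
        "The food is not cheap",
        "Great views of the river",
    ]
    for sentence in examples:
        print(sentence, "->", weak_label(sentence, "cheap", LFS))
```

The third example ("The food is not cheap") shows why the vote is only a stand-in. The exact-match function and the negation function disagree, so the tie leaves the example unlabeled. A learned label model, as used in data programming, would instead weight each function by its estimated accuracy.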
URL of first publication: https://aclanthology.org/2022.coling-1.237/
Link to this record: urn:nbn:de:bsz:291--ds-388443
hdl:20.500.11880/35023
http://dx.doi.org/10.22028/D291-38844
Date of entry: 30-Jan-2023
Faculty: MI - Faculty of Mathematics and Computer Science
Department: MI - Computer Science
Chair: MI - Prof. Dr. Vera Demberg
Collection: SciDok - The scientific publication server of Saarland University

Files in this item:
There are no files associated with this item.


All resources in this repository are protected by copyright.