Please use this identifier to cite or link to this item: doi:10.22028/D291-38842
Files for this record:
There are no files associated with this item.
Title: Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning
Author(s): Chang, Ernie
Alabi, Jesujoba
Adelani, David Ifeoluwa
Demberg, Vera
Editor(s): Scherrer, Yves
Language: English
In:
Title: Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2022) - the 29th International Conference on Computational Linguistics : October 12-17, 2022, Gyeongju, Republic of Korea
Pages: 4286-4291
Publisher/Platform: ACL
Year of Publication: 2022
Place of publication: [Stroudsburg, PA]
Place of the conference: Gyeongju, Republic of Korea
DDC notations: 400 Language, linguistics
Publication type: Conference Paper
Abstract: The surging demand for multilingual dialogue systems often requires a costly labeling process for each language added. For low-resource languages, human annotators are continuously tasked with adapting resource-rich language utterances for each new domain. However, this prohibitive and impractical process can often be a bottleneck for low-resource languages that still lack proper translation systems and parallel corpora. In particular, it is difficult to obtain task-specific low-resource language annotations for the English-derived creoles (e.g. Nigerian and Cameroonian Pidgin). To address this issue, we utilize pretrained language models, specifically BART, which has shown great potential in language generation and understanding. We propose to fine-tune the BART model to generate utterances in Pidgin by leveraging the proximity of the source and target languages and by using positive and negative examples in contrastive training objectives. We collected and released the first parallel Pidgin-English conversation corpus in two dialogue domains and showed that this simple and effective technique suffices to yield impressive results for English-to-Pidgin generation between two closely related languages.
URL of the first publication: https://aclanthology.org/2022.coling-1.377/
Link to this record: urn:nbn:de:bsz:291--ds-388423
hdl:20.500.11880/35022
http://dx.doi.org/10.22028/D291-38842
Date of registration: 30-Jan-2023
Faculty: MI - Faculty of Mathematics and Computer Science
Department: MI - Computer Science
Professorship: MI - Prof. Dr. Vera Demberg
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes



Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.