Please use this identifier to cite or link to this item: doi:10.22028/D291-42312
Files for this record:
There are no files associated with this item.
Title: Revisiting Sample Size Determination in Natural Language Understanding
Author(s): Chang, Ernie
Hassan Rashid, Muhammad
Lin, Pin-Jie
Zhao, Changsheng
Demberg, Vera
Shi, Yangyang
Chandra, Vikas
Editor(s): Rogers, Anna
Language: English
In:
Title: Findings of the Association for Computational Linguistics: ACL 2023 : July 9-14, 2023 : ACL 2023
Pages: 6716-6724
Publisher/Platform: ACL
Year of Publication: 2023
Place of publication: Stroudsburg, PA
Place of the conference: Toronto, Canada
DDC notations: 004 Computer science, internet
400 Language, linguistics
Publication type: Conference Paper
Abstract: Knowing exactly how many data points need to be labeled to achieve a certain model performance is a highly beneficial step towards reducing overall annotation budgets. It pertains to both active learning and traditional data annotation, and is particularly beneficial in low-resource scenarios. Nevertheless, it remains a largely under-explored area of research in NLP. We therefore explored various techniques for estimating the training sample size necessary to achieve a targeted performance value. We derived a simple yet effective approach to predict the maximum achievable model performance from a small number of training samples, which serves as an early indicator of data quality and required sample size during data annotation. We performed ablation studies on four language understanding tasks and showed that the proposed approach forecasts model performance within a small mean absolute error (~0.9%) using only 10% of the data.
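The abstract does not specify the estimator. A common family of techniques for this kind of forecast is learning-curve extrapolation. The sketch below fits an inverse power law, score(n) ≈ a − b·n^(−c), to scores measured on small training subsets. The asymptote `a` then serves as an estimate of the maximum achievable performance, and inverting the curve gives the sample size needed for a target score. This is an assumption for illustration only and is not presented as the authors' method. The function names (`fit_inverse_power_law`, `samples_needed`, `mean_absolute_error`), the exponent grid, and the synthetic data are all hypothetical.

```python
import math


def fit_inverse_power_law(sizes, scores, exponents=None):
    """Fit score(n) ~ a - b * n**(-c) (hypothetical helper).

    For each candidate exponent c on a grid, the model is linear in
    x = n**(-c), so (a, b) follow from ordinary least squares. The c with
    the smallest sum of squared errors is kept.
    """
    if exponents is None:
        exponents = [i / 100 for i in range(5, 201)]  # assumed grid: c in [0.05, 2.00]
    best = None
    mean_y = sum(scores) / len(scores)
    for c in exponents:
        xs = [n ** (-c) for n in sizes]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue  # degenerate transform, cannot fit a slope
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
        slope = sxy / sxx          # slope of score vs. x equals -b
        a = mean_y - slope * mean_x
        b = -slope
        sse = sum((a - b * x - y) ** 2 for x, y in zip(xs, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def samples_needed(a, b, c, target):
    """Smallest n with a - b * n**(-c) >= target, or inf if target >= ceiling a."""
    if b <= 0:
        raise ValueError("expected an increasing learning curve (b > 0)")
    if target >= a:
        return math.inf  # target is at or above the estimated maximum performance
    return math.ceil((b / (a - target)) ** (1 / c))


def mean_absolute_error(preds, targets):
    """Mean absolute error between forecast and observed scores."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Hypothetical usage: scores observed on small training subsets (synthetic data).
sizes = [100, 200, 400, 800, 1600]
scores = [0.9 - 2.0 * n ** (-0.5) for n in sizes]
a, b, c = fit_inverse_power_law(sizes, scores)
print("estimated maximum performance:", round(a, 4))
print("samples needed for 0.85:", samples_needed(a, b, c, 0.85))
```

Fixing the exponent on a grid keeps the fit dependency-free and deterministic. A library optimizer such as `scipy.optimize.curve_fit` could fit all three parameters jointly instead.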
Link to this record: urn:nbn:de:bsz:291--ds-423120
hdl:20.500.11880/37984
http://dx.doi.org/10.22028/D291-42312
ISBN: 978-1-959429-62-3
Date of registration: 1-Jul-2024
Faculty: MI - Faculty of Mathematics and Computer Science (Fakultät für Mathematik und Informatik)
Department: MI - Computer Science
Professorship: MI - Prof. Dr. Vera Demberg
Collections: SciDok - The Science Server of Saarland University (Der Wissenschaftsserver der Universität des Saarlandes)



Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.