Please use this identifier to cite or link to this item:
doi:10.22028/D291-42312
Files for this record:
There are no files associated with this item.
Title: | Revisiting Sample Size Determination in Natural Language Understanding |
Author(s): | Chang, Ernie; Hassan Rashid, Muhammad; Lin, Pin-Jie; Zhao, Changsheng; Demberg, Vera; Shi, Yangyang; Chandra, Vikas |
Editor(s): | Rogers, Anna |
Language: | English |
In: | |
Title: | Findings of the Association for Computational Linguistics: ACL 2023 : July 9-14, 2023 : ACL 2023 |
Pages: | 6716-6724 |
Publisher/Platform: | ACL |
Year of Publication: | 2023 |
Place of publication: | Stroudsburg, PA |
Place of the conference: | Toronto, Canada |
DDC notations: | 004 Computer science, internet; 400 Language, linguistics |
Publication type: | Conference Paper |
Abstract: | Knowing exactly how many data points need to be labeled to achieve a certain model performance is a hugely beneficial step towards reducing the overall budgets for annotation. It pertains to both active learning and traditional data annotation, and is particularly beneficial for low-resource scenarios. Nevertheless, it remains a largely under-explored area of research in NLP. We therefore explored various techniques for estimating the training sample size necessary to achieve a targeted performance value. We derived a simple yet effective approach to predict the maximum achievable model performance based on a small number of training samples, which serves as an early indicator during data annotation for data quality and sample size determination. We performed ablation studies on four language understanding tasks, and showed that the proposed approach allows us to forecast model performance within a small margin of mean absolute error (~0.9%) with only 10% of the data. |
Link to this record: | urn:nbn:de:bsz:291--ds-423120; hdl:20.500.11880/37984; http://dx.doi.org/10.22028/D291-42312 |
ISBN: | 978-1-959429-62-3 |
Date of registration: | 1-Jul-2024 |
Faculty: | MI - Faculty of Mathematics and Computer Science |
Department: | MI - Computer Science |
Professorship: | MI - Prof. Dr. Vera Demberg |
Collections: | SciDok - Der Wissenschaftsserver der Universität des Saarlandes |
Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.