Please use this identifier to cite or link to this item:
doi:10.22028/D291-42312
Title: | Revisiting Sample Size Determination in Natural Language Understanding |
Author(s): | Chang, Ernie; Hassan Rashid, Muhammad; Lin, Pin-Jie; Zhao, Changsheng; Demberg, Vera; Shi, Yangyang; Chandra, Vikas |
Editor(s): | Rogers, Anna |
Language: | English |
Title: | Findings of the Association for Computational Linguistics: ACL 2023 : July 9-14, 2023 : ACL 2023 |
Pages: | 6716-6724 |
Publisher/Platform: | ACL |
Year of Publication: | 2023 |
Place of publication: | Stroudsburg, PA |
Place of the conference: | Toronto, Canada |
DDC notations: | 004 Computer science, internet; 400 Language, linguistics |
Publication type: | Conference Paper |
Abstract: | Knowing exactly how many data points need to be labeled to achieve a certain model performance is a hugely beneficial step towards reducing the overall budgets for annotation. It pertains to both active learning and traditional data annotation, and is particularly beneficial for low-resource scenarios. Nevertheless, it remains a largely under-explored area of research in NLP. We therefore explored various techniques for estimating the training sample size necessary to achieve a targeted performance value. We derived a simple yet effective approach to predict the maximum achievable model performance based on a small amount of training samples, which serves as an early indicator during data annotation for data quality and sample size determination. We performed ablation studies on four language understanding tasks, and showed that the proposed approach allows us to forecast model performance within a small margin of mean absolute error (~0.9%) with only 10% of the data. |
Link to this record: | urn:nbn:de:bsz:291--ds-423120 hdl:20.500.11880/37984 http://dx.doi.org/10.22028/D291-42312 |
ISBN: | 978-1-959429-62-3 |
Date of registration: | 1-Jul-2024 |
Faculty: | MI - Fakultät für Mathematik und Informatik |
Department: | MI - Informatik |
Professorship: | MI - Prof. Dr. Vera Demberg |
Collections: | SciDok - Der Wissenschaftsserver der Universität des Saarlandes |
Files for this record:
There are no files associated with this item.
Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.