Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages

Abdullah, Badr M.; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich

Please use this identifier to cite or link to this resource: doi:10.22028/D291-31932

Title:	Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
Author(s):	Abdullah, Badr M.; Avgustinova, Tania; Möbius, Bernd; Klakow, Dietrich
Language:	English
Title of the proceedings:	Cognitive intelligence for speech processing : 21st Annual Conference of the International Speech Communication Association (INTERSPEECH 2020) : held online due to Covid-19 : Shanghai, China, 25-29 October 2020
Start page:	477
End page:	481
Publisher/Platform:	Curran Associates, Inc.
Year of publication:	2020
Place of publication:	Red Hook, NY
Conference title:	Interspeech 2020
Conference location:	Shanghai, China
Document type:	Conference paper (contribution published in conference proceedings / InProceedings)
Abstract:	State-of-the-art spoken language identification (LID) systems, which are based on end-to-end deep neural networks, have shown remarkable success not only in discriminating between distant languages but also between closely-related languages or even different spoken varieties of the same language. However, it is still unclear to what extent neural LID models generalize to speech samples with different acoustic conditions due to domain shift. In this paper, we present a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains (read speech and radio broadcast) and examine two low-level signal descriptors (spectral and cepstral features) for this task. Our experiments show that (1) out-of-domain speech samples severely hinder the performance of neural LID models, and (2) while both spectral and cepstral features show comparable performance within-domain, spectral features show more robustness under domain mismatch. Moreover, we apply unsupervised domain adaptation to minimize the discrepancy between the two domains in our study. We achieve relative accuracy improvements that range from 9% to 77% depending on the diversity of acoustic conditions in the source domain.
DOI of first publication:	10.21437/Interspeech.2020-2930
URL of first publication:	https://www.isca-speech.org/archive/Interspeech_2020/abstracts/2930.html
Link to this record:	hdl:20.500.11880/30651 http://dx.doi.org/10.22028/D291-31932
ISBN:	978-1-7138-2069-7
Date of entry:	17-Feb-2021
Note:	Volume 1
Faculty:	P - Faculty of Humanities
Department:	P - Language Science and Technology
Professorship:	P - Prof. Dr. Bernd Möbius
Collection:	SciDok - The scientific publication server of Saarland University

Files for this record:

There are no files associated with this resource.

All resources in this repository are protected by copyright.