Please use this identifier to cite or link to this item: doi:10.22028/D291-27237
Title: A dynamic deep learning approach for intonation modeling
Author(s): Tombini, Francesco
Language: English
Year of Publication: 2018
Place of publication: Saarbrücken
DDC notations: 600 Technology
Publication type: Other
Abstract: Intonation plays a crucial role in making synthetic speech sound more natural. However, intonation modeling largely remains an open question. In this thesis, the interpolated F0 is parameterized dynamically by means of sign values, which encode the direction of pitch change, and corresponding quantized magnitude values, which encode the amount of pitch change in that direction. The sign and magnitude values are used to train a dedicated neural network. The proposed methodology is evaluated and compared to a state-of-the-art DNN-based TTS system. To this end, a segmental synthesizer was implemented to normalize the effect of the spectrum. The synthesizer uses the F0 and linguistic features to predict the spectrum, aperiodicity, and voicing information. The proposed methodology performs as well as the reference system, with a trend for native speakers to prefer the proposed intonation model.
Link to this record: urn:nbn:de:bsz:291-scidok-ds-272375
Date of registration: 16-Jul-2018
Faculty: P - Philosophische Fakultät
Department: P - Sprachwissenschaft und Sprachtechnologie
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes
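
The sign and magnitude parameterization described in the abstract can be illustrated with a minimal sketch. It assumes that pitch change is measured between consecutive frames of the interpolated F0 and that magnitudes are quantized uniformly. The function names, the 2 Hz step size, and the cap of 8 levels are hypothetical choices for illustration and are not taken from the thesis.

```python
def encode_contour(f0, step=2.0, max_level=8):
    """Encode an F0 contour as per-frame (sign, magnitude) pairs.

    Assumption: frame-to-frame differences with uniform quantization;
    the thesis may define the change and the quantizer differently.
    """
    signs, magnitudes = [], []
    for prev, cur in zip(f0, f0[1:]):
        delta = cur - prev
        # Quantized amount of change: nearest multiple of `step`, capped.
        level = min(int(abs(delta) / step + 0.5), max_level)
        # Direction of change: +1 rising, -1 falling, 0 if below resolution.
        sign = 0 if level == 0 else (1 if delta > 0 else -1)
        signs.append(sign)
        magnitudes.append(level)
    return signs, magnitudes


def decode_contour(start, signs, magnitudes, step=2.0):
    """Rebuild an approximate contour from a start value and the pairs."""
    contour = [start]
    for sign, level in zip(signs, magnitudes):
        contour.append(contour[-1] + sign * level * step)
    return contour


# Hypothetical interpolated F0 values in Hz, one per frame.
f0 = [120.0, 124.0, 131.0, 130.5, 126.0, 126.0]
signs, magnitudes = encode_contour(f0)
approx = decode_contour(f0[0], signs, magnitudes)
```

In this sketch the decoded contour only approximates the input, because each quantized step discards the remainder of the true change. A finer step size reduces that error at the cost of a larger magnitude vocabulary for the network to predict.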

Files for this record:
File: thesis_latex.pdf
Size: 960.56 kB
Format: Adobe PDF

Items in SciDok are protected by copyright, with all rights reserved, unless otherwise indicated.