Modelling prosody in the task of human speech synthesis with the use of machine learning

Date
2020
Authors
Процик, Олексій
Abstract
Generating high-fidelity speech with a text-to-speech (TTS) system remains a challenging task despite decades of research. Modern TTS systems are very complex. For example, a statistical TTS system commonly has a linguistic front end that extracts various linguistic features from the input text. It is followed by a duration model, which estimates how long the speech for a given text should last, and by an acoustic feature prediction model. The predicted acoustic features are then fed into a vocoder, which synthesizes speech from them. All these components are trained independently and require extensive domain knowledge to design well enough to produce good results. Because the design is modular, it is prone to errors: a mistake made in one module propagates to the following modules, and such errors can accumulate.
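The modular pipeline described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method of the thesis: every function name, the letter-as-phoneme front end, the constant duration model, and the frame and sample counts are hypothetical stand-ins chosen only to show how each stage consumes the output of the previous one.

```python
from typing import List


def extract_linguistic_features(text: str) -> List[str]:
    # Hypothetical front end: treat each ASCII letter as one "phoneme".
    return [ch for ch in text.lower() if ch.isascii() and ch.isalpha()]


def predict_durations(phonemes: List[str], frames_per_phoneme: int = 5) -> List[int]:
    # Hypothetical duration model: the same number of frames for every phoneme.
    return [frames_per_phoneme] * len(phonemes)


def predict_acoustic_features(phonemes: List[str], durations: List[int]) -> List[List[float]]:
    # Hypothetical acoustic model: one scalar feature per frame,
    # derived from the letter's position in the alphabet.
    frames = []
    for phoneme, duration in zip(phonemes, durations):
        value = (ord(phoneme) - ord("a")) / 25.0
        frames.extend([value] for _ in range(duration))
    return frames


def vocode(frames: List[List[float]], samples_per_frame: int = 4) -> List[float]:
    # Hypothetical vocoder: repeat each frame's feature as raw samples.
    return [frame[0] for frame in frames for _ in range(samples_per_frame)]


def synthesize(text: str, frames_per_phoneme: int = 5) -> List[float]:
    # Each stage only sees the previous stage's output, so an error
    # made early on is carried through to the final waveform.
    phonemes = extract_linguistic_features(text)
    durations = predict_durations(phonemes, frames_per_phoneme)
    frames = predict_acoustic_features(phonemes, durations)
    return vocode(frames)
```

In this sketch, changing only the duration stage (for example, `synthesize("hi", frames_per_phoneme=6)` instead of the default 5) changes the length of the final waveform, even though the acoustic model and vocoder are untouched. That is the kind of downstream error propagation the abstract attributes to independently trained modules.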
Keywords
modelling prosody, the task of human speech, synthesis, the use, machine learning, bachelor's thesis