Крюкова, Галина
Процик, Олексій
2020-12-05
2020-12-05
2020
https://ekmair.ukma.edu.ua/handle/123456789/18999
Generating high-fidelity speech with a text-to-speech (TTS) system remains a challenging task despite decades of research. Modern TTS systems are very complex. For example, a statistical TTS system commonly has a linguistic front end that extracts various linguistic features, followed by a duration model that estimates how long the speech for a given text will last and by an acoustic feature prediction model. The predicted acoustic features are then fed into a vocoder, which synthesizes speech from them. All these components are trained independently and require extensive domain knowledge to perform well. Because of this modular design, errors made in one module propagate to the following modules and can accumulate.
en
modelling prosody; the task of human speech; synthesis; the use; machine learning
bachelor's thesis
Modelling prosody in the task of human speech synthesis with the use of machine learning
Other