Modern approaches to controllable emotional speech synthesis
| dc.contributor.author | Ivashchenko, Dmytro | en_US |
| dc.contributor.author | Marchenko, Oleksandr | en_US |
| dc.date.accessioned | 2026-02-04T08:23:24Z | |
| dc.date.available | 2026-02-04T08:23:24Z | |
| dc.date.issued | 2025 | |
| dc.description | The article presents a comprehensive review of modern controllable systems for emotional speech synthesis. It analyzes the evolution of neural architectures and classifies existing approaches by underlying technology and by emotion-control method. It identifies the key challenges in the field, including the disentanglement of speech features and the scarcity of data for low-resource languages, and outlines promising directions for the development of emotionally controllable speech synthesis systems. | uk_UA |
| dc.description.abstract | The generation of emotionally expressive and controllable speech is one of the most dynamic and technically demanding areas at the intersection of artificial intelligence, natural language processing, and speech synthesis. Recent progress in emotional text-to-speech (TTS) systems has enabled increasingly natural and emotionally nuanced voice generation, as the field has shifted from early concatenative methods to advanced neural models. This review provides a structured overview of the state of the art in controllable emotional TTS and highlights the key architectural paradigms. Particular attention is paid to emotional control mechanisms: discrete emotional tagging with categorical or dimensional labels; reference-based control, which conditions synthesis on prosodic or stylistic exemplars; and prompt-based techniques, which leverage the capabilities of large language models for flexible and intuitive emotional specification. Despite substantial improvements in synthesis quality and emotional expressiveness, several critical challenges remain unresolved. These include the disentanglement of emotional, speaker, and prosodic features, the lack of standardized evaluation metrics for emotional clarity and naturalness, and the significant computational demands of training high-fidelity models. Furthermore, the scarcity of diverse, emotion-labeled speech data, especially for low-resource and morphologically rich languages, continues to limit the generalizability of current approaches. This review not only summarizes existing methods and their trade-offs but also outlines promising research directions, with the aim of supporting the development of more robust, efficient, and emotionally expressive speech generation systems. | en_US |
| dc.identifier.citation | 111 | en_US |
| dc.identifier.issn | 2617-3808 | |
| dc.identifier.issn | 2617-7323 | |
| dc.identifier.uri | https://doi.org/10.18523/2617-3808.2025.8.28-37 | |
| dc.identifier.uri | https://ekmair.ukma.edu.ua/handle/123456789/38254 | |
| dc.language.iso | en | en_US |
| dc.relation.source | Наукові записки НаУКМА. Комп'ютерні науки | uk_UA |
| dc.status | first published | en_US |
| dc.subject | deep learning | en_US |
| dc.subject | text-to-speech synthesis | en_US |
| dc.subject | natural language processing | en_US |
| dc.subject | speech emotion control | en_US |
| dc.subject | diffusion models | en_US |
| dc.subject | article | en_US |
| dc.title | Modern approaches to controllable emotional speech synthesis | en_US |
| dc.title.alternative | Сучасні підходи до контрольованого синтезу емоційного мовлення | uk_UA |
| dc.type | Article | en_US |
Files
Original bundle
- Name: Ivashchenko_Marchenko_Modern_approaches_to_controllable_emotional_speech_synthesis.pdf
- Size: 823.85 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission