Real-Time Multilingual Video Subtitle Spotting on Resource-Constrained Devices

Authors: Degtyarenko, Illya; Radyvonenko, Olga; Tkach, Nazarii; Sielikhov, Valerii; Seliuk, Kostiantyn; Ivanov, Oleksandr
Record dates: 2026-05-20; 2026-05-20
Year: 2025
Citation: Real-Time Multilingual Video Subtitle Spotting on Resource-Constrained Devices / Illya Degtyarenko, Olga Radyvonenko, Nazarii Tkach, Valerii Sielikhov, Kostiantyn Seliuk, Oleksandr Ivanov // IEEE Access. - 2025. - Vol. 13. - P. 211701-211714. - https://doi.org/10.1109/ACCESS.2025.3643512
ISSN: 2169-3536
DOI: https://doi.org/10.1109/ACCESS.2025.3643512
URI: https://ekmair.ukma.edu.ua/handle/123456789/39575

Abstract: Accurate real-time on-device video subtitle spotting is essential for many applications, such as subtitle translation, text-to-speech conversion, and video content comprehension. However, most video content providers encrypt or embed subtitles in ways that prevent direct text extraction, necessitating the use of Optical Character Recognition (OCR) for detection and recognition. Current state-of-the-art video text spotting methods are not optimized for real-time operation on edge devices. To address this challenge, this study introduces specialized neural network architectures designed for on-device video content classification, subtitle tracking, detection, and recognition. To enhance efficiency, the proposed architectures employ optimization techniques, including pruning and Quantization-Aware Training (QAT), significantly reducing memory and computational demands while maintaining real-time performance on TV devices. Through rigorous testing and on-device end-to-end (E2E) evaluation, we achieved a state-of-the-art E2E word recognition accuracy of over 97% across seven languages, with a latency of under 150 ms per screen. The findings hold great potential for extending this technology to other platforms, including IoT devices and digital appliances.

Language: en
Keywords: Artificial intelligence; computer vision; text spotting; optical character recognition; subtitles; on-device; article
Type: Article