What is the problem you’re trying to solve?
We want to enhance the user experience of video content through a multilingual offering. The goal is to make content accessible and barrier-free for, e.g., people with visual impairments or dyslexia, or simply those who enjoy hearing different languages. During the Hackdays, we will try to do this with the help of text-to-speech synthesis for selected SRG / SSR content. Support us! Our solution will reduce the production costs of multilingual video content and at the same time increase its reach.
How do you plan to solve the problem?
SRG / SSR today has a multimedia asset management (MAM) system called MediaHub that can automatically generate text from video via speech-to-text (STT). The text will then automatically be translated into many different languages. These files are available through the API of the SRG Developer Portal and would have to be passed to a speech service that generates audio files with synthetic speech. The audio files are then added back to the original video as additional audio tracks, so that a preferred language can be selected in a video player. We would also like to look at the possibilities of using AI and deep learning to train a synthetic voice that speaks German with a Swiss accent. This would make the content more accessible to a broad Swiss target audience.
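To make the planned pipeline a bit more concrete, here is a minimal Python sketch of its two ends: reading timed text from a subtitle file, and building an ffmpeg command that adds the synthesized audio files to the original video as extra language tracks. It rests on several assumptions that are not confirmed above: that the translated text arrives as SRT-style cues with `hh:mm:ss,mmm` timestamps, that the original video has exactly one audio stream, and that ffmpeg is used for muxing. The names `Cue`, `parse_cues`, and `build_mux_command` are hypothetical helpers, and the call to the speech service is deliberately left out because it depends on the service chosen.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float
    text: str     # the sentence a speech service would have to speak


# Assumed timestamp format: hh:mm:ss,mmm (SRT) or hh:mm:ss.mmm (WebVTT).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[.,](\d{3})")


def _seconds(stamp: str) -> float:
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_cues(subtitles: str) -> list[Cue]:
    """Split subtitle text into timed cues, one per blank-line-separated block."""
    cues = []
    for block in re.split(r"\n\s*\n", subtitles.strip()):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        timing = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing is None:
            continue  # e.g. a header block without a timing line
        start, end = lines[timing].split("-->")
        # Anything after the end timestamp (cue settings) is ignored.
        text = " ".join(lines[timing + 1:])
        cues.append(Cue(_seconds(start), _seconds(end.split()[0]), text))
    return cues


def build_mux_command(video: str, tracks: dict[str, str], output: str) -> list[str]:
    """Build an ffmpeg call that keeps the original video and audio and
    appends one audio track per language.

    `tracks` maps a language code (e.g. "fra") to a synthesized audio file.
    """
    cmd = ["ffmpeg", "-i", video]
    for path in tracks.values():
        cmd += ["-i", path]
    # Output audio stream 0 is the original track (assumed to be the only one).
    cmd += ["-map", "0:v", "-map", "0:a:0"]
    for i, lang in enumerate(tracks, start=1):
        cmd += ["-map", f"{i}:a", f"-metadata:s:a:{i}", f"language={lang}"]
    # Copy the video stream unchanged; encode all audio tracks as AAC.
    cmd += ["-c:v", "copy", "-c:a", "aac", output]
    return cmd
```

In such a setup, the text of each cue would be sent to the speech service, the returned clips would be assembled into one audio file per language using the cue start times, and the list returned by `build_mux_command` could be passed to `subprocess.run` to produce the multi-track file.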
Official Hackdays page: https://hackdays.sparkboard.com/project/604fe4728a42d9003ce63d5d
Here are some examples with multiple audio tracks. I could not find a WordPress player that is able to show multiple audio tracks AND the subtitles. But at the end of the process, we have a file with multiple audio tracks and all the subtitles (the basis for the text-to-speech). Depending on the configuration, the player does not allow switching the language in all browsers (Safari and IE work fine). You can also download an mk4 video example, which also includes the subtitles in all languages: link
Here is the embedded video example: