Auto-Generating English Subtitles and Audio for Videos with Azure OpenAI Whisper + Speech Services
I have summarized how to automatically add English subtitles and English audio to Japanese videos, using Azure OpenAI Service’s Whisper and Azure Speech Services.

Overview

The goal is to make a video with Japanese audio available in two languages:

- Japanese version: Original video (Japanese audio, no subtitles)
- English version: English audio + English subtitles

Services Used

| Service | Purpose |
| --- | --- |
| Azure OpenAI Service (Whisper) | Translation from Japanese audio to English text |
| Azure Speech Services (TTS) | Synthesis from English text to English audio |
| FFmpeg | Audio extraction and video merging |

Procedure

1. Environment Setup

Required Tools

    # Install FFmpeg (macOS)
    brew install ffmpeg

    # Python libraries
    pip install python-dotenv requests

Azure Configuration (.env)

    AZURE_OPENAI_ENDPOINT=https://xxxxx.openai.azure.com
    AZURE_OPENAI_API_KEY=your-api-key
    AZURE_OPENAI_DEPLOYMENT_NAME=whisper
    AZURE_OPENAI_API_VERSION=2024-06-01

2. Extract Audio from Video

Because the Azure Whisper API has a 25MB file size limit, the audio is extracted from the video in compressed form. ...
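
As a rough illustration of the extraction step, here is a minimal Python sketch that builds an FFmpeg command for compressed mono audio and checks the result against the 25MB limit. It is not necessarily the exact command used in this article: the function names (`build_extract_command`, `fits_whisper_limit`, `extract_audio`), the file names, the 64k bitrate, the 16 kHz sample rate, and the reading of 25MB as 25 * 1024 * 1024 bytes are all assumptions chosen for illustration.

```python
import subprocess
from pathlib import Path

# Assumption: "25MB" is interpreted as 25 * 1024 * 1024 bytes.
WHISPER_MAX_BYTES = 25 * 1024 * 1024


def build_extract_command(video_path, audio_path, bitrate="64k"):
    """Return an ffmpeg argument list that extracts compressed mono audio.

    The bitrate and sample rate are illustrative defaults, not values
    taken from the article.
    """
    return [
        "ffmpeg",
        "-y",             # overwrite the output file if it already exists
        "-i", str(video_path),
        "-vn",            # drop the video stream, keep audio only
        "-ac", "1",       # downmix to a single (mono) channel
        "-ar", "16000",   # resample to 16 kHz
        "-b:a", bitrate,  # target audio bitrate
        str(audio_path),
    ]


def fits_whisper_limit(size_bytes, limit=WHISPER_MAX_BYTES):
    """Return True if a file of size_bytes is within the upload limit."""
    return size_bytes <= limit


def extract_audio(video_path, audio_path, bitrate="64k"):
    """Run ffmpeg, then verify the extracted audio is small enough."""
    subprocess.run(
        build_extract_command(video_path, audio_path, bitrate),
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
    size = Path(audio_path).stat().st_size
    if not fits_whisper_limit(size):
        raise ValueError(
            f"{audio_path} is {size} bytes, which exceeds the "
            f"{WHISPER_MAX_BYTES}-byte limit; try a lower bitrate"
        )
    return Path(audio_path)


if __name__ == "__main__":
    # Hypothetical file names; requires ffmpeg on PATH.
    extract_audio("input.mp4", "audio.mp3")
```

Keeping the command builder separate from the `subprocess` call makes it easy to inspect or adjust the FFmpeg arguments before running them. If the output is still too large, lowering the bitrate or splitting the audio into segments are the usual options.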



