Home Articles Books Search About
日本語
Creating TEI/XML from VTT Files

Creating TEI/XML from VTT Files

Overview This is a memorandum on how to create TEI/XML files from VTT files. Additionally, I will make it possible to access VTT files and TEI/XML files from an IIIF manifest. As a result, as shown below, the TEI/XML file is associated via SeeAlso, and the contents of the VTT file can be accessed from the “Annotations” tab. https://clover-iiif-demo.vercel.app/?manifest=https://movie-tei-demo.vercel.app/data/sdcommons_npl-02FT0102974177/sdcommons_npl-02FT0102974177_vtt.json References I referenced the following efforts from “The Ethiopian Language Archive.” The TEI/XML structuring method was particularly helpful. ...

Setting Subtitles on Videos Using iiif-prezi3

Setting Subtitles on Videos Using iiif-prezi3

Overview This is a memo on how to set subtitles on videos using iiif-prezi3. Creating Subtitles Subtitle files were created using the OpenAI API. The video file is converted to an audio file. from openai import OpenAI from pydub import AudioSegment from dotenv import load_dotenv class VideoClient: def __init__(self): load_dotenv(verbose=True) api_key = os.getenv("OPENAI_API_KEY") self.client = OpenAI(api_key=api_key) def get_transcriptions(self, input_movie_path): audio = AudioSegment.from_file(input_movie_path) # Write audio to a temporary file with tempfile.NamedTemporaryFile(suffix=".mp3") as temp_audio_file: audio.export(temp_audio_file.name, format="mp3") # Export in MP3 format temp_audio_file.seek(0) # Reset file pointer to the beginning # Get transcript with Whisper API with open(temp_audio_file.name, "rb") as audio_file: # Get transcript with Whisper API transcript = self.client.audio.transcriptions.create( model="whisper-1", file=audio_file, response_format="vtt" ) return transcript Data Used “Kensei News Volume 1” (Nagano Prefectural Library) is used. ...