April 11, 2026

How to Generate Subtitles with Whisper: Audio or Video to SRT, VTT, ASS

Generate subtitles with Whisper from audio or video files. Learn how to create editable SRT, VTT, or ASS subtitles online, choose language settings, fix common errors, and export the right format.

Whisper can turn speech in an audio or video file into timestamped subtitles. This guide shows how to generate SRT, WebVTT (VTT), or ASS subtitles with Pancake Subtitle Tools, review the result, and export a file that works with video editors, players, and publishing platforms.

This workflow applies whether you are looking for audio to SRT, video to SRT, a Whisper subtitle generator, MP3 to subtitles, or MP4 to subtitles.

What Is Whisper?

Whisper is a speech recognition model from OpenAI. It listens to spoken audio and predicts the words that were said. Unlike a plain-text transcript, Whisper's output comes in timed segments, which subtitle tools can turn into subtitle cues with a start time, an end time, and text.

That is why Whisper is useful for subtitles: it does both parts of the job.

It transcribes speech into text.
It gives approximate timing for each segment of speech.

The subtitle tool then formats those segments as SRT, VTT, or ASS so the file can be opened in players, editors, and web video workflows.
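To make the formatting step concrete, here is a minimal sketch of how timed segments could be written out as SRT cues. It assumes each segment is a dictionary with `start` and `end` in seconds plus a `text` field, similar to what the open-source `whisper` Python package returns. The function names are hypothetical and this is not necessarily how Pancake Subtitle Tools does it internally.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn timed segments into numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        # Assumed segment shape: {"start": float, "end": float, "text": str}
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"
```

Each cue is a number, a timing line with `-->`, and the text. VTT and ASS carry the same information with different syntax.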

Quick Workflow

Open Audio to Subtitle for audio files, or Video to Subtitle for video files.
Upload your file.
Choose the spoken language, or use auto-detect.
Generate subtitles with Whisper.
Review and edit the cues.
Download SRT, VTT, or ASS.

Short files can be tested for free. Longer files require Google sign-in so the tool can process the full transcription job.

Audio to Subtitles

Use Audio to Subtitle when your source file is audio only.

Common examples:

MP3 podcast to SRT
WAV interview to VTT
M4A voice memo to subtitles
FLAC, AAC, OGG, or Opus recordings to transcript subtitles

The audio workflow is usually the best option if your video is very large. Export the audio track from your editor first, then upload the smaller audio file for transcription.
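If you prefer a command-line export over a video editor, the sketch below builds an `ffmpeg` command that drops the video stream and writes a compact mono audio file. It assumes `ffmpeg` is installed. The function name, the mono setting, and the 64 kbps default are illustrative choices, not requirements of the tool.

```python
def build_audio_export_command(
    video_path: str, audio_path: str, bitrate_kbps: int = 64
) -> list[str]:
    """Build an ffmpeg argument list that extracts audio only."""
    return [
        "ffmpeg",
        "-i", video_path,             # input video
        "-vn",                        # drop the video stream
        "-ac", "1",                   # downmix to mono (assumed fine for speech)
        "-b:a", f"{bitrate_kbps}k",   # target audio bitrate
        audio_path,                   # output format follows the file extension
    ]
```

Pass the resulting list to `subprocess.run(command, check=True)` to run it, then upload the audio file it produces.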

Video to Subtitles

Use Video to Subtitle when you want to upload the video directly.

Common examples:

MP4 to SRT for YouTube
WebM to VTT for an HTML5 video player
MOV or MKV to editable subtitles
Course video to caption file

The tool reads the audio track from the video and sends that speech to Whisper. The visual content does not affect the transcript; audio quality matters much more than video resolution.

Which Subtitle Format Should You Export?

Format	Best for	Notes
SRT	YouTube, VLC, Premiere, DaVinci Resolve, general editing	Most universal subtitle format
VTT	HTML5 video, web players, course platforms	Best for `<video><track>` captions
ASS	Styled subtitles, anime/fansub workflows, Aegisub	Supports richer styling than SRT or VTT

If you are unsure, export SRT first. You can always convert later with SRT to VTT, VTT to SRT, SRT to ASS, or ASS to VTT.
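Converting between SRT and VTT is mostly a syntax change, as the rough sketch below suggests. It adds the `WEBVTT` header and swaps the comma in each timestamp for a dot. It handles plain cues only and ignores styling tags and positioning, so treat it as an illustration rather than a full converter. The function name is hypothetical.

```python
import re

# Matches SRT timestamps such as 00:01:02,500
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT."""
    cleaned = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    converted = [
        # Only touch timing lines so commas in the cue text are left alone.
        SRT_TIME.sub(r"\1.\2", line) if "-->" in line else line
        for line in cleaned.split("\n")
    ]
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"
```

The SRT cue numbers are kept because WebVTT accepts them as optional cue identifiers.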

How to Get Better Whisper Subtitles

Use clear audio. Whisper works best when speech is clear, close to the microphone, and not buried under music or background noise.

Choose the language when you know it. Auto-detect is convenient, but selecting the spoken language can help when the clip is short or multilingual.

Keep files within the upload limit. The current maximum upload size is 100 MB per file. For long videos, compress the file, split it into parts, or upload audio only.
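A quick estimate can tell you whether an audio export will fit before you upload it. The sketch below uses the usual approximation that file size is bitrate times duration. It assumes a constant bitrate and counts 1 MB as 1,000,000 bytes, which may not match exactly how the upload limit is measured. The function names are hypothetical.

```python
UPLOAD_LIMIT_MB = 100  # limit stated in this guide


def estimated_size_mb(duration_seconds: float, bitrate_kbps: float) -> float:
    """Approximate file size in MB for a constant-bitrate stream."""
    bits = duration_seconds * bitrate_kbps * 1000
    return bits / 8 / 1_000_000


def fits_upload_limit(
    duration_seconds: float, bitrate_kbps: float, limit_mb: float = UPLOAD_LIMIT_MB
) -> bool:
    """Return True if the estimated size is within the limit."""
    return estimated_size_mb(duration_seconds, bitrate_kbps) <= limit_mb
```

By this estimate, a two-hour recording at 64 kbps is about 57.6 MB and fits, while the same recording at 128 kbps is about 115.2 MB and does not.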

Review names and technical terms. Automatic transcription can mishear names, brand terms, numbers, and jargon. Fix those before publishing.

Edit cue length. If a subtitle line is too long, split it. If a sentence is broken awkwardly, merge or adjust cues in the editor.
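Splitting a long cue can be sketched as follows. This hypothetical helper breaks the text at the word boundary closest to the middle and divides the time in proportion to the length of each half. The proportional timing is only a guess, so check the split point against the audio.

```python
def split_cue(
    start: float, end: float, text: str
) -> list[tuple[float, float, str]]:
    """Split one cue into two at the word boundary nearest the middle."""
    words = text.split()
    if len(words) < 2:
        return [(start, end, text.strip())]

    total = len(" ".join(words))
    best_index, best_gap = 1, None
    for i in range(1, len(words)):
        left_len = len(" ".join(words[:i]))
        gap = abs(left_len - total / 2)
        if best_gap is None or gap < best_gap:
            best_index, best_gap = i, gap

    left = " ".join(words[:best_index])
    right = " ".join(words[best_index:])
    # Assumption: speaking time is roughly proportional to character count.
    split_time = start + (end - start) * len(left) / (len(left) + len(right))
    return [(start, split_time, left), (split_time, end, right)]
```

Merging is the reverse: keep the first cue's start time, the second cue's end time, and join the two texts.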

Troubleshooting

Upload rejected or too large: Compress the media, shorten the clip, split the file, or export a lower-bitrate audio version under 100 MB.

Transcript is empty or very short: Check that the file has audible speech on the expected audio track. Some videos have silent intros, music-only sections, or missing audio streams.

The language is wrong: Run the file again and select the spoken language manually instead of using auto-detect.

The timing is slightly off: Automatic timing is approximate. Use the subtitle editor to adjust start and end times before exporting.

I need burned-in subtitles: This workflow creates soft subtitle files such as .srt, .vtt, or .ass. To burn subtitles into the video image, import the subtitle file into a video editor or encoder that supports subtitle burn-in.
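When every cue is early or late by about the same amount, a uniform shift is often enough. The sketch below assumes cues are held as `(start, end, text)` tuples in seconds, which is an illustrative representation and not the editor's actual data model.

```python
def shift_cues(
    cues: list[tuple[float, float, str]], offset_seconds: float
) -> list[tuple[float, float, str]]:
    """Shift every cue by the same offset, clamping times at zero."""
    shifted = []
    for start, end, text in cues:
        new_start = max(0.0, start + offset_seconds)
        # Keep the end at or after the start, even after clamping.
        new_end = max(new_start, end + offset_seconds)
        shifted.append((new_start, new_end, text))
    return shifted
```

A positive offset delays the subtitles and a negative offset makes them appear earlier. Timing that drifts gradually over the file needs per-cue adjustment instead.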

FAQ

Can Whisper generate SRT files?

Yes. Whisper provides transcribed text with timing, and the subtitle tool formats that output as an SRT file with numbered cues and timestamps.

Can I generate subtitles from a video file?

Yes. Use Video to Subtitle. The tool extracts speech from the video's audio track and returns editable subtitle cues.

Is VTT better than SRT?

Neither is universally better. Use SRT for broad compatibility and editing tools. Use VTT for HTML5 video and web caption tracks.

Can I translate the subtitles after Whisper generates them?

Yes. Download the subtitle file, then open it in the Subtitle Translation Tool to translate the text while preserving timestamps.

Can I edit the generated subtitles?

Yes. After Whisper generates cues, review the preview, correct mistakes, adjust timings, then export the final SRT, VTT, or ASS file.

Summary

Whisper works well for generating subtitles because it both transcribes speech and provides timing. Use Audio to Subtitle for MP3, WAV, M4A, and other audio files. Use Video to Subtitle for MP4, WebM, MOV, and MKV files. After generation, edit the cues and download SRT, VTT, or ASS depending on where you plan to publish the subtitles.