Whisper Large v3

OpenAI's open-source speech-to-text — multilingual, robust, free to self-host.

openai, whisper, open source

Specs

Modalities: audio
Tool use
Vision
Streaming
License: MIT
Released: 2023-11-06

Pricing

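Whisper Large v3 is OpenAI's flagship open-source speech-to-text model, released under the MIT licence. It was trained on about 1 million hours of weakly labelled audio plus 4 million hours of pseudo-labelled audio (the original Whisper used 680K hours) and covers 99 languages, with strong Mandarin, Cantonese, and Taiwanese-accent coverage. The weights can be self-hosted. OpenAI's hosted API charges $0.006/minute for Whisper transcription, though its whisper-1 model is documented as based on large-v2 rather than v3. The model supports transcription, translation to English, and word-level timestamps.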
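To put the $0.006/minute hosted price next to a self-hosting budget, the arithmetic can be sketched as below. This is a rough estimate only: the GPU hourly rate and the realtime factor are placeholder inputs you would supply yourself, not figures from this listing, and the sketch ignores billing granularity, storage, and engineering time.

```python
API_PRICE_PER_MINUTE_USD = 0.006  # hosted price quoted in this listing


def api_cost_usd(audio_seconds: float,
                 price_per_minute: float = API_PRICE_PER_MINUTE_USD) -> float:
    """Estimated hosted-API cost for audio of the given duration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60.0 * price_per_minute


def self_host_cost_usd(audio_seconds: float,
                       gpu_usd_per_hour: float,
                       realtime_factor: float = 1.0) -> float:
    """Estimated GPU rental cost to transcribe the same audio.

    realtime_factor is audio seconds processed per wall-clock second
    (1.0 means realtime). Both inputs are assumptions supplied by the caller.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    wall_clock_hours = audio_seconds / realtime_factor / 3600.0
    return wall_clock_hours * gpu_usd_per_hour


if __name__ == "__main__":
    ten_hours = 10 * 3600
    print(f"API:       ${api_cost_usd(ten_hours):.2f}")
    # The $1.00/hour GPU rate is a hypothetical placeholder.
    print(f"Self-host: ${self_host_cost_usd(ten_hours, gpu_usd_per_hour=1.00):.2f}")
```

At the quoted price, ten hours of audio costs $3.60 through the API. Self-hosting only wins on raw compute cost when the GPU rate divided by the realtime factor falls below $0.36 per audio hour, or when the GPU is already paid for.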

Editor's verdict

The default speech-to-text choice for almost any builder workflow: multilingual coverage is genuinely better than every commercial alternative, and a self-hosted instance on a single A10G or M2 Mac runs at near-realtime speed. The faster-whisper and WhisperX wrappers add streaming and speaker diarisation. Weakness: it hallucinates text on silent or near-silent segments, so always run a VAD (voice activity detection) preprocessor in front of it in production.
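The VAD advice can be illustrated with a deliberately simple energy gate. This is a minimal sketch, not a production detector: it assumes mono float samples in the range -1.0 to 1.0, and the function names, the 30 ms frame size, and the 0.01 RMS threshold are illustrative choices rather than values from Whisper or any library. A trained VAD model will handle noise and music far better.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def speech_regions(samples: Sequence[float],
                   sample_rate: int,
                   frame_ms: int = 30,
                   rms_threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame energy exceeds the threshold.

    Only these spans would be passed on to the transcriber; silent
    stretches, where hallucinations tend to appear, are skipped.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    regions: List[Tuple[float, float]] = []
    region_start = None  # sample offset where the current loud span began

    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        is_speech = frame_rms(frame) >= rms_threshold
        if is_speech and region_start is None:
            region_start = offset
        elif not is_speech and region_start is not None:
            regions.append((region_start / sample_rate, offset / sample_rate))
            region_start = None

    # Close a span that runs to the end of the audio.
    if region_start is not None:
        regions.append((region_start / sample_rate, len(samples) / sample_rate))
    return regions
```

In practice you would slice the audio at the returned spans and transcribe each slice, keeping the offsets so timestamps can be mapped back to the original file. If you use faster-whisper, its transcribe call accepts a vad_filter option that applies a trained VAD for you, which is usually preferable to a hand-rolled gate like this one.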

Last updated: 2026-04-29
