Whisper Large v3
OpenAI's open-source speech-to-text model: multilingual, robust, and free to self-host.
Specs
- Modalities: audio
- Tool use: not supported
- Vision: not supported
- Streaming: not supported
- License: MIT
- Released: 2023-11-06
Pricing
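Whisper Large v3 is OpenAI's flagship open-source speech-to-text model, released under the MIT licence. It was trained on about 5 million hours of audio (1 million hours weakly labelled plus 4 million hours pseudo-labelled by Whisper Large v2), up from the 680K hours used for earlier Whisper releases. It covers roughly 100 languages, with Large v3 adding a dedicated Cantonese language token, and its Mandarin, Cantonese, and Taiwanese-accented coverage is strong. The weights can be self-hosted for free; OpenAI's hosted API (model whisper-1, which OpenAI documents as based on Large v2 rather than v3) costs $0.006 per minute. It supports transcription, translation into English, and word-level timestamps.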
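Word-level timestamps are what make Whisper practical for subtitles. The sketch below groups word timestamps into numbered SRT caption cues. It assumes each word is a dict with `word`, `start`, and `end` keys (seconds), mirroring the `words` list that openai-whisper attaches to each segment when `transcribe` is called with `word_timestamps=True`. The helpers `srt_time` and `words_to_srt`, and the fixed words-per-cue rule, are hypothetical illustrations, not part of any library.

```python
from typing import TypedDict


class Word(TypedDict):
    """One word entry, assumed to match openai-whisper's word_timestamps output."""
    word: str     # Whisper words usually carry a leading space, e.g. " Hello"
    start: float  # seconds
    end: float    # seconds


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group word-level timestamps into numbered SRT cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        text = "".join(w["word"] for w in chunk).strip()
        # Each cue spans from its first word's start to its last word's end.
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_time(chunk[0]['start'])} --> {srt_time(chunk[-1]['end'])}\n"
            f"{text}\n"
        )
    return "\n".join(cues)  # blank line between cues, as SRT expects


if __name__ == "__main__":
    # With openai-whisper installed, real input would come from something like:
    #   result = whisper.load_model("large-v3").transcribe("talk.mp3", word_timestamps=True)
    #   words = [w for seg in result["segments"] for w in seg["words"]]
    # Hand-written sample words stand in for that output here.
    sample: list[Word] = [
        {"word": " Hello", "start": 0.0, "end": 0.42},
        {"word": " world.", "start": 0.42, "end": 0.9},
        {"word": " Second", "start": 1.5, "end": 1.9},
        {"word": " cue.", "start": 1.9, "end": 2.3},
    ]
    print(words_to_srt(sample, max_words=2))
```

A fixed word count keeps the sketch short; real caption tooling usually also caps cue duration and line length.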
Editor's verdict
The default speech-to-text model for almost any builder workflow. Its multilingual coverage is stronger than most commercial alternatives, and self-hosted on a single A10G GPU or an M2 Mac it transcribes at near-real-time speed. Wrappers fill the gaps: Faster-Whisper (a CTranslate2 reimplementation) speeds up inference, and WhisperX adds word-level alignment and speaker diarisation, though streaming use still means feeding the model chunked audio. Weakness: it hallucinates text on silent or near-silent segments, so always put a voice activity detection (VAD) preprocessor in front of it in production.
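The VAD advice above can be illustrated with a deliberately crude energy-based detector: split 16 kHz mono audio into 30 ms frames, keep frames whose RMS energy clears a threshold, and bridge short pauses so only speech spans are sent to Whisper. The function name `speech_regions`, the threshold, and the frame and pause sizes are assumptions chosen for illustration. Samples are assumed to be floats in [-1, 1], the range `whisper.load_audio` produces. In production a trained VAD such as Silero VAD (which faster-whisper exposes through `vad_filter=True`) is far more robust than raw energy.

```python
import math


def speech_regions(
    samples: list[float],
    sample_rate: int = 16_000,
    frame_ms: int = 30,
    threshold: float = 0.02,
    min_silence_ms: int = 300,
) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS energy exceeds threshold.

    Crude energy-based VAD (hypothetical helper): frames are non-overlapping,
    and pauses shorter than min_silence_ms are bridged so words are not split.
    """
    frame_len = sample_rate * frame_ms // 1000
    regions: list[list[float]] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start : start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            continue  # treat low-energy frames as silence
        t0 = start / sample_rate
        t1 = (start + len(frame)) / sample_rate
        if regions and t0 - regions[-1][1] < min_silence_ms / 1000:
            regions[-1][1] = t1  # short pause: extend the previous region
        else:
            regions.append([t0, t1])  # long pause: start a new region
    return [(a, b) for a, b in regions]


if __name__ == "__main__":
    sr = 16_000
    silence = [0.0] * sr  # 1 s of digital silence
    # 0.5 s of a 440 Hz tone standing in for speech
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
    for start, end in speech_regions(silence + tone + silence, sr):
        print(f"speech {start:.2f}s to {end:.2f}s")
```

Only the returned spans would then be sliced out and transcribed, so the silent stretches that trigger hallucinations never reach the model.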
Last updated: 2026-04-29