Whisper Large v3

Name: Whisper Large v3
Brand: OpenAI

OpenAI's open-source speech-to-text model: multilingual, robust, and free to self-host.

Tags: openai, whisper, open source

Specs

Modalities: audio
Tool use: —
Vision: —
Streaming: —
License: MIT
Released: 2023-11-06

Pricing

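Whisper Large v3 is OpenAI's flagship open-source speech-to-text model, released under the MIT licence. Per OpenAI's model card, it was trained on roughly 1 million hours of weakly labelled audio plus 4 million hours of pseudo-labelled audio; the often-quoted 680K-hour figure belongs to the original Whisper release. It covers 99 languages, with strong Mandarin and Cantonese support, including Taiwanese-accented speech. The weights are free to self-host. Whisper is also available through the OpenAI API at $0.006/minute, though the hosted `whisper-1` model is documented as based on Large v2 rather than v3. Supports transcription, translation to English, and word-level timestamps.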
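
To make the per-minute price concrete, here is a minimal sketch that estimates the hosted-API bill for a given amount of audio. The $0.006/minute rate comes from the text above; the function name `estimate_api_cost` is hypothetical, and the sketch assumes billing is strictly proportional to duration with no rounding or minimum charge, which may not match actual invoicing.

```python
from decimal import Decimal

# Rate quoted above; treat it as an input that may change over time.
API_RATE_PER_MINUTE = Decimal("0.006")


def estimate_api_cost(duration_seconds, rate_per_minute=API_RATE_PER_MINUTE):
    """Estimate the hosted-API cost in USD for `duration_seconds` of audio.

    Assumption: cost scales linearly with duration (no rounding, no minimum).
    Decimal is used so that small per-minute rates do not pick up
    binary floating-point error.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return minutes * rate_per_minute


if __name__ == "__main__":
    ten_hours = 10 * 60 * 60
    print(f"10 hours of audio: ${estimate_api_cost(ten_hours):.2f}")
```

Under these assumptions, ten hours of audio (600 minutes) comes to about $3.60, which is a useful baseline when weighing the API against the cost of running a GPU for self-hosting.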

Editor's verdict

The default speech-to-text choice for almost any builder workflow: in our view its multilingual coverage beats the commercial alternatives, and a self-hosted copy on a single A10G or M2 Mac runs close to real time. The faster-whisper and WhisperX wrappers add streaming and speaker diarisation. Weakness: it hallucinates text on silent or near-silent segments, so always run a voice activity detector (VAD) as a preprocessing step in production.
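
The VAD advice can be illustrated with a deliberately simple sketch: an energy-threshold detector that finds the spans worth sending to the model and skips the silence where hallucinations tend to occur. This is a stand-in for illustration only, not what production systems use (trained detectors such as Silero VAD or WebRTC VAD are the usual choices). The function names, the 30 ms frame size, and the 0.01 threshold are all assumptions, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return energies


def speech_segments(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose frame energy meets the threshold.

    A crude energy gate: consecutive loud frames are merged into one span.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    segments = []
    seg_start = None  # index of the first loud frame in the current span
    for i, energy in enumerate(frame_rms(samples, frame_len)):
        if energy >= threshold and seg_start is None:
            seg_start = i
        elif energy < threshold and seg_start is not None:
            segments.append(
                (seg_start * frame_len / sample_rate, i * frame_len / sample_rate)
            )
            seg_start = None
    if seg_start is not None:
        # Audio ended while still inside a loud span.
        segments.append(
            (seg_start * frame_len / sample_rate, len(samples) / sample_rate)
        )
    return segments


if __name__ == "__main__":
    silence = [0.0] * 4800  # 0.3 s at 16 kHz
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(9600)]
    print(speech_segments(silence + tone + silence))
```

Only the returned spans would then be cut out and passed to the transcriber. If you use faster-whisper, its `transcribe` method offers a `vad_filter` option that applies a trained VAD for you, which is likely a better starting point than a hand-rolled gate like this one.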

Last updated: 2026-04-29