More Accurate Speech Recognition with whisper.cpp
I have been using OpenAI's whisper for a while to convert audio files to text. For example, to generate subtitles for a file, I used:
whisper "$INPUT_FILE" -f srt --model turbo --language en
Especially on long files, whisper's behavior would sometimes drift over time, producing either extremely long or extremely short sentences (a runaway effect).
Whisper also took a long time to run.
Luckily, there is whisper.cpp. On my system with an M2 Pro chip, it can transcribe a 40-minute audio file in a few minutes instead of half an hour.
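For reference, here is roughly how I would set it up. This is a sketch based on whisper.cpp's standard CMake build and its model download script; the repository URL and the script name are assumptions from the upstream README, so check them before running.

```shell
# Sketch of a whisper.cpp setup, wrapped in a function so it can be
# sourced and run when you are ready. Repo URL and model script are
# assumptions from the upstream README.
setup_whisper() {
  git clone https://github.com/ggerganov/whisper.cpp || return 1
  cd whisper.cpp || return 1
  # Build whisper-cli and friends with CMake
  cmake -B build
  cmake --build build --config Release
  # Fetch the quantized large-v3-turbo model used below
  sh ./models/download-ggml-model.sh large-v3-turbo-q5_0
}
```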
Also, thanks to a tip from whisper.cpp author Georgi Gerganov, the sentence length can be improved by using a prompt:
--prompt "Hello, this is the first sentence. And this is the second one. A little pause ... and we are back."
and setting
--max-context 64
which should reduce the chance of repetitions and, I suspect, also make the model less likely to run away.
The full command is:
whisper-cli \
--model /Users/rik/git/whisper.cpp/models/ggml-large-v3-turbo-q5_0.bin \
--output-txt \
--max-context 64 \
--language "en" \
--prompt "Hello, this is the first sentence. And this is the second one. A little pause ... and we are back." \
input.mp3
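Since I run this on many files, it is convenient to wrap the command in a small shell function. This is just my own convenience wrapper, not part of whisper.cpp; the model path is from my machine, so adjust it to wherever your model lives.

```shell
# Convenience wrapper around the whisper-cli invocation above.
# MODEL points at the quantized model on my system; adjust as needed.
MODEL="$HOME/git/whisper.cpp/models/ggml-large-v3-turbo-q5_0.bin"

transcribe() {
  whisper-cli \
    --model "$MODEL" \
    --output-txt \
    --max-context 64 \
    --language "en" \
    --prompt "Hello, this is the first sentence. And this is the second one. A little pause ... and we are back." \
    "$1"
}
```

Then transcribing a file is just `transcribe input.mp3`, which writes a .txt transcript next to the input.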