More Accurate Speech Recognition with whisper.cpp

I have been using OpenAI's Whisper for a while to convert audio files to text. For example, to generate subtitles for a file, I used:

whisper "$INPUT_FILE" -f srt --model turbo --language en

Especially on long files, it would sometimes change its behavior over time and run away, producing either extremely long or extremely short sentences. Whisper also took a long time to run.

Luckily, there is whisper.cpp. On my system with an M2 Pro chip, it can run speech recognition on a 40-minute audio file in a few minutes instead of half an hour.
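For completeness, here is roughly how I would set it up. The Homebrew formula name and the model-download helper script are assumptions based on the whisper.cpp repository; check the project README for your platform:

```shell
# Assumed Homebrew formula; provides the whisper-cli binary
brew install whisper-cpp

# Or build from source and fetch the quantized turbo model
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
sh ./models/download-ggml-model.sh large-v3-turbo-q5_0
cmake -B build && cmake --build build -j
```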

Also, thanks to a tip from whisper.cpp author Georgi Gerganov, the sentence length can be improved by using a prompt:

--prompt "Hello, this is the first sentence. And this is the second one. A little pause ... and we are back."

and setting

--max-context 64

which should reduce the chance of repetitions and, I suspect, also make the model less likely to run away.

The full command is:

whisper-cli \
  --model /Users/rik/git/whisper.cpp/models/ggml-large-v3-turbo-q5_0.bin \
  --output-txt \
  --max-context 64 \
  --language "en" \
  --prompt "Hello, this is the first sentence. And this is the second one. A little pause ... and we are back." \
  input.mp3
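Since I run this often, it can be wrapped in a small shell function; the model path, the `transcribe` name, and the `DRY_RUN` switch below are my own additions, not part of whisper.cpp. With DRY_RUN=1 the function just prints the command it would run instead of executing it:

```shell
#!/bin/sh
# Wrapper around whisper-cli using the options from this post.
# MODEL path is illustrative -- adjust to where your model lives.
MODEL="${MODEL:-$HOME/git/whisper.cpp/models/ggml-large-v3-turbo-q5_0.bin}"
PROMPT="Hello, this is the first sentence. And this is the second one. A little pause ... and we are back."

transcribe() {
  input="$1"
  # With DRY_RUN=1, prefix the command with echo so it is printed, not run
  if [ "${DRY_RUN:-0}" = "1" ]; then run=echo; else run=; fi
  $run whisper-cli \
    --model "$MODEL" \
    --output-txt \
    --max-context 64 \
    --language en \
    --prompt "$PROMPT" \
    "$input"
}

DRY_RUN=1
transcribe input.mp3
```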