Converting a PDF to text locally with Ollama

The following is a method for converting a PDF to text with an AI model that runs locally. Traditional OCR software can recognize characters reasonably well without AI, but I find that AI is typically more accurate and introduces fewer stray symbols.

To do this, first convert the PDF to images with pdftoppm (part of Poppler):

pdftoppm source.pdf source -png

This produces one PNG image per page, named source-001.png, source-002.png, and so on. In my case, the numbers went from 001 to 304.
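
If you would rather not read the page count off by hand, a minimal sketch like the one below could count the generated images. The function name count_pages is hypothetical, and the example call assumes the images are in the current directory and use the "source" prefix from the command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the "<prefix>-*.png" files in a directory.
count_pages() {
  local dir="$1" prefix="$2" n=0 f
  for f in "$dir/$prefix"-*.png; do
    # If nothing matches, the glob stays unexpanded, so check the file exists.
    if [[ -e "$f" ]]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Example: count the images in the current directory.
echo "Found $(count_pages . source) page image(s)"
```

The number it prints is the upper bound to use in the loop of the script that follows.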

Next, I used the following shell script to run the model on each image. Here, I'm using gemma3:4b since it accepts image input and runs reasonably fast on my system.

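#!/usr/bin/env bash

# Exit on errors, unset variables, and failed pipes; print each command as it runs.
set -euxo pipefail

OUTFILE="output.txt"

# Start with an empty output file.
: > "$OUTFILE"

# Adjust the range to match the number of generated images.
for i in {1..304}; do
  # The image names use zero-padded page numbers, so 1 becomes 001.
  THREE_DIGIT=$(printf '%03d' "$i")
  FILE="/Users/rik/Downloads/books/example/source-$THREE_DIGIT.png"
  # The image path inside the prompt is how the image is handed to the model.
  OUT=$(ollama run gemma3:4b "Give the text from the image $FILE")
  echo "$OUT" >> "$OUTFILE"
  # Separate pages with a blank line.
  echo "" >> "$OUTFILE"
done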
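
A run over a few hundred pages can take a while, so it may be worth being able to resume after an interruption. The following is a sketch of one possible variant, not the script I used: it loops over whatever images exist instead of a fixed range, and keeps one text file per page so that finished pages are skipped on a rerun. The function name ocr_all, the MODEL variable, and the per-page .txt files are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: OCR every "<prefix>-*.png" in a directory into one
# text file per page, then join the pages into a single output file.
ocr_all() {
  local dir prefix="$2" outfile="$3"
  local model="${MODEL:-gemma3:4b}"   # MODEL is an assumed override variable
  local img txt
  # Use an absolute path for the images, as in the original script.
  dir="$(cd "$1" && pwd)"

  : > "$outfile"
  # The glob expands in sorted order, so zero-padded names stay in page order.
  for img in "$dir/$prefix"-*.png; do
    [[ -e "$img" ]] || continue
    txt="${img%.png}.txt"
    # Skip pages that already have non-empty output from an earlier run.
    if [[ ! -s "$txt" ]]; then
      # Write to a temporary file first so a failed run leaves no partial page.
      ollama run "$model" "Give the text from the image $img" > "$txt.tmp" || return 1
      mv "$txt.tmp" "$txt"
    fi
    cat "$txt" >> "$outfile"
    echo "" >> "$outfile"
  done
}

# Run only when arguments are given, e.g.:
#   ./ocr.sh /Users/rik/Downloads/books/example source output.txt
if [[ $# -gt 0 ]]; then
  ocr_all "$@"
fi
```

The trade-off is a directory full of small text files next to the images. In return, a crash on page 250 does not mean starting again from page 1.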