last update: 2 hours ago
Converting a PDF to text locally with Ollama
This post describes an AI-based way to convert a PDF to text that runs entirely on your own machine. Conventional OCR software can recognize characters without a language model, but in my experience the AI approach is usually more accurate and introduces fewer strange symbols.
First, convert the PDF to one image per page with pdftoppm, which is part of Poppler:
pdftoppm -png source.pdf source
This produces one PNG image per page, named source-001.png, source-002.png, and so on. In my case, the numbers went from 001 to 304.
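The script below hardcodes both the page count (304) and the three-digit padding. A minimal sketch of how to derive both instead, assuming pdftoppm zero-pads the page number to the number of digits in the document's total page count (which matches the 001 to 304 naming above) and that pdfinfo, also from Poppler, is installed. The helper names page_count and page_file are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the number of pages in a PDF.
# pdfinfo prints a line such as "Pages:          304"; awk keeps the number.
page_count() {
  pdfinfo "$1" | awk '/^Pages:/ { print $2 }'
}

# Hypothetical helper: print the PNG name for page N of a TOTAL-page document,
# assuming the page number is zero-padded to as many digits as TOTAL has.
# ${#total} is the string length of TOTAL, e.g. 3 for 304.
page_file() {
  local prefix=$1 n=$2 total=$3
  printf '%s-%0*d.png\n' "$prefix" "${#total}" "$n"
}
```

With these, the loop could run over $(seq 1 "$PAGES") with PAGES=$(page_count source.pdf) and build each name with page_file source "$i" "$PAGES", so the same script works for a PDF of any length.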
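Next, I used the following shell script to run the model on each image. I chose gemma3:4b because it accepts images as input and runs reasonably fast on my system. For multimodal models like this one, ollama run attaches an image when the image's file path appears in the prompt, which is why the script simply includes $FILE in the prompt text.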
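#!/usr/bin/env bash
# Stop on errors, unset variables, and failed pipelines; print each command as it runs.
set -euxo pipefail

OUTFILE="output.txt"
# Start with an empty output file (without a leading blank line).
: > "$OUTFILE"

for i in {1..304}; do
  # pdftoppm zero-pads page numbers, so page 7 is source-007.png.
  THREE_DIGIT=$(printf '%03d' "$i")
  FILE="/Users/rik/Downloads/books/example/source-$THREE_DIGIT.png"
  # The image path inside the prompt is what makes ollama attach the image.
  OUT=$(ollama run gemma3:4b "Give the text from the image $FILE")
  # Append this page's text followed by a blank line as a page separator.
  printf '%s\n\n' "$OUT" >> "$OUTFILE"
done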
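Instead of passing the image path inside the prompt, the image can also be sent to a running Ollama server over its HTTP API, where /api/generate accepts base64-encoded images in an "images" array. The sketch below assumes the server listens on the default port 11434 and that curl and jq are installed; the helper names build_request and ocr_page are hypothetical, and the prompt is not JSON-escaped, so it must not contain double quotes or backslashes.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build an /api/generate request body for one image.
# "stream": false asks for a single JSON reply instead of a stream of partial ones.
# Assumes MODEL and PROMPT contain no double quotes or backslashes (not JSON-escaped).
build_request() {
  local model=$1 prompt=$2 image=$3 b64
  # tr removes the line breaks that some base64 implementations insert.
  b64=$(base64 < "$image" | tr -d '\n')
  printf '{"model":"%s","prompt":"%s","images":["%s"],"stream":false}\n' \
    "$model" "$prompt" "$b64"
}

# Hypothetical helper: send one page image to the local Ollama server and
# print the model's answer, which is in the "response" field of the reply.
ocr_page() {
  build_request gemma3:4b "Give the text from this image." "$1" \
    | curl -s -H 'Content-Type: application/json' --data-binary @- \
        http://localhost:11434/api/generate \
    | jq -r '.response'
}
```

For example, ocr_page source-001.png >> output.txt would process a single page. Sending the image explicitly avoids depending on the CLI recognizing the path in the prompt, at the cost of needing curl and jq.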