In a recent post, I showed how CeTZ-Plot can be used to plot data from a CSV file. I posted this on Reddit and got some interesting comments. One comment was that CeTZ-Plot was too slow for plotting data with 90k rows to SVG. This could be due to SVG being a vector format, so it will always add all 90k points even if they are on top of each other. It's probably a better idea to plot PNG in such cases.
But let's still see how fast CeTZ-Plot is. This is actually an interesting question in general because CeTZ-Plot is written in Typst. Typst is a new typesetting system similar to LaTeX. Writing in this system is probably slower than writing in a more optimized language. But on the other hand, Typst was written in Rust so maybe the performance is not too bad.
Only one way to find out: let's benchmark it! Here I keep the benchmark very simple. In general, I'm just interested in how fast the library can plot a simple scatter plot. For the benchmark, I will benchmark how long it takes on the command line to generate the PNG file. In the case of CeTZ-Plot, and matplotlib, this includes the time it takes to load the packages. I've tried to benchmark fastplotlib too, but according to the FAQ "fastplotlib is not intended for creating _static publication figures._" Also, the image quality is set to 300 DPI and a resolution of 1000x1000 pixels. And the ticks and limits are hardcoded to be the same for all plots. Warmup runs are allowed so the packages should be installed already during the benchmark. For the benchmark, I'll use hyperfine. Also, the benchmark runs on a MacBook with an M2 Pro chip. I expect that CeTZ-Plot will be a lot slower than gnuplot, matplotlib, and fastplotlib, but let's see.
To generate the data, I used the following Python script:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "numpy",
# ]
# ///
import numpy as np
import csv
import time
np.random.seed(42)
sizes = [100, 1_000, 10_000, 100_000, 1_000_000]
for size in sizes:
with open(f"data_{size}.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerow(["x", "y"])
x = np.random.uniform(low=0.0, high=1.0, size=size)
y = np.random.uniform(low=0.0, high=1.0, size=size)
for i in range(size):
writer.writerow([x[i], y[i]])
print(f"Generated dataset with {size} points")
To plot the data in matplotlib, I used the following script:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = "==3.13.2"
# dependencies = [
# "matplotlib==3.10",
# ]
# ///
import matplotlib.pyplot as plt
import sys
import csv
file = sys.argv[1]
data = csv.reader(open(file))
x, y = zip(*data)
plt.scatter(x, y)
plt.savefig("matplotlib.png")
As a test, this is how the plot looks:
$ ./matplot.py data_100.csv
For CeTZ-Plot, I used the following script:
#import "@preview/cetz:0.3.2": canvas, draw
#import "@preview/cetz-plot:0.1.1": plot
#set page(width: 3.333in, height: 3.333in, margin: 0in)
#let file = sys.inputs.file
#let data = csv(file, row-type: dictionary)
#let points = data.map(row => (float(row.x), float(row.y)))
#align(center + horizon)[
#canvas({
import draw: *
plot.plot(
// The size of the plot. The page is set to auto so it will automatically
// scale the page to fit the plot.
size: (6.5, 6.5),
x-label: none,
y-label: none,
x-min: -0.2,
x-max: 1.2,
y-min: -0.2,
y-max: 1.2,
x-tick-step: 1,
y-tick-step: 1,
{
plot.add(
points,
style: (stroke: none),
mark: "o",
)
}
)
})
]
and ran it with Typst 0.13.1. Which for 100 points returns the following image:
$ typst compile --ppi=300 --format=png --input file=data_100.csv plot.typ cetz.png
set terminal pngcairo size 1000,1000 enhanced font ',10'
set output 'gnuplot.png'
set datafile separator comma
set xtics (0, 1)
set ytics (0, 1)
set xrange [-0.2:1.2]
set yrange [-0.2:1.2]
unset key
plot ARG1 every ::1 using 1:2 with points pt 7 ps 1
Which for 100 points looks like this:
$ gnuplot -c gnuplot.gp data_100.csv
Benchmarks were executed with the following command:
$ hyperfine --warmup=3 --runs=5 "<command>"
The results for the mean ± σ are shown in the table below:
100 | 1,000 | 10,000 | 100,000 | 1,000,000 |
---|---|---|---|---|
matplotlib | 504.6 ms ± 6.7 ms | 511.3 ms ± 10.4 ms | 517.4 ms ± 4.5 ms | 674.3 ms ± 13.4 ms |
CeTZ-Plot | 500 ms ± 4.8 ms | 3.892 s ± 0.029 s | 39.459 s ± 0.420 s | |
gnuplot | 120.6 ms ± 2.3 ms | 130.4 ms ± 1.5 ms | 204.8 ms ± 1.9 ms | 755.5 ms ± 19.1 ms |
When plotted, the performance of the different libraries looks like this:
As you can see, CeTZ-Plot starts to take much longer above 1,000 points. With 40 seconds, 10,000 points is still a possibility, but it starts to become unwieldy. At 10,000 points, gnuplot and matplotlib can still plot the data in less than a second. Only at 1,000,000 points does gnuplot start to take longer. Matplotlib appears to be the clear winner here. The performance barely suffers from more points even when going up to one million.