Installing and running Spleeter to separate audio files

Here are up-to-date installation instructions for running Deezer's Spleeter on Ubuntu 24.04. You need around 16 GB of RAM; processing peaks at around 11 GB. I ran this on a temporary Hetzner server because my Apple Silicon system, after lots of fiddling with versions, ran into AVX issues.
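
Before installing anything, it can be worth checking that the machine has AVX and enough memory. The following is a minimal sketch for Linux only; the helper names has_avx and total_ram_gib are made up for illustration, and it assumes the usual /proc/cpuinfo and /proc/meminfo layout.

```shell
#!/usr/bin/env bash
# Preflight sketch: check for AVX support and total RAM on Linux.
# has_avx and total_ram_gib are hypothetical helpers, not part of Spleeter.

# Succeeds if the given cpuinfo file (default: /proc/cpuinfo) lists the avx flag.
has_avx() {
    grep -qw avx "${1:-/proc/cpuinfo}"
}

# Prints total memory in whole GiB (rounded down), read from a meminfo file
# (default: /proc/meminfo). MemTotal is reported in kB.
total_ram_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Only run the checks where the /proc files exist (i.e. on Linux).
if [ -r /proc/cpuinfo ] && [ -r /proc/meminfo ]; then
    if has_avx; then echo "AVX: yes"; else echo "AVX: no"; fi
    echo "RAM: $(total_ram_gib) GiB"
fi
```

Note that a server sold as "16 GB" will typically report slightly less here (for example 15 GiB), because the value is rounded down and the kernel reserves some memory.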

Install Conda, then create an environment, install Spleeter, and run it:

conda create -n spleeter_env python=3.8 -y
conda activate spleeter_env
conda install -c conda-forge ffmpeg libsndfile numpy=1.19 -y
pip install spleeter
spleeter separate -o audio_output input.mp3
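
With the default two-stem model and default settings, the last command should write one WAV file per stem into a subdirectory named after the input file. As a rough sketch of that layout (stem_path is a hypothetical helper, and the paths assume Spleeter's default output naming and WAV codec):

```shell
#!/usr/bin/env bash
# Prints the path where Spleeter is expected to write a given stem,
# assuming the default layout <output_dir>/<input name without extension>/<stem>.wav.
# stem_path is a hypothetical helper, not part of Spleeter.
stem_path() {
    local out_dir="$1" input="$2" stem="$3"
    local name
    name="$(basename "$input")"
    name="${name%.*}"   # strip the file extension
    printf '%s/%s/%s.wav\n' "$out_dir" "$name" "$stem"
}

stem_path audio_output input.mp3 vocals         # audio_output/input/vocals.wav
stem_path audio_output input.mp3 accompaniment  # audio_output/input/accompaniment.wav
```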

If your audio file is too long, Spleeter will truncate it. As a workaround, you can split the file into chunks, process each chunk, and merge the results again with the following script.

#!/usr/bin/env bash

set -euxo pipefail

INPUT_FILE="myfile.m4a"
CHUNK_DURATION=300 # 5 minutes in seconds
OUTPUT_DIR="spleeter_chunks"
STEM="vocals" # Change to 'accompaniment' to get the background sound.
FINAL_OUTPUT="final_${STEM}.wav" # Named after the stem so the two never disagree.

mkdir -p "$OUTPUT_DIR"
mkdir -p "processed_chunks"

# 1. Split the file into 5-minute chunks
echo "--- Splitting input file ---"
ffmpeg -i "$INPUT_FILE" -f segment \
    -segment_time "$CHUNK_DURATION" \
    -c copy "$OUTPUT_DIR/chunk%03d.m4a"

# 2. Process each chunk with Spleeter.
for chunk in "$OUTPUT_DIR"/*.m4a; do
    echo "--- Processing $chunk ---"
    spleeter separate -o "processed_chunks" "$chunk"
done

# 3. Create a list of the separated stems for merging.
rm -f concat_list.txt
for dir in processed_chunks/chunk*; do
    echo "file '$dir/$STEM.wav'" >> concat_list.txt
done

# 4. Merge the chunks back together.
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy "$FINAL_OUTPUT"

echo "Done! Your merged $STEM track is at $FINAL_OUTPUT"
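
To sanity-check the result, you can estimate how many chunks to expect and compare the duration of the merged file with the input. This is a minimal sketch: media_duration and expected_chunks are hypothetical helper names, ffprobe is assumed to be installed alongside ffmpeg, and because the split uses stream copy the real chunk boundaries (and occasionally the count) may differ slightly from the estimate.

```shell
#!/usr/bin/env bash
# Prints the duration of a media file in seconds (requires ffprobe).
# media_duration is a hypothetical helper.
media_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Prints how many chunks of chunk_seconds are needed to cover total_seconds.
# Fractional seconds in total_seconds are rounded up first, then ceiling division.
# expected_chunks is a hypothetical helper.
expected_chunks() {
    awk -v total="$1" -v chunk="$2" 'BEGIN {
        t = int(total); if (t < total) t++
        print int((t + chunk - 1) / chunk)
    }'
}

# Example: a 47-minute file (2820 s) split into 300 s chunks.
expected_chunks 2820 300   # prints 10
```

After running the main script, comparing the output of media_duration for the input file and for the merged file should show roughly the same number of seconds. A noticeably shorter result suggests that a chunk was truncated or skipped.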