Server Benchmark Results

These results were captured on 2026-05-15 against the SSH target server from a fresh checkout of main.

Host snapshot:

  • OS: Ubuntu Linux, kernel 6.8.0-90-generic
  • CPU: 4 cores
  • Memory: 7.6 GiB total, about 3.5 GiB available before the smoke test
  • Runtime: Docker image built from the repository Dockerfile

The benchmark used two generated 0.75-second mono WAV clips with beam_size=1, forced language=ca, cpu_threads=4, num_workers=1, and vad_filter=false. The second run set AUDIOTEXT_MAX_LOADED_MODELS=2 so that both models stayed loaded between clips. The second clip for each model (clip-b) is therefore the useful comparison: the model is already downloaded and already loaded, so Load s is 0.000 and Cache is yes.
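For readers who want to reproduce comparable inputs, here is a minimal sketch of how a 0.75-second mono tone clip could be generated with the Python standard library. The source does not say how the clips were produced, so the function name `write_tone`, the 16 kHz sample rate, the 440 Hz frequency, and the 16-bit PCM format are all assumptions for illustration.

```python
import math
import struct
import wave


def write_tone(path, seconds=0.75, freq_hz=440.0, sample_rate=16000, amplitude=0.3):
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count.

    All defaults except the 0.75 s duration are assumptions, not values
    taken from the benchmark.
    """
    n_frames = int(round(seconds * sample_rate))
    # Scale the sine wave into the signed 16-bit range.
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return n_frames
```

Calling `write_tone("bench/clip-a.wav")` and `write_tone("bench/clip-b.wav", freq_hz=660.0)` would give two distinct clips of the stated length, provided the `bench` directory already exists.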

Command shape:

docker run --rm \
  -e AUDIOTEXT_DATA_DIR=/data \
  -e AUDIOTEXT_TOKEN_PEPPER=pepper \
  -e AUDIOTEXT_ADMIN_SESSION_SECRET=session \
  -e AUDIOTEXT_MAX_LOADED_MODELS=2 \
  -e HF_HOME=/hf-cache \
  -v "$PWD/data:/data" \
  -v "$PWD/bench:/bench:ro" \
  -v "$PWD/hf-cache:/hf-cache" \
  audiotext:v1-smoke \
  audiotext benchmark run \
    --audio /bench/clip-a.wav \
    --audio /bench/clip-b.wav \
    --models cpu-lite,cpu-turbo \
    --languages ca \
    --beam-sizes 1 \
    --cpu-threads 4 \
    --num-workers 1 \
    --vad-modes false \
    --output markdown
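
Results:

| Audio | Model | Lang | Threads | Workers | Beam | VAD | Wall s | CPU s | CPU % | Load s | Transcribe s | Cache | Peak RSS MB | Preview |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| /bench/clip-a.wav | cpu-lite | ca | 4 | 1 | 1 | no | 8.465 | 16.957 | 200.3 | 3.967 | 4.498 | no | 680.0 | |
| /bench/clip-a.wav | cpu-turbo | ca | 4 | 1 | 1 | no | 24.608 | 66.707 | 271.1 | 5.882 | 18.726 | no | 2425.4 | Fins dema! |
| /bench/clip-b.wav | cpu-lite | ca | 4 | 1 | 1 | no | 3.887 | 11.762 | 302.6 | 0.000 | 3.887 | yes | 2425.4 | |
| /bench/clip-b.wav | cpu-turbo | ca | 4 | 1 | 1 | no | 17.757 | 57.691 | 324.9 | 0.000 | 17.757 | yes | 2425.5 | Gracies. |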
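
The timing columns appear to be related in two ways: Wall s equals Load s plus Transcribe s, and CPU % equals CPU s divided by Wall s, times 100. These relationships are inferred from the reported numbers and are not documented in the source. The sketch below checks them against the four rows; `ROWS` and `check_row` are names invented for this example.

```python
# Timing values copied from the results; column meanings are inferred.
ROWS = [
    # (audio, model, wall_s, cpu_s, cpu_pct, load_s, transcribe_s)
    ("clip-a.wav", "cpu-lite", 8.465, 16.957, 200.3, 3.967, 4.498),
    ("clip-a.wav", "cpu-turbo", 24.608, 66.707, 271.1, 5.882, 18.726),
    ("clip-b.wav", "cpu-lite", 3.887, 11.762, 302.6, 0.000, 3.887),
    ("clip-b.wav", "cpu-turbo", 17.757, 57.691, 324.9, 0.000, 17.757),
]


def check_row(row):
    """Return (wall_ok, pct_ok) for one row, allowing for rounding."""
    _, _, wall, cpu, cpu_pct, load, transcribe = row
    # Assumed relationship 1: wall time is load time plus transcribe time.
    wall_ok = abs((load + transcribe) - wall) < 0.0015
    # Assumed relationship 2: CPU % is CPU time over wall time.
    pct_ok = abs(100 * cpu / wall - cpu_pct) < 0.1
    return wall_ok, pct_ok


for row in ROWS:
    print(row[0], row[1], check_row(row))
```

A CPU % above 100 under this reading means more than one core was busy on average, which is consistent with cpu_threads=4 on a 4-core host.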

Notes:

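  • cpu-turbo was much slower than cpu-lite on this CPU-only VPS: 17.757 s versus 3.887 s of transcription on the warm clip, about 4.6 times slower. It also used substantially more memory: peak RSS was 680.0 MB after the first cpu-lite clip and about 2425 MB once cpu-turbo had been loaded.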
  • The clips are synthetic tones, so the transcript previews are not quality signals. Use private representative clips for model quality decisions.
  • The temporary checkout, containers, image, and Hugging Face cache created for this smoke test were removed after the benchmark.
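
The speed difference in the first note can be put into numbers using the warm (clip-b) transcribe times and the 0.75 s clip length. The sketch below computes a real-time factor (processing seconds per second of audio) and the relative slowdown. The names `CLIP_SECONDS`, `WARM_TRANSCRIBE_S`, and `real_time_factor` are invented for this example.

```python
CLIP_SECONDS = 0.75  # clip duration stated in the benchmark description

# Transcribe s for clip-b, where each model was already loaded.
WARM_TRANSCRIBE_S = {"cpu-lite": 3.887, "cpu-turbo": 17.757}


def real_time_factor(transcribe_s, audio_s=CLIP_SECONDS):
    """Seconds of processing per second of audio; lower is faster."""
    return transcribe_s / audio_s


rtf = {model: real_time_factor(t) for model, t in WARM_TRANSCRIBE_S.items()}
slowdown = WARM_TRANSCRIBE_S["cpu-turbo"] / WARM_TRANSCRIBE_S["cpu-lite"]

for model, value in rtf.items():
    print(f"{model}: real-time factor {value:.2f}")
print(f"cpu-turbo is {slowdown:.2f}x slower than cpu-lite on the warm clip")
```

With clips this short, fixed per-call overhead likely dominates, so these factors probably overstate the per-second cost on longer audio and are best read as a relative comparison between the two models.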