Server Benchmark Results
These results were captured on 2026-05-15 against the SSH target server from a
fresh checkout of main.
Host snapshot:
- OS: Ubuntu Linux, kernel 6.8.0-90-generic
- CPU: 4 cores
- Memory: 7.6 GiB total, about 3.5 GiB available before the smoke run
- Runtime: Docker image built from the repository Dockerfile
The benchmark used two generated 0.75-second mono WAV clips, beam_size=1,
forced language=ca, cpu_threads=4, num_workers=1, and vad_filter=false.
The second run set AUDIOTEXT_MAX_LOADED_MODELS=2 so both models stayed loaded
between clips. The second clip for each model (clip-b) is therefore the useful
warm comparison: the model is already downloaded and already loaded.
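A minimal sketch of how comparable clips could be generated with the Python standard library. The source only says the clips are generated 0.75 second mono WAV files containing synthetic tones; the 16 kHz sample rate, the tone frequencies, the amplitude, and the helper name `write_tone` are assumptions made for illustration.

```python
import math
import struct
import wave


def write_tone(path, seconds=0.75, freq_hz=440.0, sample_rate=16000):
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count.

    The sample rate and frequency defaults are assumptions, not values
    taken from the benchmark.
    """
    n_frames = int(seconds * sample_rate)
    amplitude = 0.3 * 32767  # leave headroom below full scale
    frames = bytearray()
    for i in range(n_frames):
        value = amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(value))  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames


if __name__ == "__main__":
    # File names mirror the ones mounted under /bench in the command below.
    write_tone("clip-a.wav", freq_hz=440.0)
    write_tone("clip-b.wav", freq_hz=660.0)
```

Using two different frequencies keeps the clips distinct so that a result for one clip cannot be mistaken for the other.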
Command shape:
    docker run --rm \
      -e AUDIOTEXT_DATA_DIR=/data \
      -e AUDIOTEXT_TOKEN_PEPPER=pepper \
      -e AUDIOTEXT_ADMIN_SESSION_SECRET=session \
      -e AUDIOTEXT_MAX_LOADED_MODELS=2 \
      -e HF_HOME=/hf-cache \
      -v "$PWD/data:/data" \
      -v "$PWD/bench:/bench:ro" \
      -v "$PWD/hf-cache:/hf-cache" \
      audiotext:v1-smoke \
      audiotext benchmark run \
        --audio /bench/clip-a.wav \
        --audio /bench/clip-b.wav \
        --models cpu-lite,cpu-turbo \
        --languages ca \
        --beam-sizes 1 \
        --cpu-threads 4 \
        --num-workers 1 \
        --vad-modes false \
        --output markdown
| Audio | Model | Lang | Threads | Workers | Beam | VAD | Wall s | CPU s | CPU % | Load s | Transcribe s | Cache | Peak RSS MB | Preview |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| /bench/clip-a.wav | cpu-lite | ca | 4 | 1 | 1 | no | 8.465 | 16.957 | 200.3 | 3.967 | 4.498 | no | 680.0 | |
| /bench/clip-a.wav | cpu-turbo | ca | 4 | 1 | 1 | no | 24.608 | 66.707 | 271.1 | 5.882 | 18.726 | no | 2425.4 | Fins dema! |
| /bench/clip-b.wav | cpu-lite | ca | 4 | 1 | 1 | no | 3.887 | 11.762 | 302.6 | 0.000 | 3.887 | yes | 2425.4 | |
| /bench/clip-b.wav | cpu-turbo | ca | 4 | 1 | 1 | no | 17.757 | 57.691 | 324.9 | 0.000 | 17.757 | yes | 2425.5 | Gracies. |
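The columns in this table are related arithmetically, which helps when reading it. The sketch below checks two relationships that the published numbers appear to satisfy: Wall s equals Load s plus Transcribe s, and CPU % equals CPU s divided by Wall s, times 100. This is a reading of the figures, not a documented definition from the tool. The real-time factor (transcribe time divided by the 0.75 s clip length) is a derived figure added here for illustration, and the `Row` class and function names are hypothetical.

```python
from dataclasses import dataclass

CLIP_SECONDS = 0.75  # clip duration stated in the benchmark setup


@dataclass
class Row:
    audio: str
    model: str
    wall_s: float
    cpu_s: float
    load_s: float
    transcribe_s: float


# Values copied from the results table.
ROWS = [
    Row("clip-a", "cpu-lite", 8.465, 16.957, 3.967, 4.498),
    Row("clip-a", "cpu-turbo", 24.608, 66.707, 5.882, 18.726),
    Row("clip-b", "cpu-lite", 3.887, 11.762, 0.000, 3.887),
    Row("clip-b", "cpu-turbo", 17.757, 57.691, 0.000, 17.757),
]


def cpu_percent(row):
    """CPU time as a percentage of wall time (above 100 means multiple cores)."""
    return round(100.0 * row.cpu_s / row.wall_s, 1)


def real_time_factor(row):
    """Seconds of transcription work per second of audio (lower is faster)."""
    return row.transcribe_s / CLIP_SECONDS


for row in ROWS:
    wall_matches = abs(row.load_s + row.transcribe_s - row.wall_s) < 1e-6
    print(
        f"{row.audio} {row.model}: cpu%={cpu_percent(row)} "
        f"rtf={real_time_factor(row):.1f} wall=load+transcribe: {wall_matches}"
    )
```

On the warm clip-b rows this gives a real-time factor of roughly 5.2 for cpu-lite and 23.7 for cpu-turbo. With clips this short, fixed per-call overhead likely dominates, so these factors should not be extrapolated to longer audio.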
Notes:
- cpu-turbo used substantially more memory and was much slower than cpu-lite on this CPU-only VPS: peak RSS rose from 680.0 MB to about 2425 MB once cpu-turbo was loaded, and warm transcription of clip-b took 17.757 s versus 3.887 s.
- The clips are synthetic tones, so the transcript previews are not quality signals. Use private representative clips for model quality decisions.
- The temporary checkout, containers, image, and Hugging Face cache created for this smoke run were removed after the benchmark.