Operating AudioText¶
This guide is for the person running AudioText as a shared transcription backend for one or more applications.
For the shortest path from clone through .env, first start, admin user, and API token, see
Deploy AudioText with Docker.
Configure the service with environment variables¶
Start from the shipped example file, deploy/audiotext.env.example, and set real secrets before the service reaches production:
AUDIOTEXT_ENV=production
AUDIOTEXT_TOKEN_PEPPER=<64+ random chars>
AUDIOTEXT_ADMIN_SESSION_SECRET=<64+ random chars>
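One way to generate values of the required strength (a sketch; any cryptographically strong source of 64+ characters works):

```shell
# Each openssl call emits 64 hex characters from the system CSPRNG.
echo "AUDIOTEXT_TOKEN_PEPPER=$(openssl rand -hex 32)"
echo "AUDIOTEXT_ADMIN_SESSION_SECRET=$(openssl rand -hex 32)"
```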
The server reads configuration in this order:
- defaults,
- a TOML config file, if AUDIOTEXT_CONFIG or --config is set,
- environment variables,
- CLI flags such as --host and --port,
- runtime settings saved through the admin API, where the setting supports it.
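To see the precedence in action, a hypothetical invocation (the serve subcommand name is an assumption; check audiotext --help for the actual entry point):

```shell
# CLI flags override values set in the TOML file or the environment,
# so this process listens on 127.0.0.1:8792 regardless of the file contents.
uv run audiotext serve --config /etc/audiotext/audiotext.toml --host 127.0.0.1 --port 8792
```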
SQLite is the V1 database. PostgreSQL should be added later only if real multi-worker pressure or operational evidence shows SQLite is not enough.
Install on a Linux server with Docker¶
Run:
git clone <private-repo-url> audio-to-text
cd audio-to-text
cp deploy/audiotext.env.example .env
docker compose up --build -d
Check:
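A minimal check, using the same readiness endpoint this guide probes after upgrades:

```shell
# Fails with a non-zero exit code if the service is not ready.
curl -fsS http://127.0.0.1:8791/readyz
```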
The Compose file binds to 127.0.0.1:8791. Put Caddy, nginx, or another
reverse proxy in front of it for TLS and public routing.
For an explicit SQLite single-process profile with the API background runner and no separate worker, run:
That profile binds to 127.0.0.1:8792 by default so it can be tested beside the
normal service. Override it with AUDIOTEXT_SQLITE_SINGLE_BIND=127.0.0.1:8791.
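A sketch of starting that profile, assuming it is wired up as a Compose profile named sqlite-single (check the Compose file for the exact profile name):

```shell
# Start only the single-process SQLite service, then probe its default bind.
docker compose --profile sqlite-single up --build -d
curl -fsS http://127.0.0.1:8792/readyz
```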
Run the repeatable Docker smoke:
Install on Linux without Docker¶
Create a service user and directories:
sudo useradd --system --home /var/lib/audiotext --shell /usr/sbin/nologin audiotext
sudo mkdir -p /opt/audiotext /var/lib/audiotext /etc/audiotext
sudo chown -R audiotext:audiotext /var/lib/audiotext
Install the app:
cd /opt/audiotext
git clone <private-repo-url> .
uv sync --extra faster-whisper
sudo cp deploy/audiotext.env.example /etc/audiotext/audiotext.env
sudo cp deploy/systemd/audiotext.service /etc/systemd/system/audiotext.service
Edit /etc/audiotext/audiotext.env, then run:
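With the unit file in place, the usual systemd sequence applies (assuming the unit keeps the default 127.0.0.1:8791 bind):

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now audiotext
curl -fsS http://127.0.0.1:8791/readyz
```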
Choose async job execution mode¶
For simple installs, keep:
In this mode the API process records the job and runs it through a FastAPI background task. This is useful locally and for small single-process services.
For production-style split processes, set:
Then start at least one worker against the same AUDIOTEXT_DATA_DIR and
database:
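A sketch of starting one worker from the same checkout; the worker subcommand name is an assumption (the audiotext-worker.service unit confirms a worker entry point exists, but verify the exact command against audiotext --help):

```shell
# Load the shared environment so the worker sees the same data dir and database.
set -a; . /etc/audiotext/audiotext.env; set +a
uv run audiotext worker
```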
The queue admission and worker concurrency controls are:
- AUDIOTEXT_MAX_RUNNING_JOBS: maximum running DB-claimed jobs across workers.
- AUDIOTEXT_MAX_QUEUED_JOBS: maximum queued jobs accepted by the API.
- AUDIOTEXT_DEFAULT_MAX_CONCURRENT_ASYNC_JOBS: default queued/running async jobs allowed per newly created API token.
Upload/resource guardrails:
- AUDIOTEXT_MAX_AUDIO_CHANNELS
- AUDIOTEXT_MAX_SAMPLE_RATE_HZ
- AUDIOTEXT_MAX_PROCESS_RSS_BYTES (0 disables the memory guard)
- AUDIOTEXT_REQUEST_TIMEOUT_SECONDS for sync transcription responses
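For example, a conservative guardrail block in audiotext.env (the values are illustrative, not recommendations):

```shell
AUDIOTEXT_MAX_AUDIO_CHANNELS=2
AUDIOTEXT_MAX_SAMPLE_RATE_HZ=48000
AUDIOTEXT_MAX_PROCESS_RSS_BYTES=0        # 0 disables the memory guard
AUDIOTEXT_REQUEST_TIMEOUT_SECONDS=120
```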
With systemd, install both templates:
sudo cp deploy/systemd/audiotext.service /etc/systemd/system/audiotext.service
sudo cp deploy/systemd/audiotext-worker.service /etc/systemd/system/audiotext-worker.service
sudo systemctl daemon-reload
sudo systemctl enable --now audiotext audiotext-worker
With Docker Compose, enable the worker profile and set the API to external queue mode:
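A sketch, assuming the worker is gated behind a Compose profile named worker (check the Compose file for the exact profile name):

```shell
# Bring up the API and the worker container together.
docker compose --profile worker up --build -d
```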
Bootstrap the admin account¶
Run once on the server:
set -a
. /etc/audiotext/audiotext.env
set +a
uv run audiotext db migrate
uv run audiotext admin create-user --email admin@example.com
Then open /admin through the route you allow for operators.
Create API tokens¶
For representative dictation:
uv run audiotext token create \
--name dictation-client-prod \
--scopes transcriptions:write,transcriptions:read,models:read \
--max-open-uploads 2 \
--daily-audio-seconds-quota 7200 \
--monthly-audio-seconds-quota 120000
Token scopes:
- transcriptions:write: submit sync transcriptions and async jobs.
- transcriptions:read: read async job status and results for jobs created by the same token.
- models:read: list available transcription models.
- *: full service token. Use only for internal admin automation.
The raw token is shown only once. Store it in the calling app's server-side secret store. Do not put provider tokens in browser code.
Token policy can also limit allowed models/languages through the admin API, maximum audio/upload size, concurrent async jobs, simultaneous in-flight uploads, and optional daily/monthly audio-second quotas. Quota counters are recorded as usage events and enforced before inference starts.
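From the calling app's server side, a request then looks like this sketch (the endpoint path and form field name are assumptions; check the API reference for the real shape):

```shell
# Submit a sync transcription with the provider token kept server-side.
curl -fsS \
  -H "Authorization: Bearer $AUDIOTEXT_TOKEN" \
  -F "file=@sample.wav" \
  http://127.0.0.1:8791/v1/transcriptions
```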
Output caching is disabled by default because transcripts can contain sensitive
data. If you enable AUDIOTEXT_OUTPUT_CACHE_ENABLED=true, each token can still
opt out with its allow_output_cache policy. Cache keys use the audio SHA-256
plus model/runtime/options hash; the cache stores transcript output, not the
original audio file. Clear it from the admin UI or POST /admin/api/cache/clear.
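Clearing the cache over HTTP then looks like this (an authenticated admin session is assumed; cookie handling is omitted):

```shell
curl -fsS -X POST http://127.0.0.1:8791/admin/api/cache/clear
```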
Run cleanup¶
The API starts a periodic cleanup task when AUDIOTEXT_CLEANUP_INTERVAL_SECONDS
is greater than zero. Cleanup removes expired terminal jobs, their uploaded
audio files, old orphan upload files, and audit rows older than
AUDIOTEXT_AUDIT_RETENTION_DAYS.
Run the same cleanup manually:
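A sketch, assuming a cleanup subcommand mirrors the periodic task (verify the exact command against audiotext --help):

```shell
# Load the service environment so cleanup targets the right data dir and DB.
set -a; . /etc/audiotext/audiotext.env; set +a
uv run audiotext cleanup
```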
Inspect jobs and audit events¶
The admin UI includes queue/job rows, redacted job detail, audit events, and a metrics summary. The matching admin APIs are:
- GET /admin/api/jobs
- GET /admin/api/jobs/{job_id}
- POST /admin/api/jobs/{job_id}/cancel
- GET /admin/api/audit
- GET /admin/api/metrics/summary
Job detail redacts transcript content by default. Use the dedicated client result endpoint for normal app flows.
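For quick inspection from the server itself, a sketch (admin session auth is assumed and omitted here):

```shell
# List queue/job rows, then fetch one job's redacted detail.
curl -fsS http://127.0.0.1:8791/admin/api/jobs
curl -fsS http://127.0.0.1:8791/admin/api/jobs/<job_id>
```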
Preload models¶
For hosts where the first request should not pay the model load cost, set:
If you also set AUDIOTEXT_WARMUP_AUDIO_PATH, startup will run a short warmup
transcription against each preloaded model. Keep that file small.
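A sketch of the relevant env fragment; the preload variable name is an assumption, so check audiotext.env.example for the real key:

```shell
AUDIOTEXT_PRELOAD_MODELS=cpu-lite                      # assumed name; verify
AUDIOTEXT_WARMUP_AUDIO_PATH=/var/lib/audiotext/warmup.wav  # keep this file small
```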
Restrict the admin UI¶
Use both controls:
- reverse-proxy routing rules that expose /admin/* only to trusted networks,
- AUDIOTEXT_ADMIN_CIDR_ALLOWLIST in the app.
Example:
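A sketch of the in-app control (the CIDR values are placeholders for your own trusted networks):

```shell
# Only admin sessions originating from these networks are accepted.
AUDIOTEXT_ADMIN_CIDR_ALLOWLIST=10.0.0.0/8,192.168.1.0/24
```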
Keep /metrics private behind the same network boundary unless you have a
separate monitoring network.
Manage models¶
Built-in V1 presets:
- cpu-lite: Faster-Whisper small on CPU int8.
- cpu-turbo: OpenAI large-v3-turbo through Faster-Whisper on CPU int8.
List registered models:
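A sketch; the list subcommand name is an assumption inferred from the models discover command below, so verify with audiotext models --help:

```shell
uv run audiotext models list
```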
Discover candidates on Hugging Face:
uv run audiotext models discover "whisper spanish catalan" \
--provider huggingface \
--language ca \
--license apache-2.0 \
--max-size-mb 4000 \
--backend faster-whisper
Discovery providers are pluggable. Hugging Face has a live search adapter, GitHub/manual artifacts expose review-first candidates, and ModelScope/Kaggle are represented as disabled adapters until credentials or stable APIs are configured. Results include revision, license, size when the provider exposes it, compatibility notes, and warnings.
The admin UI can import curated Faster-Whisper-compatible entries, open model
details, trigger download/validate operations, and run an uploaded local
benchmark clip. Treat unknown discovered models as untrusted until you verify
license, model size, language support, and whether conversion is required. V1
does not enable trust_remote_code.
Upgrade¶
From the server checkout:
git fetch origin
git checkout main
git pull --ff-only
uv sync --extra faster-whisper
uv run audiotext db migrate
sudo systemctl restart audiotext
curl -fsS http://127.0.0.1:8791/readyz
For Docker:
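The Docker equivalent of the steps above, as a sketch (this assumes migrations run on container start; if they do not, run them inside the container before the readiness check):

```shell
git fetch origin && git checkout main && git pull --ff-only
docker compose up --build -d
curl -fsS http://127.0.0.1:8791/readyz
```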
Roll back¶
Keep the previous commit hash before upgrading:
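For example (the database filename under AUDIOTEXT_DATA_DIR is an assumption; use your actual path):

```shell
# Record the current commit and back up the SQLite file before upgrading.
git -C /opt/audiotext rev-parse HEAD > /root/audiotext-last-good-commit
sqlite3 "$AUDIOTEXT_DATA_DIR/audiotext.db" ".backup '/root/audiotext-pre-upgrade.db'"
```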
If the upgrade fails:
git checkout <previous-commit>
uv sync --extra faster-whisper
sudo systemctl restart audiotext
curl -fsS http://127.0.0.1:8791/readyz
If a migration has already run, restore the SQLite file from backup before starting the older version. Do not run an older binary against a newer schema unless the migration notes explicitly say it is safe.
Troubleshoot common failures¶
ffprobe is not installed:
Install FFmpeg on the host or use the Docker image.
audio duration could not be read:
Check that the uploaded file is real audio and that its extension matches the content type.
First transcription is slow:
The model downloads and loads on first use. Preload a model when predictable latency matters:
Memory stays high after unloading:
Python and native libraries may keep arenas mapped. The app releases model references and runs native trim where Linux supports it, but the process may not return to its cold-start RSS. Restart the service when you need the original baseline.
Admin login fails from a remote address:
Check the reverse proxy route and AUDIOTEXT_ADMIN_CIDR_ALLOWLIST.
Run test suites¶
The default suite uses fake backends and does not download models:
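Consistent with the focused slices below, the full default run is:

```shell
uv run pytest -q
```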
Run focused slices:
uv run pytest -q -m unit
uv run pytest -q -m api
uv run pytest -q -m worker
uv run pytest -q -m integration
uv run pytest -q -m "not slow_model"
Manual real Faster-Whisper smoke:
uv sync --extra dev --extra faster-whisper
AUDIOTEXT_RUN_SLOW_MODEL_TESTS=1 uv run pytest -q -m slow_model tests/test_real_faster_whisper_backend.py
Local provider handshake: