Operating AudioText¶
This guide is for the person running AudioText as a shared transcription backend for one or more applications.
For the shortest path from clone through .env, first start, admin user, and API token, see
Deploy AudioText with Docker.
Configure the service with environment variables¶
Start from the shipped example file, deploy/audiotext.env.example, and set real secrets before the service reaches production:
AUDIOTEXT_ENV=production
AUDIOTEXT_TOKEN_PEPPER=<64+ random chars>
AUDIOTEXT_ADMIN_SESSION_SECRET=<64+ random chars>
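One way to generate values of the required strength (a sketch; any cryptographically strong source of 64+ characters works):

```shell
# Each openssl call emits 64 hex characters from the system CSPRNG.
echo "AUDIOTEXT_TOKEN_PEPPER=$(openssl rand -hex 32)"
echo "AUDIOTEXT_ADMIN_SESSION_SECRET=$(openssl rand -hex 32)"
```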
The server reads configuration in this order:
- defaults,
- a TOML config file, if AUDIOTEXT_CONFIG or --config is set,
- environment variables,
- CLI flags such as --host and --port,
- runtime settings saved through the admin API, where the setting supports it.
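To see the precedence in action, a hypothetical invocation (the serve subcommand name is an assumption; check audiotext --help for the actual entry point):

```shell
# CLI flags override values set in the TOML file or the environment,
# so this process listens on 127.0.0.1:8792 regardless of the file contents.
uv run audiotext serve --config /etc/audiotext/audiotext.toml --host 127.0.0.1 --port 8792
```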
SQLite is the V1 database. PostgreSQL should be added later only if real multi-worker pressure or operational evidence shows SQLite is not enough.
Install on a Linux server with Docker¶
Run:
git clone <private-repo-url> audio-to-text
cd audio-to-text
cp deploy/audiotext.env.example .env
docker compose up --build -d
Check:
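A minimal check, using the same readiness endpoint this guide probes after upgrades:

```shell
# Fails with a non-zero exit code if the service is not ready.
curl -fsS http://127.0.0.1:8791/readyz
```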
The Compose file binds to 127.0.0.1:8791. Put Caddy, nginx, or another
reverse proxy in front of it for TLS and public routing.
For an explicit SQLite single-process profile with the API background runner and no separate worker, run:
That profile binds to 127.0.0.1:8792 by default so it can be tested beside the
normal service. Override it with AUDIOTEXT_SQLITE_SINGLE_BIND=127.0.0.1:8791.
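A sketch of starting that profile, assuming it is wired up as a Compose profile named sqlite-single (check the Compose file for the exact profile name):

```shell
# Start only the single-process SQLite service, then probe its default bind.
docker compose --profile sqlite-single up --build -d
curl -fsS http://127.0.0.1:8792/readyz
```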
Run the repeatable Docker smoke:
Install on Linux without Docker¶
Create a service user and directories:
sudo useradd --system --home /var/lib/audiotext --shell /usr/sbin/nologin audiotext
sudo mkdir -p /opt/audiotext /var/lib/audiotext /etc/audiotext
sudo chown -R audiotext:audiotext /var/lib/audiotext
Install the app:
cd /opt/audiotext
git clone <private-repo-url> .
uv sync --extra faster-whisper
sudo cp deploy/audiotext.env.example /etc/audiotext/audiotext.env
sudo cp deploy/systemd/audiotext.service /etc/systemd/system/audiotext.service
Edit /etc/audiotext/audiotext.env, then run:
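With the unit file in place, the usual systemd sequence applies (assuming the unit keeps the default 127.0.0.1:8791 bind):

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now audiotext
curl -fsS http://127.0.0.1:8791/readyz
```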
Choose async job execution mode¶
For simple installs, keep:
In this mode the API process records the job and runs it through a FastAPI background task. This is useful locally and for small single-process services.
For production-style split processes, set:
Then start at least one worker against the same AUDIOTEXT_DATA_DIR and
database:
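A sketch of starting one worker from the same checkout; the worker subcommand name is an assumption (the audiotext-worker.service unit confirms a worker entry point exists, but verify the exact command against audiotext --help):

```shell
# Load the shared environment so the worker sees the same data dir and database.
set -a; . /etc/audiotext/audiotext.env; set +a
uv run audiotext worker
```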
The queue admission and worker concurrency controls are:
- AUDIOTEXT_MAX_RUNNING_JOBS: maximum running DB-claimed jobs across workers.
- AUDIOTEXT_MAX_QUEUED_JOBS: maximum queued jobs accepted by the API.
- AUDIOTEXT_DEFAULT_MAX_CONCURRENT_ASYNC_JOBS: default queued/running async jobs allowed per newly created API token.
Upload/resource guardrails:
- AUDIOTEXT_MAX_AUDIO_CHANNELS
- AUDIOTEXT_MAX_SAMPLE_RATE_HZ
- AUDIOTEXT_MAX_PROCESS_RSS_BYTES (0 disables the memory guard)
- AUDIOTEXT_REQUEST_TIMEOUT_SECONDS for sync transcription responses
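For example, a conservative guardrail block in audiotext.env (the values are illustrative, not recommendations):

```shell
AUDIOTEXT_MAX_AUDIO_CHANNELS=2
AUDIOTEXT_MAX_SAMPLE_RATE_HZ=48000
AUDIOTEXT_MAX_PROCESS_RSS_BYTES=0        # 0 disables the memory guard
AUDIOTEXT_REQUEST_TIMEOUT_SECONDS=120
```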
With systemd, install both templates:
sudo cp deploy/systemd/audiotext.service /etc/systemd/system/audiotext.service
sudo cp deploy/systemd/audiotext-worker.service /etc/systemd/system/audiotext-worker.service
sudo systemctl daemon-reload
sudo systemctl enable --now audiotext audiotext-worker
With Docker Compose, enable the worker profile and set the API to external queue mode:
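A sketch, assuming the worker is gated behind a Compose profile named worker (check the Compose file for the exact profile name):

```shell
# Bring up the API and the worker container together.
docker compose --profile worker up --build -d
```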
Bootstrap the admin account¶
Run once on the server:
set -a
. /etc/audiotext/audiotext.env
set +a
uv run audiotext db migrate
uv run audiotext admin create-user --email admin@example.com
Then open /admin through the route you allow for operators.
Create API tokens¶
For representative dictation:
uv run audiotext token create \
--name dictation-client-prod \
--scopes transcriptions:write,transcriptions:read,models:read \
--max-open-uploads 2 \
--daily-audio-seconds-quota 7200 \
--monthly-audio-seconds-quota 120000
Token scopes:
- transcriptions:write: submit sync transcriptions and async jobs.
- transcriptions:read: read async job status and results for jobs created by the same token.
- models:read: list available transcription models.
- *: full service token. Use only for internal admin automation.
The raw token is shown only once. Store it in the calling app's server-side secret store. Do not put provider tokens in browser code.
Token policy can also limit allowed models/languages through the admin API, maximum audio/upload size, concurrent async jobs, simultaneous in-flight uploads, and optional daily/monthly audio-second quotas. Quota counters are recorded as usage events and enforced before inference starts.
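From the calling app's server side, a request then looks like this sketch (the endpoint path and form field name are assumptions; check the API reference for the real shape):

```shell
# Submit a sync transcription with the provider token kept server-side.
curl -fsS \
  -H "Authorization: Bearer $AUDIOTEXT_TOKEN" \
  -F "file=@sample.wav" \
  http://127.0.0.1:8791/v1/transcriptions
```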
Output caching is disabled by default because transcripts can contain sensitive
data. If you enable AUDIOTEXT_OUTPUT_CACHE_ENABLED=true, each token can still
opt out with its allow_output_cache policy. Cache keys use the audio SHA-256
plus model/runtime/options hash; the cache stores transcript output, not the
original audio file. Clear it from the admin UI or POST /admin/api/cache/clear.
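Clearing the cache over HTTP then looks like this (an authenticated admin session is assumed; cookie handling is omitted):

```shell
curl -fsS -X POST http://127.0.0.1:8791/admin/api/cache/clear
```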
Run cleanup¶
The API starts a periodic cleanup task when AUDIOTEXT_CLEANUP_INTERVAL_SECONDS
is greater than zero. Cleanup removes expired terminal jobs, their uploaded
audio files, old orphan upload files, and audit rows older than
AUDIOTEXT_AUDIT_RETENTION_DAYS.
Run the same cleanup manually:
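A sketch, assuming a cleanup subcommand mirrors the periodic task (verify the exact command against audiotext --help):

```shell
# Load the service environment so cleanup targets the right data dir and DB.
set -a; . /etc/audiotext/audiotext.env; set +a
uv run audiotext cleanup
```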
Inspect jobs and audit events¶
The admin UI includes queue/job rows, redacted job detail, audit events, and a metrics summary. The matching admin APIs are:
- GET /admin/api/jobs
- GET /admin/api/jobs/{job_id}
- POST /admin/api/jobs/{job_id}/cancel
- GET /admin/api/audit
- GET /admin/api/metrics/summary
Job detail redacts transcript content by default. Use the dedicated client result endpoint for normal app flows.
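For quick inspection from the server itself, a sketch (admin session auth is assumed and omitted here):

```shell
# List queue/job rows, then fetch one job's redacted detail.
curl -fsS http://127.0.0.1:8791/admin/api/jobs
curl -fsS http://127.0.0.1:8791/admin/api/jobs/<job_id>
```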
Preload models¶
For hosts where the first request should not pay the model load cost, set:
If you also set AUDIOTEXT_WARMUP_AUDIO_PATH, startup will run a short warmup
transcription against each preloaded model. Keep that file small.
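A sketch of the relevant env fragment; the preload variable name is an assumption, so check audiotext.env.example for the real key:

```shell
AUDIOTEXT_PRELOAD_MODELS=cpu-lite                      # assumed name; verify
AUDIOTEXT_WARMUP_AUDIO_PATH=/var/lib/audiotext/warmup.wav  # keep this file small
```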
Restrict the admin UI¶
Use both controls:
- reverse-proxy routing rules that expose /admin/* only to trusted networks,
- AUDIOTEXT_ADMIN_CIDR_ALLOWLIST in the app.
Example:
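A sketch of the in-app control (the CIDR values are placeholders for your own trusted networks):

```shell
# Only admin sessions originating from these networks are accepted.
AUDIOTEXT_ADMIN_CIDR_ALLOWLIST=10.0.0.0/8,192.168.1.0/24
```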
Keep /metrics private behind the same network boundary unless you have a
separate monitoring network.
Manage models¶
Built-in V1 presets:
- cpu-lite: Faster-Whisper small on CPU int8.
- cpu-turbo: OpenAI large-v3-turbo through Faster-Whisper on CPU int8.
List registered models:
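A sketch; the list subcommand name is an assumption inferred from the models discover command below, so verify with audiotext models --help:

```shell
uv run audiotext models list
```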
Discover candidates on Hugging Face:
uv run audiotext models discover "whisper spanish catalan" \
--provider huggingface \
--language ca \
--license apache-2.0 \
--max-size-mb 4000 \
--backend faster-whisper
Discovery providers are pluggable. Hugging Face has a live search adapter, GitHub/manual artifacts expose review-first candidates, and ModelScope/Kaggle are represented as disabled adapters until credentials or stable APIs are configured. Results include revision, license, size when the provider exposes it, compatibility notes, and warnings.
The admin UI can import curated Faster-Whisper-compatible entries, open model
details, trigger download/validate operations, and run an uploaded local
benchmark clip. Treat unknown discovered models as untrusted until you verify
license, model size, language support, and whether conversion is required. V1
does not enable trust_remote_code.
Upgrade¶
From the server checkout:
git fetch origin
git checkout main
git pull --ff-only
uv sync --extra faster-whisper
uv run audiotext db migrate
sudo systemctl restart audiotext
curl -fsS http://127.0.0.1:8791/readyz
For Docker:
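The Docker equivalent of the steps above, as a sketch (this assumes migrations run on container start; if they do not, run them inside the container before the readiness check):

```shell
git fetch origin && git checkout main && git pull --ff-only
docker compose up --build -d
curl -fsS http://127.0.0.1:8791/readyz
```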
Roll back¶
Keep the previous commit hash before upgrading:
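For example (the database filename under AUDIOTEXT_DATA_DIR is an assumption; use your actual path):

```shell
# Record the current commit and back up the SQLite file before upgrading.
git -C /opt/audiotext rev-parse HEAD > /root/audiotext-last-good-commit
sqlite3 "$AUDIOTEXT_DATA_DIR/audiotext.db" ".backup '/root/audiotext-pre-upgrade.db'"
```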
If the upgrade fails:
git checkout <previous-commit>
uv sync --extra faster-whisper
sudo systemctl restart audiotext
curl -fsS http://127.0.0.1:8791/readyz
If a migration has already run, restore the SQLite file from backup before starting the older version. Do not run an older binary against a newer schema unless the migration notes explicitly say it is safe.
Troubleshoot common failures¶
ffprobe is not installed:
Install FFmpeg on the host or use the Docker image.
audio duration could not be read:
Check that the uploaded file is real audio and that its extension matches the content type.
First transcription is slow:
The model downloads and loads on first use. Preload a model when predictable latency matters:
Memory stays high after unloading:
Python and native libraries may keep arenas mapped. The app releases model references and runs native trim where Linux supports it, but the process may not return to its cold-start RSS. Restart the service when you need the original baseline.
Admin login fails from a remote address:
Check the reverse proxy route and AUDIOTEXT_ADMIN_CIDR_ALLOWLIST.
Run test suites¶
The default suite uses fake backends and does not download models:
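Consistent with the focused slices below, the full default run is:

```shell
uv run pytest -q
```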
Run focused slices:
uv run pytest -q -m unit
uv run pytest -q -m api
uv run pytest -q -m worker
uv run pytest -q -m integration
uv run pytest -q -m "not slow_model"
Manual real Faster-Whisper smoke:
uv sync --extra dev --extra faster-whisper
AUDIOTEXT_RUN_SLOW_MODEL_TESTS=1 uv run pytest -q -m slow_model tests/test_real_faster_whisper_backend.py
Local provider handshake: