AudioText documentation

AudioText is a CPU-first transcription service that other applications can call over HTTP. It is designed for English, Spanish, and Catalan dictation workflows, with an OpenAI-compatible sync endpoint and an async job API for product flows that need queueing.

Use these docs when you need to install the service, connect an application, operate it on a server, or work with the Python package internals.
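As a quick orientation, the sync endpoint mirrors the OpenAI transcription API shape. The sketch below builds the URL and auth headers for such a call; the base URL, token value, and model name are placeholders, and the multipart request shape is assumed to match the OpenAI API since the endpoint is described as OpenAI-compatible.

```python
# Minimal client sketch for the OpenAI-compatible sync endpoint.
# Host, port, token, and model name below are illustrative assumptions;
# substitute the values from your own deployment.

def build_transcription_request(base_url: str, token: str):
    """Return the URL and headers for a sync transcription call."""
    url = f"{base_url}/v1/audio/transcriptions"
    headers = {"Authorization": f"Bearer {token}"}  # scoped bearer token
    return url, headers

url, headers = build_transcription_request("http://localhost:8000", "example-token")

# The audio itself goes in a multipart/form-data body. With the `requests`
# package installed, the actual call would look roughly like:
#   requests.post(url, headers=headers,
#                 files={"file": open("clip.wav", "rb")},
#                 data={"model": "small"})
```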

Start with the shortest path: install the service, create a token, and send a first transcription request before exploring the rest of the docs.

What AudioText includes

  • FastAPI service with /v1/audio/transcriptions and async job endpoints.
  • Scoped bearer API tokens with hashed storage.
  • Admin login, CSRF protection, and CIDR controls for management routes.
  • SQLite persistence for tokens, jobs, settings, model registry entries, audit events, and usage counters.
  • CLI commands for setup, tokens, models, settings, workers, health checks, and benchmarks.
  • Model runtime caching with unload-all and idle TTL controls.
  • Docker, systemd, and launchd deployment templates.
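For product flows that use the async job API, a client typically submits a job and then polls until it reaches a terminal state. The helper below is a minimal polling sketch; the status field name and terminal values ("done", "failed") are assumptions for illustration, and `fetch_status` stands in for whatever HTTP GET your client makes against the job endpoints.

```python
# Sketch of an async-job polling loop. The "status" field and the
# terminal values "done"/"failed" are assumptions; check the service's
# OpenAPI schema for the real response shape.
import time

def wait_for_job(fetch_status, job_id: str,
                 poll_seconds: float = 1.0, timeout: float = 60.0):
    """Poll fetch_status(job_id) until the job reports a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] in ("done", "failed"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

In a real client, `fetch_status` would issue an authenticated GET against the job endpoint using the same bearer token as the sync call.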

What is intentionally out of scope

AudioText does not ship first-party client SDK packages. Client applications should call the HTTP API directly, following the OpenAI-compatible examples in these docs. Streaming transcription, speaker diarization, GPU deployment profiles, and paid-provider passthrough are planned extensions, not V1 defaults.