Connect an application to AudioText

Applications should call AudioText from their own backend or API layer. Browser code should not hold AudioText API tokens.

Provider configuration

Applications usually need these settings for an AudioText provider:

Setting           Example
Base URL          http://127.0.0.1:8791
API token         att_tok_...
Default model     cpu-lite
Default language  auto, ca, es, or en
Preferred mode    async for normal product flows, sync for short controlled clips

Store the token as a server-side secret. Treat transcripts and uploaded audio as sensitive data.
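
As a rough illustration of the settings above, the backend could load them from environment variables at startup. This is a minimal sketch, not part of AudioText itself: only AUDIOTEXT_API_TOKEN appears elsewhere in this guide, and the other variable names, the AudioTextSettings class, and load_settings are hypothetical.

```python
import os
from dataclasses import dataclass, field

ALLOWED_LANGUAGES = ("auto", "ca", "es", "en")
ALLOWED_MODES = ("async", "sync")


@dataclass(frozen=True)
class AudioTextSettings:
    base_url: str
    api_token: str = field(repr=False)  # keep the secret out of repr() and logs
    default_model: str = "cpu-lite"
    default_language: str = "auto"
    preferred_mode: str = "async"


def load_settings(env=None):
    """Build settings from environment variables.

    Every variable name except AUDIOTEXT_API_TOKEN is an assumption
    made for this example.
    """
    env = os.environ if env is None else env
    token = env.get("AUDIOTEXT_API_TOKEN", "")
    if not token:
        raise ValueError("AUDIOTEXT_API_TOKEN is not set")
    language = env.get("AUDIOTEXT_DEFAULT_LANGUAGE", "auto")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    mode = env.get("AUDIOTEXT_PREFERRED_MODE", "async")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    return AudioTextSettings(
        base_url=env.get("AUDIOTEXT_BASE_URL", "http://127.0.0.1:8791").rstrip("/"),
        api_token=token,
        default_model=env.get("AUDIOTEXT_DEFAULT_MODEL", "cpu-lite"),
        default_language=language,
        preferred_mode=mode,
    )
```

Marking the token field with repr=False means an accidental print or log of the settings object does not expose the secret.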

Sync request

Use sync requests only for short clips, where the application can afford to block until the response arrives:

POST /v1/audio/transcriptions
Authorization: Bearer <token>
Content-Type: multipart/form-data

Recommended fields:

model=cpu-lite
language=ca
file=@clip.webm
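
The sync request above could be assembled from a Python backend roughly as follows. This sketch uses only the standard library and stops at building the request. The helper names (encode_multipart, build_sync_request) are hypothetical, and the audio/webm part type is an assumption based on the clip.webm example.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_name, file_bytes, file_content_type):
    """Encode text fields plus one file part named "file" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{value}\r\n'
            ).encode("utf-8")
        )
    parts.append(
        (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            f'Content-Type: {file_content_type}\r\n\r\n'
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_sync_request(base_url, token, audio_bytes, file_name="clip.webm",
                       model="cpu-lite", language="ca"):
    """Build (but do not send) a POST to /v1/audio/transcriptions."""
    body, content_type = encode_multipart(
        {"model": model, "language": language},
        file_name,
        audio_bytes,
        "audio/webm",  # assumed MIME type for a .webm clip
    )
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request, timeout=...). The response body is not parsed here because this guide does not describe its shape.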

Async request

For normal product flows, create a job:

POST /v1/transcription-jobs
Authorization: Bearer <token>
Content-Type: multipart/form-data

Then poll:

GET /v1/transcription-jobs/{job_id}
Authorization: Bearer <token>

Use the result endpoint when the application only needs the transcript payload:

GET /v1/transcription-jobs/{job_id}/result
Authorization: Bearer <token>
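
A polling loop for the job endpoint might look like the sketch below. It rests on assumptions this guide does not confirm: that the job endpoint returns JSON with a "status" field, and that "completed" and "failed" are the terminal values. The status values are parameters so they can be adjusted to match the real API. The names get_json and wait_for_job are hypothetical.

```python
import json
import time
import urllib.request


def get_json(base_url, token, path, timeout=10.0):
    """GET a path with the bearer token and decode the body as JSON (assumed format)."""
    request = urllib.request.Request(
        f"{base_url}{path}", headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(fetch_job, job_id, interval=1.0, max_wait=120.0,
                 done_states=("completed",), failed_states=("failed",),
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job finishes, fails, or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        job = fetch_job(job_id)
        status = job.get("status")  # assumed field name
        if status in done_states:
            return job
        if status in failed_states:
            raise RuntimeError(f"transcription job {job_id} ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"transcription job {job_id} did not finish in {max_wait}s")
        sleep(interval)


# Possible usage against a running service:
#   fetch = lambda job_id: get_json(base_url, token, f"/v1/transcription-jobs/{job_id}")
#   wait_for_job(fetch, job_id)
#   result = get_json(base_url, token, f"/v1/transcription-jobs/{job_id}/result")
```

Passing the fetch function, sleep, and clock as arguments keeps the loop testable without a live service or real delays.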

Provider smoke test

Run this against a local AudioText service after creating a token:

AUDIOTEXT_API_TOKEN=att_tok_... \
  uv run python scripts/provider_contract_smoke.py \
  --base-url http://127.0.0.1:8791 \
  --model cpu-lite \
  --language ca \
  --list-only

Remove --list-only only once the real Faster-Whisper backend is installed and you are ready for the selected model to load or download.

Application implementation checklist

  • The browser records audio and uploads it to the application backend, not directly to AudioText.
  • The application backend forwards the audio with a server-side provider token.
  • Secrets in provider settings are write-only: they can be set or replaced, but never read back.
  • The dictation UI can force ca, es, or en when the page already knows the language.
  • Async jobs are preferred when clips can arrive concurrently.
  • Transcript text is logged only in explicit debugging flows, never by default.
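
Two of the checklist items, forcing a known language and keeping transcript text out of default logs, can be sketched as small backend helpers. The function names and the shape of the log fields are hypothetical and only illustrate the idea.

```python
FORCED_LANGUAGES = ("ca", "es", "en")


def resolve_language(page_language=None):
    """Force ca, es, or en when the page already knows it; otherwise fall back to auto."""
    return page_language if page_language in FORCED_LANGUAGES else "auto"


def transcript_log_fields(job_id, text, debug_transcripts=False):
    """Return log-safe fields for a finished transcription.

    The transcript text is included only when the caller explicitly
    opts in, for example inside a dedicated debugging flow.
    """
    fields = {"job_id": job_id, "transcript_chars": len(text)}
    if debug_transcripts:
        fields["transcript_text"] = text
    return fields
```

By default, only the job identifier and the transcript length reach the logs, which is usually enough to trace a request without storing what the user said.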