Deepgram (Audio Transcription)
Deepgram is a speech-to-text API. In OpenClaw it is used for inbound audio/voice note
transcription via tools.media.audio.
When enabled, OpenClaw uploads the audio file to Deepgram and injects the transcript
into the reply pipeline ({{Transcript}} + [Audio] block). This is not streaming;
it uses the pre-recorded transcription endpoint.
| Detail | Value |
|---|---|
| Website | deepgram.com |
| Docs | developers.deepgram.com |
| Auth | DEEPGRAM_API_KEY |
| Default model | nova-3 |
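
For orientation, the pre-recorded upload described above can be sketched in Python. This is a minimal illustration of a request to Deepgram's `/v1/listen` endpoint, not OpenClaw's actual implementation: the helper name `build_listen_request` and the `audio/ogg` content type are assumptions made for this example.

```python
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(audio: bytes, api_key: str, model: str = "nova-3",
                         content_type: str = "audio/ogg") -> urllib.request.Request:
    """Build (but do not send) a pre-recorded transcription request.

    Hypothetical helper for illustration; OpenClaw's internals may differ.
    """
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        url=f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio,  # the complete audio file travels in a single request body
        headers={
            "Authorization": f"Token {api_key}",  # Deepgram's token auth scheme
            "Content-Type": content_type,  # assumed; should match the real file
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; that step needs a real key and audio file, so it is left out here.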
Getting started
- Set your API key
  Add your Deepgram API key to the environment:

      DEEPGRAM_API_KEY=dg_...

- Enable the audio provider

      {
        tools: {
          media: {
            audio: {
              enabled: true,
              models: [{ provider: "deepgram", model: "nova-3" }],
            },
          },
        },
      }

- Send a voice note
  Send an audio message through any connected channel. OpenClaw transcribes it via Deepgram and injects the transcript into the reply pipeline.
Configuration options
| Option | Path | Description |
|---|---|---|
| model | tools.media.audio.models[].model | Deepgram model id (default: nova-3) |
| language | tools.media.audio.models[].language | Language hint (optional) |
| detect_language | tools.media.audio.providerOptions.deepgram.detect_language | Enable language detection (optional) |
| punctuate | tools.media.audio.providerOptions.deepgram.punctuate | Enable punctuation (optional) |
| smart_format | tools.media.audio.providerOptions.deepgram.smart_format | Enable smart formatting (optional) |

Example with a language hint:

    {
      tools: {
        media: {
          audio: {
            enabled: true,
            models: [{ provider: "deepgram", model: "nova-3", language: "en" }],
          },
        },
      },
    }

Example with Deepgram provider options:

    {
      tools: {
        media: {
          audio: {
            enabled: true,
            providerOptions: {
              deepgram: {
                detect_language: true,
                punctuate: true,
                smart_format: true,
              },
            },
            models: [{ provider: "deepgram", model: "nova-3" }],
          },
        },
      },
    }

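
The option names above match query parameters on Deepgram's transcription endpoint, so one plausible way to picture their effect is as a query string. The sketch below is illustrative only: `deepgram_query` is a hypothetical helper, and the exact mapping OpenClaw applies is an assumption.

```python
from urllib.parse import urlencode


def deepgram_query(model_entry: dict, provider_options: dict) -> str:
    """Combine a models[] entry and providerOptions.deepgram into a query string.

    Illustrative only: the mapping OpenClaw actually applies is an assumption.
    """
    params = {"model": model_entry.get("model", "nova-3")}
    if "language" in model_entry:
        params["language"] = model_entry["language"]
    for name in ("detect_language", "punctuate", "smart_format"):
        if name in provider_options:
            # Boolean flags are written as lowercase true/false in the query.
            params[name] = "true" if provider_options[name] else "false"
    return urlencode(params)


print(deepgram_query(
    {"provider": "deepgram", "model": "nova-3", "language": "en"},
    {"punctuate": True, "smart_format": True},
))
# prints: model=nova-3&language=en&punctuate=true&smart_format=true
```

Options that are not set are simply omitted, which leaves Deepgram's own defaults in effect.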
Notes
Authentication
Authentication follows the standard provider auth order. Setting DEEPGRAM_API_KEY
in the environment is the simplest path.
Proxy and custom endpoints
Override endpoints or headers with tools.media.audio.baseUrl and
tools.media.audio.headers when using a proxy.
Output behavior
Output follows the same audio rules as other providers (size caps, timeouts, transcript injection).
Deepgram transcription is pre-recorded only (not real-time streaming). OpenClaw uploads the complete audio file and waits for the full transcript before injecting it into the conversation.
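
Because the whole transcript arrives in a single response, extracting it is a matter of reading one field. The sketch below assumes the usual shape of a Deepgram pre-recorded response (`results.channels[].alternatives[].transcript`). The sample payload is hand-written for illustration and trimmed to the relevant fields, and `extract_transcript` is a hypothetical helper, not OpenClaw code.

```python
import json

# Hand-written sample, trimmed to the fields used below (not real API output).
SAMPLE_RESPONSE = json.loads("""
{
  "results": {
    "channels": [
      {"alternatives": [{"transcript": "hello from a voice note", "confidence": 0.98}]}
    ]
  }
}
""")


def extract_transcript(response: dict) -> str:
    """Return the top alternative of the first channel, or "" if absent."""
    channels = response.get("results", {}).get("channels", [])
    if not channels or not channels[0].get("alternatives"):
        return ""
    return channels[0]["alternatives"][0].get("transcript", "")


print(extract_transcript(SAMPLE_RESPONSE))  # prints: hello from a voice note
```

Returning an empty string for a missing transcript keeps the caller's handling simple; a real integration might prefer to raise an error instead.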