OpenClaw Reference (Mirrored)

Deepgram

Mirrored from OpenClaw (MIT)
This mirror is provided for convenience. OpenClawdBots is not affiliated with or endorsed by OpenClaw.

Deepgram is a speech-to-text API. In OpenClaw it is used for inbound audio/voice-note transcription through tools.media.audio and for Voice Call streaming STT through plugins.entries.voice-call.config.streaming.

For batch transcription, OpenClaw uploads the complete audio file to Deepgram and injects the transcript into the reply pipeline ({{Transcript}} + [Audio] block). For Voice Call streaming, OpenClaw forwards live G.711 u-law frames over Deepgram's WebSocket listen endpoint and emits partial or final transcripts as Deepgram returns them.

Detail         Value
Website        deepgram.com
Docs           developers.deepgram.com
Auth           DEEPGRAM_API_KEY
Default model  nova-3

Getting started

  1. Set your API key

    Add your Deepgram API key to the environment:

    DEEPGRAM_API_KEY=dg_...
    
  2. Enable the audio provider

    Add Deepgram as an audio model in your OpenClaw config:

    {
      tools: {
        media: {
          audio: {
            enabled: true,
            models: [{ provider: "deepgram", model: "nova-3" }],
          },
        },
      },
    }
    
  3. Send a voice note

    Send an audio message through any connected channel. OpenClaw transcribes it via Deepgram and injects the transcript into the reply pipeline.
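The transcript that gets injected comes out of Deepgram's JSON response. As a minimal sketch of that step, the helper below pulls the top transcript from a pre-recorded response, assuming the `results.channels[].alternatives[].transcript` layout that Deepgram documents. The `extractTranscript` name and the trimmed-down `DeepgramResponse` type are hypothetical and are not taken from OpenClaw's source.

```typescript
// Assumed subset of a Deepgram pre-recorded response; only the fields used here.
interface DeepgramResponse {
  results?: {
    channels?: Array<{
      alternatives?: Array<{ transcript?: string; confidence?: number }>;
    }>;
  };
}

// Hypothetical helper: returns the first channel's top alternative,
// or an empty string when the response carries no transcript.
function extractTranscript(response: DeepgramResponse): string {
  const transcript =
    response.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  return (transcript ?? "").trim();
}

// Illustrative payload, not real Deepgram output.
const sample: DeepgramResponse = {
  results: {
    channels: [
      { alternatives: [{ transcript: " hello from a voice note ", confidence: 0.98 }] },
    ],
  },
};

console.log(extractTranscript(sample));
```

Returning an empty string for a missing transcript keeps the caller's handling of silent or unintelligible audio in one place; how OpenClaw itself treats that case is not stated here.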

Configuration options

Option           Path                                                        Description
model            tools.media.audio.models[].model                            Deepgram model id (default: nova-3)
language         tools.media.audio.models[].language                         Language hint (optional)
detect_language  tools.media.audio.providerOptions.deepgram.detect_language  Enable language detection (optional)
punctuate        tools.media.audio.providerOptions.deepgram.punctuate        Enable punctuation (optional)
smart_format     tools.media.audio.providerOptions.deepgram.smart_format     Enable smart formatting (optional)

With language hint

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3", language: "en" }],
      },
    },
  },
}

With Deepgram options

{
  tools: {
    media: {
      audio: {
        enabled: true,
        providerOptions: {
          deepgram: {
            detect_language: true,
            punctuate: true,
            smart_format: true,
          },
        },
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
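
The option names above match query parameters on Deepgram's pre-recorded `/v1/listen` endpoint. The sketch below shows one plausible way such a config could be turned into a request URL. It is an illustration under assumptions, not OpenClaw's implementation: `buildBatchListenUrl` and `DeepgramBatchOptions` are hypothetical names, and the default base URL is Deepgram's public API host rather than a value read from the OpenClaw source.

```typescript
// Subset of the options described above; field names mirror the config keys.
interface DeepgramBatchOptions {
  model?: string;
  language?: string;
  detect_language?: boolean;
  punctuate?: boolean;
  smart_format?: boolean;
}

// Hypothetical helper: builds the URL for a pre-recorded transcription request.
// The audio bytes would be sent as the POST body with an
// "Authorization: Token <DEEPGRAM_API_KEY>" header.
function buildBatchListenUrl(
  options: DeepgramBatchOptions,
  baseUrl: string = "https://api.deepgram.com/v1", // assumption: public Deepgram host
): string {
  const params: Array<[string, string]> = [["model", options.model ?? "nova-3"]];
  if (options.language !== undefined) {
    params.push(["language", options.language]);
  }
  // Only forward flags that were explicitly set, so Deepgram's own defaults apply otherwise.
  for (const key of ["detect_language", "punctuate", "smart_format"] as const) {
    const value = options[key];
    if (value !== undefined) {
      params.push([key, String(value)]);
    }
  }
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  // Strip trailing slashes so a proxy base URL with or without one behaves the same.
  return `${baseUrl.replace(/\/+$/, "")}/listen?${query}`;
}

console.log(
  buildBatchListenUrl({ detect_language: true, punctuate: true, smart_format: true }),
);
```

Passing a second argument, for example a proxy address, swaps the host while keeping the same query string.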

Voice Call streaming STT

The bundled deepgram plugin also registers a realtime transcription provider for the Voice Call plugin.

Setting          Config path                                                            Default
API key          plugins.entries.voice-call.config.streaming.providers.deepgram.apiKey  Falls back to DEEPGRAM_API_KEY
Model            ...deepgram.model                                                      nova-3
Language         ...deepgram.language                                                   (unset)
Encoding         ...deepgram.encoding                                                   mulaw
Sample rate      ...deepgram.sampleRate                                                 8000
Endpointing      ...deepgram.endpointingMs                                              800
Interim results  ...deepgram.interimResults                                             true

{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "deepgram",
            providers: {
              deepgram: {
                apiKey: "${DEEPGRAM_API_KEY}",
                model: "nova-3",
                endpointingMs: 800,
                language: "en-US",
              },
            },
          },
        },
      },
    },
  },
}

Note: Voice Call receives telephony audio as 8 kHz G.711 u-law. The Deepgram streaming provider defaults to encoding: "mulaw" and sampleRate: 8000, so Twilio media frames can be forwarded to Deepgram without transcoding.
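
To make the defaults concrete, the sketch below maps the streaming settings onto query parameters for Deepgram's WebSocket listen endpoint (`model`, `encoding`, `sample_rate`, `endpointing`, `interim_results`, `language`). The parameter names are Deepgram's. The mapping from OpenClaw's camelCase settings, the `buildStreamingListenUrl` name, and the `DeepgramStreamingSettings` type are assumptions made for illustration.

```typescript
// Mirrors the streaming settings described above; all fields are optional.
interface DeepgramStreamingSettings {
  model?: string;
  language?: string;
  encoding?: string;
  sampleRate?: number;
  endpointingMs?: number;
  interimResults?: boolean;
}

// Hypothetical helper: builds a WebSocket URL using the documented defaults
// (nova-3, mulaw, 8000 Hz, 800 ms endpointing, interim results on).
function buildStreamingListenUrl(settings: DeepgramStreamingSettings = {}): string {
  const params: Array<[string, string]> = [
    ["model", settings.model ?? "nova-3"],
    ["encoding", settings.encoding ?? "mulaw"],
    ["sample_rate", String(settings.sampleRate ?? 8000)],
    ["endpointing", String(settings.endpointingMs ?? 800)],
    ["interim_results", String(settings.interimResults ?? true)],
  ];
  // Language is unset by default, so it is only sent when configured.
  if (settings.language !== undefined) {
    params.push(["language", settings.language]);
  }
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}

console.log(buildStreamingListenUrl({ language: "en-US" }));
```

Because the encoding and sample rate already describe 8 kHz u-law, each decoded telephony frame could be written to the socket as a binary message with no resampling step in between.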

Notes

Authentication

Authentication follows the standard provider auth order. Setting DEEPGRAM_API_KEY in the environment is the simplest option.

Proxy and custom endpoints

Override endpoints or headers with tools.media.audio.baseUrl and tools.media.audio.headers when using a proxy.

Output behavior

Output follows the same audio rules as other providers (size caps, timeouts, transcript injection).