OpenClaw Reference (Mirrored)

Azure Speech

Mirrored from OpenClaw (MIT)
This mirror is provided for convenience. OpenClawdBots is not affiliated with or endorsed by OpenClaw.

Azure Speech is an Azure AI Speech text-to-speech provider. In OpenClaw it synthesizes outbound reply audio as MP3 by default, native Ogg/Opus for voice notes, and 8 kHz mulaw audio for telephony channels such as Voice Call.

OpenClaw uses the Azure Speech REST API directly with SSML and sends the provider-owned output format through X-Microsoft-OutputFormat.
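As a rough illustration of the SSML body such a request carries, the sketch below builds a single-voice SSML document and escapes the reply text. The helper names `escapeXml` and `buildSsml` are hypothetical and are not taken from OpenClaw's source; only the voice and language values come from this page.

```typescript
// Hypothetical helpers for illustration; not OpenClaw's actual implementation.

/** Escape the five XML special characters so reply text cannot break the SSML document. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Build a minimal SSML document that speaks `text` with one Azure voice. */
function buildSsml(text: string, voice: string, lang: string): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

// Example: the ampersand in the reply text is escaped before it reaches Azure.
const exampleSsml = buildSsml("Tom & Jerry", "en-US-JennyNeural", "en-US");
```

Escaping matters because reply text is user- or model-generated and may contain characters that would otherwise produce malformed XML.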

| Detail | Value |
| --- | --- |
| Website | Azure AI Speech |
| Docs | Speech REST text-to-speech |
| Auth | AZURE_SPEECH_KEY plus AZURE_SPEECH_REGION |
| Default voice | en-US-JennyNeural |
| Default file output | audio-24khz-48kbitrate-mono-mp3 |
| Default voice-note file | ogg-24khz-16bit-mono-opus |

Getting started

  1. Create an Azure Speech resource

    In the Azure portal, create a Speech resource. Copy KEY 1 from Resource Management > Keys and Endpoint, and note the resource location (for example, eastus). Set both as environment variables:

    AZURE_SPEECH_KEY=<speech-resource-key>
    AZURE_SPEECH_REGION=eastus
    
  2. Select Azure Speech in messages.tts

    {
      messages: {
        tts: {
          auto: "always",
          provider: "azure-speech",
          providers: {
            "azure-speech": {
              voice: "en-US-JennyNeural",
              lang: "en-US",
            },
          },
        },
      },
    }
    
  3. Send a message

    Send a reply through any connected channel. OpenClaw synthesizes the audio with Azure Speech and delivers MP3 for standard audio, or Ogg/Opus when the channel expects a voice note.

Configuration options

| Option | Path | Description |
| --- | --- | --- |
| apiKey | messages.tts.providers.azure-speech.apiKey | Azure Speech resource key. Falls back to AZURE_SPEECH_KEY, AZURE_SPEECH_API_KEY, or SPEECH_KEY. |
| region | messages.tts.providers.azure-speech.region | Azure Speech resource region. Falls back to AZURE_SPEECH_REGION or SPEECH_REGION. |
| endpoint | messages.tts.providers.azure-speech.endpoint | Optional Azure Speech endpoint/base URL override. |
| baseUrl | messages.tts.providers.azure-speech.baseUrl | Optional Azure Speech base URL override. |
| voice | messages.tts.providers.azure-speech.voice | Azure voice ShortName (default en-US-JennyNeural). |
| lang | messages.tts.providers.azure-speech.lang | SSML language code (default en-US). |
| outputFormat | messages.tts.providers.azure-speech.outputFormat | Audio-file output format (default audio-24khz-48kbitrate-mono-mp3). |
| voiceNoteOutputFormat | messages.tts.providers.azure-speech.voiceNoteOutputFormat | Voice-note output format (default ogg-24khz-16bit-mono-opus). |
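
The fallback behavior for apiKey and region can be pictured as a first-match lookup. The sketch below is an assumption about how such resolution might work, not OpenClaw's code: it assumes explicit config wins, that environment variables are tried in the order listed above, and that empty strings count as unset. The names `resolveAzureSpeechCredentials` and `firstNonEmpty` are hypothetical.

```typescript
// Hypothetical sketch of config-then-environment fallback; not OpenClaw's implementation.
type EnvVars = Record<string, string | undefined>;

interface AzureSpeechCredentialConfig {
  apiKey?: string;
  region?: string;
}

/** Return the first candidate that is defined and not blank. */
function firstNonEmpty(candidates: Array<string | undefined>): string | undefined {
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate.trim() !== "") return candidate;
  }
  return undefined;
}

/** Resolve key and region. Assumption: config first, then env vars in documented order. */
function resolveAzureSpeechCredentials(
  config: AzureSpeechCredentialConfig,
  env: EnvVars,
): { apiKey?: string; region?: string } {
  return {
    apiKey: firstNonEmpty([
      config.apiKey,
      env.AZURE_SPEECH_KEY,
      env.AZURE_SPEECH_API_KEY,
      env.SPEECH_KEY,
    ]),
    region: firstNonEmpty([config.region, env.AZURE_SPEECH_REGION, env.SPEECH_REGION]),
  };
}

// Example: no explicit config, so values come from the environment map.
const exampleCredentials = resolveAzureSpeechCredentials(
  {},
  { AZURE_SPEECH_KEY: "<speech-resource-key>", AZURE_SPEECH_REGION: "eastus" },
);
```

In a Node.js process the second argument would typically be `process.env`; it is passed in explicitly here so the function stays easy to test.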

Notes

Authentication

Azure Speech uses a Speech resource key, not an Azure OpenAI key. The key is sent as Ocp-Apim-Subscription-Key; OpenClaw derives https://<region>.tts.speech.microsoft.com from region unless you provide endpoint or baseUrl.
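
To make the request shape concrete, here is a minimal sketch that derives the base URL and assembles the headers named above. The `/cognitiveservices/v1` path and the `application/ssml+xml` content type come from Azure's REST text-to-speech documentation. Several things here are assumptions for illustration: the function names, the choice to let `endpoint` take precedence over `baseUrl`, and the treatment of an override as a bare base URL with no path.

```typescript
// Hypothetical request builder; not OpenClaw's implementation.
interface AzureSpeechEndpointConfig {
  region?: string;
  endpoint?: string;
  baseUrl?: string;
}

interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

/** Derive the base URL from region unless an override is configured. */
function resolveSpeechBaseUrl(config: AzureSpeechEndpointConfig): string {
  // Assumption: endpoint wins over baseUrl when both are set.
  const override = config.endpoint ?? config.baseUrl;
  if (override) return override.replace(/\/+$/, ""); // drop trailing slashes
  if (!config.region) {
    throw new Error("Azure Speech needs a region, endpoint, or baseUrl");
  }
  return `https://${config.region}.tts.speech.microsoft.com`;
}

/** Assemble the synthesis request without sending it. */
function buildSpeechRequest(
  config: AzureSpeechEndpointConfig,
  apiKey: string,
  outputFormat: string,
  ssml: string,
): SpeechRequest {
  return {
    url: `${resolveSpeechBaseUrl(config)}/cognitiveservices/v1`,
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": apiKey, // Speech resource key, not an Azure OpenAI key
      "Content-Type": "application/ssml+xml",
      "X-Microsoft-OutputFormat": outputFormat,
    },
    body: ssml,
  };
}

const exampleRequest = buildSpeechRequest(
  { region: "eastus" },
  "<speech-resource-key>",
  "audio-24khz-48kbitrate-mono-mp3",
  "<speak>...</speak>",
);
```

The returned object can be handed to any HTTP client; the response body is the audio in the requested output format.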

Voice names

Use the Azure Speech voice ShortName value, for example en-US-JennyNeural. The bundled provider can list voices through the same Speech resource and filters out voices marked deprecated or retired.
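
One way such filtering could look is sketched below. It assumes the voice entries carry the `ShortName`, `Locale`, and `Status` fields that Azure's voices list endpoint returns, and it assumes that deprecated or retired voices are identified by their `Status` value; how OpenClaw actually detects them is not stated here. The function name `filterActiveVoices` is hypothetical.

```typescript
// Hypothetical filter; the Status-based check is an assumption, not OpenClaw's code.
interface AzureVoiceEntry {
  ShortName: string;
  Locale: string;
  Status?: string;
}

/** Keep only voices whose Status is not "deprecated" or "retired" (case-insensitive). */
function filterActiveVoices(voices: AzureVoiceEntry[]): AzureVoiceEntry[] {
  return voices.filter((voice) => {
    const voiceStatus = (voice.Status ?? "").toLowerCase();
    return voiceStatus !== "deprecated" && voiceStatus !== "retired";
  });
}

// Example input with made-up entries; only the first voice name appears in this page.
const exampleVoices = filterActiveVoices([
  { ShortName: "en-US-JennyNeural", Locale: "en-US", Status: "GA" },
  { ShortName: "example-OldVoice", Locale: "en-US", Status: "Deprecated" },
  { ShortName: "example-NoStatus", Locale: "en-GB" },
]);
```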

Audio outputs

Azure accepts output formats such as audio-24khz-48kbitrate-mono-mp3, ogg-24khz-16bit-mono-opus, and riff-24khz-16bit-mono-pcm. OpenClaw requests Ogg/Opus for voice-note targets so channels can send native voice bubbles without an extra MP3 conversion.
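
The choice between formats can be sketched as a small lookup keyed on the delivery target. The target names and the `pickOutputFormat` function are hypothetical. The telephony value `raw-8khz-8bit-mono-mulaw` is an Azure output format that matches the 8 kHz mulaw description, but this page does not confirm which format string OpenClaw sends for telephony, so treat it as an assumption.

```typescript
// Hypothetical format selection; not OpenClaw's implementation.
type TtsTarget = "audio-file" | "voice-note" | "telephony";

interface OutputFormatConfig {
  outputFormat?: string;
  voiceNoteOutputFormat?: string;
}

const DEFAULT_FILE_FORMAT = "audio-24khz-48kbitrate-mono-mp3";
const DEFAULT_VOICE_NOTE_FORMAT = "ogg-24khz-16bit-mono-opus";
// Assumption: this page only says "8 kHz mulaw" and does not name the format string.
const ASSUMED_TELEPHONY_FORMAT = "raw-8khz-8bit-mono-mulaw";

/** Pick the X-Microsoft-OutputFormat value for a delivery target. */
function pickOutputFormat(target: TtsTarget, config: OutputFormatConfig = {}): string {
  switch (target) {
    case "voice-note":
      // Native Ogg/Opus avoids an extra MP3 conversion for voice bubbles.
      return config.voiceNoteOutputFormat ?? DEFAULT_VOICE_NOTE_FORMAT;
    case "telephony":
      return ASSUMED_TELEPHONY_FORMAT;
    default:
      return config.outputFormat ?? DEFAULT_FILE_FORMAT;
  }
}

const exampleVoiceNoteFormat = pickOutputFormat("voice-note");
const exampleFileFormat = pickOutputFormat("audio-file", {
  outputFormat: "riff-24khz-16bit-mono-pcm",
});
```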

Alias

azure is accepted as a provider alias for existing PRs and user config, but new config should use azure-speech to avoid confusion with Azure OpenAI model providers.
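
Alias handling of this kind usually amounts to normalizing the provider id before lookup. The sketch below shows the idea; the function name `normalizeTtsProviderId` and the trimming and lowercasing steps are assumptions, not a description of OpenClaw's code.

```typescript
// Hypothetical alias normalization; not OpenClaw's implementation.

/** Map the legacy "azure" alias to the canonical "azure-speech" provider id. */
function normalizeTtsProviderId(providerId: string): string {
  const key = providerId.trim().toLowerCase();
  return key === "azure" ? "azure-speech" : key;
}

const exampleProviderId = normalizeTtsProviderId("azure");
```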