AI voice assistant

Configure assistants, voices, prompts, and tools for inbound and outbound voice calls.

The voice assistant is the same ai_assistant record used by chat — with a few extra fields that the voice gateway reads on session.update. Authoring once gives you a text assistant in the widget and a voice assistant on the phone.

Anatomy of an assistant

ai_assistant records live in the AI assistant collection. The voice gateway uses these fields to compose the runtime prompt:

| Field | Purpose |
| --- | --- |
| `title` | Interpolated into the prompt as "You are <title>." |
| `description` | One sentence about the assistant's role |
| `personality` | Free-form tone and style guidance |
| `capabilities[]` | Playbook IDs — looked up in `CAPABILITY_INSTRUCTIONS` and inlined |
| `tools[]` | Function names from `CRMToolRegistry` enabled for this assistant |
| `behaviorRules[]` | Numbered rules the model must follow |
| `interactionRules[]` | `{ when, then }` pairs for situational behavior |
| `knowledgeSources[]` | `{ sourceType, reference }` entries for the `query_knowledge` tool |
| `safety.restrictedTopics[]` | Hard "do not discuss" list |
| `safety.escalationContact` | Where to escalate if things go wrong |
| `onboardingPrompt` | First-turn greeting when no IVR context is present |
| `voice` | OpenAI voice id (`alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`) |
| `config.recordCalls` | Default recording setting when no IVR override exists |
| `status` | Must be `active` for the gateway to load the assistant |
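
A minimal sketch of how these fields might be stitched into the runtime prompt. The field names come from the table above; the `composePrompt` helper, the exact ordering, and the interface shape are assumptions, not the actual gateway implementation:

```typescript
// Hypothetical sketch: compose a runtime system prompt from an
// ai_assistant record. Field names follow the table above; the
// helper name and ordering are assumptions.
interface AiAssistant {
  title: string;
  description?: string;
  personality?: string;
  behaviorRules?: string[];
  safety?: { restrictedTopics?: string[]; escalationContact?: string };
  status: 'active' | 'inactive';
}

function composePrompt(a: AiAssistant): string {
  const parts = [`You are ${a.title}.`];
  if (a.description) parts.push(a.description);
  if (a.personality) parts.push(a.personality);
  if (a.behaviorRules?.length) {
    parts.push('Rules:');
    a.behaviorRules.forEach((r, i) => parts.push(`${i + 1}. ${r}`));
  }
  if (a.safety?.restrictedTopics?.length) {
    parts.push(`Never discuss: ${a.safety.restrictedTopics.join(', ')}.`);
  }
  return parts.join('\n');
}
```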

If the assistant is missing or inactive, the gateway falls back to a built-in default prompt and the voice from `DEFAULT_AI_VOICE`.

Voice models

The voice gateway connects to wss://api.openai.com/v1/realtime?model=gpt-realtime and configures the session with audio/pcmu codec on input and output (matches Twilio's μ-law 8 kHz format).

```typescript
{
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'gpt-realtime',
    output_modalities: ['audio'],
    audio: {
      input:  { format: { type: 'audio/pcmu' }, turn_detection: { type: 'server_vad' } },
      output: { format: { type: 'audio/pcmu' }, voice: VOICE },
    },
    instructions: SYSTEM_MESSAGE,
    tools: [...],
  }
}
```

server_vad (server-side voice activity detection) means the model infers turn boundaries from the audio itself. When the caller starts speaking, OpenAI emits an input_audio_buffer.speech_started event; the gateway reacts by sending a clear frame to Twilio so the assistant stops talking immediately.
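
The barge-in handling described above can be sketched roughly as follows. The `Sender` wrapper and function name are assumptions standing in for the real WebSocket plumbing; the `{ event: 'clear', streamSid }` message is Twilio's Media Streams frame for flushing buffered outbound audio:

```typescript
// Hypothetical sketch of barge-in handling: when OpenAI reports that
// the caller started speaking, tell Twilio to drop any buffered
// assistant audio so playback stops immediately.
type Sender = (msg: object) => void;

function handleRealtimeEvent(
  event: { type: string },
  sendToTwilio: Sender,
  streamSid: string,
): boolean {
  if (event.type === 'input_audio_buffer.speech_started') {
    // Twilio's "clear" message flushes queued outbound audio frames.
    sendToTwilio({ event: 'clear', streamSid });
    return true; // barge-in handled
  }
  return false;
}
```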

Wiring an assistant to a phone number

Two steps:

  1. Author the assistant in the AI Assistant module

     CRM > AI Assistants > New. Set voice, tools, capabilities, and safety. Mark `status: 'active'`.

  2. Point a phone number's IVR routing at it

     In Phone > Routing for the number, set `routingType: 'ai-assistant'` and `aiAssistantConfig.assistantId` to the new assistant. AppEngine generates the TwiML that opens `<Stream url="wss://.../simple-voice/stream/{orgId}/{assistantId}">` when calls arrive.
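
As a rough sketch, the TwiML AppEngine generates for step 2 might look like this. The `buildStreamTwiml` name and `host` parameter are assumptions; the `<Connect><Stream>` shape follows the URL pattern shown above:

```typescript
// Hypothetical sketch of the TwiML generated for routingType
// 'ai-assistant': open a bidirectional media stream to the voice
// gateway for this org and assistant.
function buildStreamTwiml(host: string, orgId: string, assistantId: string): string {
  const url = `wss://${host}/simple-voice/stream/${orgId}/${assistantId}`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${url}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```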

The complete flow lives in phone-numbers.mdx and ivr-flow-builder.mdx.

Outbound calls

Outbound calls hit the same gateway. The make-call tool (in the CRM tool registry) uses TwiML at /connect/webhook/twilio/ivr-ai-outbound to dial a number and connect it to the assistant. Tool implementation: crm/ai-assistant/tools/implementations/make-call.tool.ts.

```typescript
make_call({
  to: '+15551234567',
  assistantId: '<id>',
  context: { reason: 'follow-up on quote', leadId: '...' },
})
```

The context is stashed in cache under the same ivr-context:{orgId}:{assistantId}:{caller} key the inbound flow uses, so the outbound assistant gets full routing context too.
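
A minimal sketch of that stashing step, with an in-memory `Map` standing in for the real cache. The key format comes from the text above; the function names are assumptions:

```typescript
// Hypothetical sketch: outbound call context stored under the same
// cache key the inbound IVR flow reads, so both paths resolve
// routing context identically.
const cache = new Map<string, object>();

function ivrContextKey(orgId: string, assistantId: string, caller: string): string {
  return `ivr-context:${orgId}:${assistantId}:${caller}`;
}

function stashOutboundContext(
  orgId: string,
  assistantId: string,
  to: string,
  context: object,
): void {
  // For outbound calls the dialed number plays the "caller" role in the key.
  cache.set(ivrContextKey(orgId, assistantId, to), context);
}
```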

Tools available mid-call

The voice gateway reuses CRMToolRegistry.getEnabledTools(assistant.tools). Every tool exposed to the chat assistant works during a call. Common picks:

  • query_knowledge — search org knowledge base
  • lookup_contact / create_lead
  • create_task, create_note
  • transfer_call — built-in, only enabled when IVR context allows transfer

Tool execution is wired through tool.execute(args, { orgId, userId: 'voice-call', assistantId }). The gateway wraps results in a function_call_output and triggers a new response.create so the assistant continues the conversation with the result in hand.
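
The tool round-trip described above can be sketched like this. The message shapes (`conversation.item.create` with a `function_call_output` item, then `response.create`) follow the OpenAI Realtime API; the `Tool` type and `send` wrapper are assumptions:

```typescript
// Hypothetical sketch of the mid-call tool loop: execute the tool,
// hand the result back to the model as a function_call_output item,
// then request a new response so the assistant keeps talking.
type Tool = { execute: (args: object, ctx: object) => Promise<object> };

async function runToolCall(
  tool: Tool,
  args: object,
  callId: string,
  orgId: string,
  assistantId: string,
  send: (msg: object) => void,
): Promise<void> {
  const result = await tool.execute(args, { orgId, userId: 'voice-call', assistantId });
  send({
    type: 'conversation.item.create',
    item: { type: 'function_call_output', call_id: callId, output: JSON.stringify(result) },
  });
  // Trigger a new model turn with the result in hand.
  send({ type: 'response.create' });
}
```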

Onboarding prompt vs IVR greeting

There are two greeting sources and they don't combine:

  • onboardingPrompt on the assistant — used when the call comes in directly with no IVR context (e.g., outbound calls without customParameters).
  • ivrContext.greeting — set by the IVR routing when the assistant is reached through an IVR flow. This overrides onboardingPrompt.

The voice gateway appends the chosen greeting to the system prompt with: "Start the conversation by saying: '...'. Then wait for the caller to respond."
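
The selection logic can be sketched as a one-liner: the IVR greeting wins when present, otherwise `onboardingPrompt`, and the chosen text is wrapped in the wording quoted above. The function name is an assumption:

```typescript
// Hypothetical sketch of greeting selection: ivrContext.greeting
// overrides onboardingPrompt; the two never combine.
function greetingClause(
  onboardingPrompt: string | undefined,
  ivrGreeting: string | undefined,
): string {
  const greeting = ivrGreeting ?? onboardingPrompt;
  if (!greeting) return '';
  return `Start the conversation by saying: '${greeting}'. Then wait for the caller to respond.`;
}
```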

The system prompt for a voice call is much longer than for chat — capabilities, behavior rules, knowledge sources, the full IVR routing JSON. Watch your token budget on long calls; tools should fetch on demand instead of pre-loading every record into the prompt.

Per-language behavior

If the IVR routing has menuSettings.language, the gateway appends an instruction along these lines, with the configured language substituted for <language>:

"You must speak in <language> at all times unless the caller explicitly requests a different language. If you detect the caller is struggling with <language>, ask if they would prefer to continue in another language."

Combined with the right voice (e.g., Polly.Lupe-Neural for Spanish on Twilio TTS, or a multilingual OpenAI voice), this is enough to run language-specific assistants.
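
A minimal sketch of that clause builder, assuming the wording quoted above; the function name is hypothetical:

```typescript
// Hypothetical sketch: build the per-language instruction appended to
// the system prompt when menuSettings.language is configured.
function languageClause(language?: string): string {
  if (!language) return '';
  return (
    `You must speak in ${language} at all times unless the caller explicitly ` +
    `requests a different language. If you detect the caller is struggling ` +
    `with ${language}, ask if they would prefer to continue in another language.`
  );
}
```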