The voice assistant is the same `ai_assistant` record used by chat — with a few extra fields that the voice gateway reads on `session.update`. Authoring once gives you a text assistant in the widget and a voice assistant on the phone.
## Anatomy of an assistant
`ai_assistant` records live in the AI Assistant collection. The voice gateway uses these fields to compose the runtime prompt:
| Field | Purpose |
|---|---|
| `title` | `"You are <title>."` |
| `description` | One sentence about the assistant's role |
| `personality` | Free-form tone and style guidance |
| `capabilities[]` | Playbook IDs — looked up in `CAPABILITY_INSTRUCTIONS` and inlined |
| `tools[]` | Function names from `CRMToolRegistry` enabled for this assistant |
| `behaviorRules[]` | Numbered rules the model must follow |
| `interactionRules[]` | `{ when, then }` pairs for situational behavior |
| `knowledgeSources[]` | `{ sourceType, reference }` for the `query_knowledge` tool |
| `safety.restrictedTopics[]` | Hard "do not discuss" list |
| `safety.escalationContact` | Where to escalate if things go wrong |
| `onboardingPrompt` | First-turn greeting when no IVR context is present |
| `voice` | OpenAI voice id (`alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`) |
| `config.recordCalls` | Default recording setting when no IVR override |
| `status` | Must be `active` for the gateway to load it |
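As a rough illustration of how these fields could flow into a single system prompt, here is a minimal sketch. The field names follow the table above; the composition order and wording are assumptions, not the gateway's actual implementation.

```typescript
// Hypothetical sketch: compose a runtime system prompt from ai_assistant
// fields. Field names match the table; assembly details are assumed.
interface AiAssistant {
  title: string;
  description: string;
  personality?: string;
  behaviorRules?: string[];
  interactionRules?: { when: string; then: string }[];
  safety?: { restrictedTopics?: string[] };
}

function composeSystemPrompt(a: AiAssistant): string {
  const parts: string[] = [`You are ${a.title}.`, a.description];
  if (a.personality) parts.push(`Tone and style: ${a.personality}`);
  if (a.behaviorRules?.length) {
    // Numbered rules the model must follow.
    parts.push(a.behaviorRules.map((r, i) => `${i + 1}. ${r}`).join('\n'));
  }
  if (a.interactionRules?.length) {
    // { when, then } pairs become situational instructions.
    parts.push(a.interactionRules.map((r) => `When ${r.when}, ${r.then}.`).join('\n'));
  }
  if (a.safety?.restrictedTopics?.length) {
    parts.push(`Do not discuss: ${a.safety.restrictedTopics.join(', ')}.`);
  }
  return parts.join('\n\n');
}
```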
If the assistant is missing or inactive, the gateway falls back to a built-in default prompt and the voice from `DEFAULT_AI_VOICE`.
## Voice models
The voice gateway connects to `wss://api.openai.com/v1/realtime?model=gpt-realtime` and configures the session with the `audio/pcmu` codec on input and output (matching Twilio's μ-law 8 kHz format).
```ts
{
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'gpt-realtime',
    output_modalities: ['audio'],
    audio: {
      input: { format: { type: 'audio/pcmu' }, turn_detection: { type: 'server_vad' } },
      output: { format: { type: 'audio/pcmu' }, voice: VOICE },
    },
    instructions: SYSTEM_MESSAGE,
    tools: [...],
  },
}
```
`server_vad` (server-side voice activity detection) means the model infers turn boundaries from audio energy. When the caller starts speaking, the gateway receives an `input_audio_buffer.speech_started` event, which it also uses to send a `clear` frame to Twilio so the assistant stops talking immediately.
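The barge-in handling described above can be sketched as follows. The event type and Twilio `clear` frame shape are from the text and Twilio's Media Streams protocol; the handler and `sendToTwilio` callback names are hypothetical.

```typescript
// Hypothetical sketch of barge-in: when the Realtime API reports that the
// caller started speaking, send a `clear` frame so Twilio drops any
// buffered assistant audio immediately.
type TwilioFrame = { event: string; streamSid: string };

function onRealtimeEvent(
  evt: { type: string },
  streamSid: string,
  sendToTwilio: (frame: TwilioFrame) => void,
): void {
  if (evt.type === 'input_audio_buffer.speech_started') {
    // Caller is talking over the assistant: stop playback now.
    sendToTwilio({ event: 'clear', streamSid });
  }
}
```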
## Wiring an assistant to a phone number
Two steps:

1. **Author the assistant in the AI Assistant module.** CRM > AI Assistants > New. Set `voice`, `tools`, `capabilities`, and `safety`. Mark `status: 'active'`.
2. **Point a phone number's IVR routing at it.** In Phone > Routing for the number, set `routingType: 'ai-assistant'` and `aiAssistantConfig.assistantId` to the new assistant. AppEngine generates the TwiML that opens `<Stream url="wss://.../simple-voice/stream/{orgId}/{assistantId}">` when calls arrive.
The complete flow lives in phone-numbers.mdx and ivr-flow-builder.mdx.
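For orientation, the TwiML opened in step 2 could be built like this. The stream URL pattern is from the text; the helper name, host parameter, and surrounding `<Response>`/`<Connect>` structure follow Twilio's standard Media Streams TwiML and are otherwise assumptions.

```typescript
// Hypothetical sketch of the TwiML AppEngine generates to connect an
// inbound call to the voice gateway's media stream.
function buildStreamTwiml(orgId: string, assistantId: string, host: string): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="wss://${host}/simple-voice/stream/${orgId}/${assistantId}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```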
## Outbound calls
Outbound calls hit the same gateway. The `make-call` tool (in the CRM tool registry) uses TwiML at `/connect/webhook/twilio/ivr-ai-outbound` to dial a number and connect it to the assistant. Tool implementation: `crm/ai-assistant/tools/implementations/make-call.tool.ts`.
```ts
make_call({
  to: '+15551234567',
  assistantId: '<id>',
  context: { reason: 'follow-up on quote', leadId: '...' },
})
```
The context is stashed in cache under the same `ivr-context:{orgId}:{assistantId}:{caller}` key the inbound flow uses, so the outbound assistant gets full routing context too.
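A minimal sketch of the shared key, using the pattern quoted above (normalization of the `caller` value, if any, is not specified here):

```typescript
// Build the cache key shared by inbound and outbound flows.
// Pattern is from the docs above; any caller-number normalization
// the real gateway does is not modeled here.
function ivrContextKey(orgId: string, assistantId: string, caller: string): string {
  return `ivr-context:${orgId}:${assistantId}:${caller}`;
}
```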
## Tools available mid-call
The voice gateway reuses `CRMToolRegistry.getEnabledTools(assistant.tools)`. Every tool exposed to the chat assistant works during a call. Common picks:
- `query_knowledge` — search org knowledge base
- `lookup_contact` / `create_lead`
- `create_task`, `create_note`
- `transfer_call` — built-in, only enabled when IVR context allows transfer
Tool execution is wired through `tool.execute(args, { orgId, userId: 'voice-call', assistantId })`. The gateway wraps results in a `function_call_output` and triggers a new `response.create` so the assistant continues the conversation with the result in hand.
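The result-return step can be sketched as two Realtime API messages. The `conversation.item.create` / `function_call_output` / `response.create` shapes follow the OpenAI Realtime API; the function name and `send` callback are hypothetical.

```typescript
// Hypothetical sketch: hand a tool result back to the Realtime session,
// then ask the model to continue speaking with the result in hand.
function returnToolResult(
  callId: string,
  result: unknown,
  send: (msg: object) => void,
): void {
  // Wrap the tool output in a function_call_output conversation item.
  send({
    type: 'conversation.item.create',
    item: {
      type: 'function_call_output',
      call_id: callId,
      output: JSON.stringify(result),
    },
  });
  // Trigger a fresh model response that can use the result.
  send({ type: 'response.create' });
}
```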
## Onboarding prompt vs IVR greeting
There are two greeting sources and they don't combine:
- `onboardingPrompt` on the assistant — used when the call comes in directly with no IVR context (e.g., outbound calls without `customParameters`).
- `ivrContext.greeting` — set by the IVR routing when the assistant is reached through an IVR flow. This overrides `onboardingPrompt`.
The voice gateway appends the chosen greeting to the system prompt with: "Start the conversation by saying: '...'. Then wait for the caller to respond."
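The selection logic amounts to an override-then-append, sketched below. The instruction wording is quoted from the text; the function name and parameters are hypothetical.

```typescript
// Hypothetical sketch: pick the greeting (IVR greeting wins over the
// assistant's onboardingPrompt) and build the system-prompt suffix.
function greetingInstruction(
  onboardingPrompt?: string,
  ivrGreeting?: string,
): string {
  const greeting = ivrGreeting ?? onboardingPrompt;
  if (!greeting) return '';
  return `Start the conversation by saying: '${greeting}'. Then wait for the caller to respond.`;
}
```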
The system prompt for a voice call is much longer than for chat — capabilities, behavior rules, knowledge sources, the full IVR routing JSON. Watch your token budget on long calls; tools should fetch on demand instead of pre-loading every record into the prompt.
## Per-language behavior
If the IVR routing has `menuSettings.language`, the gateway appends:
"You must speak in {language} at all times unless the caller explicitly requests a different language. If you detect the caller is struggling with {language}, ask if they would prefer to continue in another language."
Combined with the right voice (e.g., `Polly.Lupe-Neural` for Spanish on Twilio TTS, or a multilingual OpenAI voice), this is enough to run language-specific assistants.
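The appended instruction can be sketched as a small template helper, interpolating the `menuSettings.language` value into the wording quoted above (the function name is hypothetical):

```typescript
// Hypothetical sketch: build the per-language instruction appended to the
// system prompt when menuSettings.language is set on the IVR routing.
function languageInstruction(language?: string): string {
  if (!language) return '';
  return (
    `You must speak in ${language} at all times unless the caller explicitly ` +
    `requests a different language. If you detect the caller is struggling ` +
    `with ${language}, ask if they would prefer to continue in another language.`
  );
}
```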