Recording happens at the gateway level, not on Twilio's side. AppEngine receives the μ-law stream over the Media Stream WebSocket and buffers both legs (caller and AI) until the call ends. The same gateway also collects the AI's spoken transcript and every tool call it made; together they form the "call journey".
## What gets captured
| Stream | Captured as |
|---|---|
| Caller audio | `callerAudioChunks: Buffer[]` (μ-law 8 kHz, base64-decoded) |
| AI audio | `aiAudioChunks: Buffer[]` (μ-law 8 kHz, base64-decoded) |
| Caller transcript | OpenAI `conversation.item.input_audio_transcription.completed` |
| AI transcript | OpenAI `response.output_audio_transcript.done` |
| Tool calls | Each `function_call_arguments.done` plus the resolved result |
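The buffering above can be sketched as a pair of chunk arrays per call. This is an illustrative sketch: only `callerAudioChunks`/`aiAudioChunks` and the base64 μ-law payloads come from the table; the handler names and message plumbing are assumptions.

```typescript
// Sketch of per-call audio buffering (assumed handler shape).
// Caller audio arrives as base64 μ-law in Twilio Media Stream "media"
// events; AI audio arrives as base64 μ-law deltas from the realtime
// session when its output format is g711_ulaw.
class CallBuffer {
  callerAudioChunks: Buffer[] = [];
  aiAudioChunks: Buffer[] = [];

  // Twilio "media" event payload for the caller leg
  onTwilioMedia(payloadB64: string): void {
    this.callerAudioChunks.push(Buffer.from(payloadB64, 'base64'));
  }

  // Audio delta from the AI session for the AI leg
  onAiAudioDelta(deltaB64: string): void {
    this.aiAudioChunks.push(Buffer.from(deltaB64, 'base64'));
  }
}

const buf = new CallBuffer();
buf.onTwilioMedia(Buffer.from([0x7f, 0x80, 0x00]).toString('base64'));
buf.onAiAudioDelta(Buffer.from([0xff]).toString('base64'));
```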
When the WebSocket closes, the gateway either calls `saveCallRecording` (audio + journey) or `saveCallJourney` (just journey, no audio).
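That close-time branch can be expressed as a small pure function. A sketch only: the `saveCallRecording`/`saveCallJourney` names come from the text, everything else (function name, parameters, the exact condition) is an assumption.

```typescript
type SaveRoute = 'saveCallRecording' | 'saveCallJourney';

// Illustrative: pick the save path when the socket closes.
// Assumes audio is only persisted when recording was enabled at
// call start AND at least one chunk was actually buffered.
function chooseSave(
  recordingEnabled: boolean,
  callerAudioChunks: Buffer[],
  aiAudioChunks: Buffer[],
): SaveRoute {
  const hasAudio = callerAudioChunks.length > 0 || aiAudioChunks.length > 0;
  return recordingEnabled && hasAudio ? 'saveCallRecording' : 'saveCallJourney';
}
```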
## When recording is enabled
Recording fires when either is true:
- The IVR routing for this number has `menuSettings.recording.enabled: true`
- The assistant config has `data.config.recordCalls: true`
The chosen value is captured at start time so toggling mid-call has no effect.
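Combined, the check looks roughly like this. The nested config shapes mirror the two paths above; the helper itself is illustrative, not the gateway's actual code.

```typescript
// Illustrative: recording is on if either flag is set.
// Paths mirror menuSettings.recording.enabled and data.config.recordCalls.
function isRecordingEnabled(
  menuSettings?: { recording?: { enabled?: boolean } },
  assistantConfig?: { data?: { config?: { recordCalls?: boolean } } },
): boolean {
  return Boolean(
    menuSettings?.recording?.enabled ||
      assistantConfig?.data?.config?.recordCalls,
  );
}

// Captured once at call start; later toggles don't affect this call.
const recordingEnabled = isRecordingEnabled({ recording: { enabled: true } });
```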
## Output format
`createWavBuffer()` builds a minimal 44-byte WAV header in front of the raw μ-law buffer:
| Field | Value |
|---|---|
| Sample rate | 8000 Hz |
| Channels | 1 (mono) |
| Bits/sample | 8 |
| Audio format code | 7 (μ-law) |
The result is a single mono file that contains the caller stream concatenated with the AI stream. This is intentionally simple: most downstream tools (transcription services, players) read μ-law WAV without complaint. If you need stereo with caller/AI on separate tracks, mix the buffers with PCM interleave before calling `createWavBuffer`.
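A minimal sketch of such a header builder, matching the field table above (the production `createWavBuffer` may differ in details; strictly speaking, non-PCM WAV also calls for a `fact` chunk, but the bare 44-byte form is widely accepted):

```typescript
// Build a 44-byte WAV header for raw μ-law audio and prepend it.
// Fields follow the table: 8000 Hz, mono, 8 bits/sample, format code 7.
function createWavBuffer(ulaw: Buffer): Buffer {
  const sampleRate = 8000;
  const channels = 1;
  const bitsPerSample = 8;
  const byteRate = sampleRate * channels * (bitsPerSample / 8);

  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + ulaw.length, 4); // file size minus 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(7, 20); // audio format 7 = μ-law
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(channels * (bitsPerSample / 8), 32); // block align
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(ulaw.length, 40); // data chunk size
  return Buffer.concat([header, ulaw]);
}
```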
## Storage
The WAV is stored via the configured file provider:
```typescript
const fileProvider = this.repositoryService.getFileProvider();
await fileProvider.put(orgId, `callrecordings/${callSid}.wav`, wavBuffer, false,
  { contentType: 'audio/wav' }, true);
```
Default provider is S3 (configured in Upstream for the org). The path returned by `put()` is saved on the `call_log` record as `recordingUrl`.
## The call_log record
Each call writes one `call_log` record:

```typescript
{
  name: callSid,
  callSid,
  sessionId: ivrContext?.sessionId || `session-${callSid}`,
  type: 'ai-voice',
  assistantId,
  from: ivrContext?.caller || '',
  to: ivrContext?.calledNumber || '',
  direction: ivrContext?.isOutbound ? 'outbound' : 'inbound',
  duration,                // seconds
  startTime, endTime,      // ISO timestamps
  recordingUrl,            // when audio captured
  recordingDuration,
  hasRecording: true,
  status: 'completed',
  aiTranscript: [{ role: 'ai' | 'caller', text, timestamp }, ...],
  toolsUsed: [{ tool, params, result, timestamp }, ...],
}
```
Read it via the data layer just like any other collection:
Single record: `GET /repository/get/call_log/{id}` (JWT auth). Or query the collection:

```shell
curl "https://appengine.appmint.io/data/call_log?from=eq:+15551234567&limit=20" \
  -H "orgid: my-org" -H "Authorization: Bearer <jwt>"
```
## Transcription source of truth
Transcription is inline — produced by the OpenAI Realtime session itself, not by a separate post-call STT pass. That means:
- No separate transcription job runs
- Latency: transcript items arrive during the call, not after
- Coverage: only utterances OpenAI heard (server VAD trims silence)
If you need a higher-fidelity transcript later (e.g., for compliance), run the saved WAV through any STT provider; the recording is preserved as μ-law so quality is the same as what the model heard.
## Twilio-side recording (the alternative)
Some IVR actions (`ivr_record`, `ivr_voicemail`, `ivr_transfer` with `record: true`) use Twilio's native recording instead. Twilio uploads its recording to AppEngine via:

- `/connect/webhook/twilio/recording-status` (no auth)
- `/connect/webhook/twilio/transcription-status` (no auth, for transcription results)

These webhooks live on the connect module; see `crm/communications.service.ts` and `phone.controller.ts` for the wiring. Recordings produced this way are linked to the same `call_log` (or to a dedicated voicemail record for `ivr_voicemail`) and stored in S3 too.
Inline gateway recording captures every AI call. Twilio-side recording is only available when the call passes through a Twilio recording verb (`<Record>`, `<Dial record="...">`). For the AI assistant path the inline route is the only one that fires, because Twilio is just streaming raw audio.
## Journey-only saves
For calls where recording is disabled but the assistant still ran tools or generated transcript text, the gateway calls `saveCallJourney` instead. The same `call_log` shape is written without `recordingUrl`, which is useful for high-volume orgs that want analytics without the storage cost of audio.