Call recording and transcription

Capture audio, transcripts, and tool usage from voice calls; store as WAV plus a call_log record.

Recording happens at the gateway, not on Twilio's side. AppEngine receives the μ-law stream over the Media Stream WebSocket and buffers both legs (caller and AI) until the call ends. The same gateway also collects the spoken transcripts and every tool call the AI made; together they form the "call journey".

What gets captured

Stream              Captured as
Caller audio        callerAudioChunks: Buffer[] (μ-law 8 kHz, base64-decoded)
AI audio            aiAudioChunks: Buffer[] (μ-law 8 kHz, base64-decoded)
Caller transcript   OpenAI conversation.item.input_audio_transcription.completed
AI transcript       OpenAI response.output_audio_transcript.done
Tool calls          Each function_call_arguments.done plus the resolved result
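The capture described in the table can be sketched roughly as follows. This is a minimal illustration, not the gateway's actual code: the `CallCapture` shape and the `newCapture`, `onTwilioMedia`, `appendAiAudio`, and `onRealtimeEvent` helpers are hypothetical names, and tool-call capture is left out. It assumes Twilio `media` messages carry base64 audio in `media.payload` and that the two OpenAI events expose their text in a `transcript` field.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical shapes for illustration; the real gateway types may differ.
interface TranscriptEntry { role: "ai" | "caller"; text: string; timestamp: string; }

interface CallCapture {
  callerAudioChunks: Buffer[];
  aiAudioChunks: Buffer[];
  aiTranscript: TranscriptEntry[];
}

function newCapture(): CallCapture {
  return { callerAudioChunks: [], aiAudioChunks: [], aiTranscript: [] };
}

// Caller leg: decode the base64 payload of a Twilio Media Stream "media" message.
function onTwilioMedia(
  capture: CallCapture,
  msg: { event: string; media?: { payload: string } },
): void {
  if (msg.event !== "media" || !msg.media) return; // ignore start/mark/stop messages
  capture.callerAudioChunks.push(Buffer.from(msg.media.payload, "base64"));
}

// AI leg: decode a base64 audio chunk before it is streamed back to the caller.
function appendAiAudio(capture: CallCapture, base64Audio: string): void {
  capture.aiAudioChunks.push(Buffer.from(base64Audio, "base64"));
}

// Transcripts: map the two OpenAI event types from the table onto journey entries.
function onRealtimeEvent(
  capture: CallCapture,
  evt: { type: string; transcript?: string },
  now: Date = new Date(),
): void {
  if (typeof evt.transcript !== "string") return;
  const timestamp = now.toISOString();
  if (evt.type === "conversation.item.input_audio_transcription.completed") {
    capture.aiTranscript.push({ role: "caller", text: evt.transcript, timestamp });
  } else if (evt.type === "response.output_audio_transcript.done") {
    capture.aiTranscript.push({ role: "ai", text: evt.transcript, timestamp });
  }
}
```

Keeping the two audio legs in separate arrays is what makes it possible to build either a concatenated mono file or separate channels later.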

When the WebSocket closes, the gateway either calls saveCallRecording (audio + journey) or saveCallJourney (just journey, no audio).
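The choice made on close could look something like the sketch below. The `CloseState` type and `chooseFinalizer` function are hypothetical, and the fallback to a journey-only save when recording is enabled but no audio was buffered is an assumption, not documented behavior.

```typescript
type Finalizer = "saveCallRecording" | "saveCallJourney";

// Hypothetical snapshot of what the gateway knows when the WebSocket closes.
interface CloseState {
  recordingEnabled: boolean; // resolved when the call started
  audioBytes: number;        // total buffered caller + AI audio
}

// Assumption: with recording enabled but nothing buffered, fall back to journey-only.
function chooseFinalizer(state: CloseState): Finalizer {
  return state.recordingEnabled && state.audioBytes > 0
    ? "saveCallRecording"
    : "saveCallJourney";
}
```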

When recording is enabled

Recording is enabled when either of the following is true:

  • The IVR routing for this number has menuSettings.recording.enabled: true
  • The assistant config has data.config.recordCalls: true

The resolved value is captured when the call starts, so toggling either setting mid-call has no effect.
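A minimal sketch of that rule, assuming the two settings are reachable at the paths named above. The `IvrRouting`, `AssistantConfig`, and `CallSession` types and the `resolveRecordingEnabled` helper are illustrative names, not the gateway's real ones.

```typescript
// Hypothetical config shapes mirroring the two documented paths.
interface IvrRouting { menuSettings?: { recording?: { enabled?: boolean } }; }
interface AssistantConfig { data?: { config?: { recordCalls?: boolean } }; }

// Either flag being strictly true turns recording on.
function resolveRecordingEnabled(routing?: IvrRouting, assistant?: AssistantConfig): boolean {
  return (
    routing?.menuSettings?.recording?.enabled === true ||
    assistant?.data?.config?.recordCalls === true
  );
}

// The flag is read once in the constructor, so later config edits do not affect this call.
class CallSession {
  readonly recordingEnabled: boolean;

  constructor(routing?: IvrRouting, assistant?: AssistantConfig) {
    this.recordingEnabled = resolveRecordingEnabled(routing, assistant);
  }
}
```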

Output format

createWavBuffer() builds a minimal 44-byte WAV header in front of the raw μ-law buffer:

Field               Value
Sample rate         8000 Hz
Channels            1 (mono)
Bits/sample         8
Audio format code   7 (μ-law)
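For reference, a 44-byte header with these field values can be assembled along the following lines. This is a sketch of the standard RIFF/WAVE layout, not the source of createWavBuffer; the function name `buildMuLawWav` and its parameters are made up for the example.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical stand-in for createWavBuffer: prepends a 44-byte RIFF/WAVE header
// (format code 7 = mu-law, 8 bits per sample) to raw mu-law audio.
function buildMuLawWav(audio: Buffer, sampleRate = 8000, channels = 1): Buffer {
  const bitsPerSample = 8;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;

  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + audio.length, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);               // fmt chunk size
  header.writeUInt16LE(7, 20);                // audio format code: mu-law
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(audio.length, 40);     // data chunk size

  return Buffer.concat([header, audio]);
}
```

With the buffered chunks from the capture step, usage would be roughly `buildMuLawWav(Buffer.concat([...callerAudioChunks, ...aiAudioChunks]))`.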

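The result is a single mono file that contains the caller stream concatenated with the AI stream. This is intentionally simple: most downstream tools (transcription services, players) read μ-law WAV without complaint. If you need stereo with caller and AI on separate channels, decode both buffers to PCM and interleave them before calling createWavBuffer. Note that the header values above describe the mono μ-law case, so a stereo PCM file needs a header that declares two channels, 16 bits per sample, and format code 1.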
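One way the stereo interleave could be done is sketched below. It decodes each μ-law byte to 16-bit PCM with the standard G.711 expansion and writes caller samples to the left channel and AI samples to the right, padding the shorter leg with silence. Both function names are hypothetical, and whether createWavBuffer can emit a matching stereo PCM header is not something the text confirms.

```typescript
import { Buffer } from "node:buffer";

// Standard G.711 mu-law to 16-bit linear PCM expansion.
function muLawToPcm16(byte: number): number {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Interleave two mu-law legs as 16-bit little-endian stereo PCM (L = caller, R = AI).
function interleaveStereoPcm16(caller: Buffer, ai: Buffer): Buffer {
  const frames = Math.max(caller.length, ai.length);
  const out = Buffer.alloc(frames * 4); // 2 channels x 2 bytes per frame
  for (let i = 0; i < frames; i++) {
    const left = i < caller.length ? muLawToPcm16(caller[i]) : 0; // pad with silence
    const right = i < ai.length ? muLawToPcm16(ai[i]) : 0;
    out.writeInt16LE(left, i * 4);
    out.writeInt16LE(right, i * 4 + 2);
  }
  return out;
}
```

Because the two legs are buffered independently, aligning them by sample index is only an approximation of real call timing unless the chunks also carry timestamps.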

Storage

The WAV is stored via the configured file provider:

const fileProvider = this.repositoryService.getFileProvider();
await fileProvider.put(orgId, `callrecordings/${callSid}.wav`, wavBuffer, false,
  { contentType: 'audio/wav' }, true);

The default provider is S3 (configured in Upstream for the org). The path returned by put() is saved on the call_log record as recordingUrl.

The call_log record

Each call writes one call_log:

{
  name: callSid,
  callSid,
  sessionId: ivrContext?.sessionId || `session-${callSid}`,
  type: 'ai-voice',
  assistantId,
  from: ivrContext?.caller || '',
  to: ivrContext?.calledNumber || '',
  direction: ivrContext?.isOutbound ? 'outbound' : 'inbound',
  duration,                    // seconds
  startTime, endTime,          // ISO timestamps
  recordingUrl,                // when audio captured
  recordingDuration,
  hasRecording: true,
  status: 'completed',
  aiTranscript: [{ role: 'ai' | 'caller', text, timestamp }, ...],
  toolsUsed:    [{ tool, params, result, timestamp }, ...],
}

Read it via the data layer just like any other collection:

GET /repository/get/call_log/{id} (JWT)

curl "https://appengine.appmint.io/data/call_log?from=eq:%2B15551234567&limit=20" \
  -H "orgid: my-org" -H "Authorization: Bearer <jwt>"
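The same query can be built from TypeScript. The sketch below only assembles the URL and headers shown in the curl example; `buildCallLogQuery` is a hypothetical helper, and it relies on URLSearchParams to percent-encode the leading `+` of the phone number so it is not read as a space.

```typescript
// Hypothetical helper: builds the request for the call_log query shown above.
function buildCallLogQuery(
  baseUrl: string,
  from: string,
  limit: number,
  orgId: string,
  jwt: string,
): { url: string; headers: Record<string, string> } {
  const url = new URL("/data/call_log", baseUrl);
  url.searchParams.set("from", `eq:${from}`); // "+" is encoded as %2B
  url.searchParams.set("limit", String(limit));
  return {
    url: url.toString(),
    headers: { orgid: orgId, Authorization: `Bearer ${jwt}` },
  };
}
```

The result can be passed straight to fetch, for example `const { url, headers } = buildCallLogQuery(...); await fetch(url, { headers });`.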

Transcription source of truth

Transcription is inline — produced by the OpenAI Realtime session itself, not by a separate post-call STT pass. That means:

  • No separate transcription job runs
  • Latency: transcript items arrive during the call, not after
  • Coverage: only utterances OpenAI heard (server VAD trims silence)

If you need a higher-fidelity transcript later (e.g., for compliance), run the saved WAV through any STT provider; the recording is preserved as μ-law so quality is the same as what the model heard.

Twilio-side recording (the alternative)

Some IVR actions (ivr_record, ivr_voicemail, ivr_transfer with record: true) use Twilio's native recording instead. Twilio uploads its recording to AppEngine via:

POST /connect/webhook/twilio/recording-status (No auth)

Or, for transcription:

POST /connect/webhook/twilio/transcription-status (No auth)

These webhooks live on the connect module — see crm/communications.service.ts and phone.controller.ts for the wiring. Recordings produced this way are linked to the same call_log (or to a dedicated voicemail record for ivr_voicemail) and stored in S3 too.
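To illustrate what the recording-status webhook receives: Twilio sends a form-encoded body with fields such as CallSid, RecordingSid, RecordingUrl, RecordingStatus, and RecordingDuration. The parser below is a hedged sketch of reading that body; `parseRecordingStatus` and the `RecordingStatusEvent` shape are hypothetical and do not reflect the actual connect module code.

```typescript
// Hypothetical normalized view of a Twilio recording-status callback.
interface RecordingStatusEvent {
  callSid: string;
  recordingSid: string;
  recordingUrl: string | null;
  status: string | null;
  durationSeconds: number | null;
}

// Parses the application/x-www-form-urlencoded body Twilio posts to the webhook.
// Returns null when the identifiers needed to link the call_log are missing.
function parseRecordingStatus(body: string): RecordingStatusEvent | null {
  const params = new URLSearchParams(body);
  const callSid = params.get("CallSid");
  const recordingSid = params.get("RecordingSid");
  if (!callSid || !recordingSid) return null;

  const rawDuration = params.get("RecordingDuration");
  const duration = rawDuration === null ? NaN : Number(rawDuration);

  return {
    callSid,
    recordingSid,
    recordingUrl: params.get("RecordingUrl"),
    status: params.get("RecordingStatus"),
    durationSeconds: Number.isFinite(duration) ? duration : null,
  };
}
```

The callSid is the natural key for attaching the Twilio recording to the existing call_log, since the record is written with `name: callSid`.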

Inline gateway recording captures every AI call. Twilio-side recording is only available when the call passes through a Twilio recording verb (<Record>, <Dial record="...">). For the AI assistant path the inline route is the only one that fires, because Twilio is just streaming raw audio.

Journey-only saves

For calls where recording is disabled but the assistant still ran tools or generated transcript text, the gateway calls saveCallJourney instead. The same call_log shape is written without recordingUrl — useful for high-volume orgs that want analytics without the storage cost of audio.