The AI module is AppEngine's LLM orchestration layer. It wraps the underlying providers (Anthropic Claude, OpenAI, DeepSeek), exposes them through a streaming agent API, plugs in function calling via an MCP-style tool registry, processes uploaded documents into context, and runs autonomous multi-step workflows through the VibeAgent. Everything is metered through the Usage module so AI cost is attributed back to the org.
## What's in the module
| Surface | Path | Use for |
|---|---|---|
| Chat agents | /ai/chat, /ai/agent/chat | Single-turn or simple multi-turn responses |
| Streaming | /ai/agent/stream + /ai/stream/:streamId | Token-by-token streaming via SSE |
| MCP tools | /ai/mcp/tools, /ai/mcp/execute | Server-side tools the agent can invoke (memory, search, data access) |
| Document processing | (in-agent, file refs in body.files) | PDF / DOCX / PPTX text extraction for RAG |
| Image / video gen | /ai/generate/image, /ai/generate/video | Provider-routed media generation |
| Vibe Agent | /ai/vibe-agent/* | Standalone agent surface with its own model registry, used by Vibe Studio and external integrators |
## Agent types
AppEngine ships three core agent classes, selected at request time via the `agentRole` or `aiMode` field:
- ChatAgent (`agentRole: 'chat'` or `'auto'`) — the default. Handles conversation, MCP tool use, and document context. Mode-aware: `developer`, `data-analyst`, `graphic-artist`, and `content-writer` adjust the system prompt.
- AIAgentAgent (`agentRole: 'ai-agent'`) — the registered runtime for orgs running their own per-tenant agents (configured by a record in the `agent` collection).
- VibeAgent — autonomous planner/executor. Plans steps, calls tools, retries failures. Runs in its own controller under `/ai/vibe-agent/*`.
Pick chat for "user types, AI responds." Pick vibe-agent for "user describes a goal, AI does the multi-step work."
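The request-time selection between the first two agent classes can be sketched as a simple dispatch on `agentRole` (VibeAgent is excluded because it lives behind its own controller). The class names come from the list above; the dispatch function itself is illustrative, not AppEngine source:

```typescript
// Hypothetical sketch of agent selection by the `agentRole` request field.
type AgentRole = 'chat' | 'auto' | 'ai-agent' | undefined;

function selectAgentClass(agentRole: AgentRole): 'ChatAgent' | 'AIAgentAgent' {
  switch (agentRole) {
    case 'ai-agent':
      // Per-tenant runtime, configured by a record in the `agent` collection
      return 'AIAgentAgent';
    case 'chat':
    case 'auto':
    default:
      // Default conversational agent (MCP tools, document context)
      return 'ChatAgent';
  }
}
```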
## Streaming pattern
Streaming responses are two-step:
1. `POST /ai/agent/stream` — send the prompt + context; receive `{ streamId }`. The server starts processing in the background.
2. `GET /ai/stream/:streamId` — open an SSE connection. The server pushes `data` events as tokens arrive, then `end` when complete.
This split lets the client choose where to render the stream (a different process, a different page) and avoids the long-poll problem of holding a single HTTP request open across an unpredictable LLM response. See Chat agent streaming for full code.
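A minimal client-side sketch of the two-step flow: the endpoint paths and the `data`/`end` event names come from this page, while the request body shape (`prompt`), headers, and helper names are assumptions.

```typescript
interface SseEvent { event: string; data: string }

// Split a raw text/event-stream chunk into (event, data) pairs.
function parseSseChunk(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of chunk.split('\n\n')) {
    let event = 'message';
    const data: string[] = [];
    for (const line of frame.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
    }
    if (data.length) events.push({ event, data: data.join('\n') });
  }
  return events;
}

async function streamChat(baseUrl: string, token: string, prompt: string): Promise<void> {
  // Step 1: POST the prompt; the server replies immediately with { streamId }
  // and keeps generating in the background.
  const res = await fetch(`${baseUrl}/ai/agent/stream`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ prompt }),
  });
  const { streamId } = await res.json();

  // Step 2: open the SSE connection and consume tokens until `end`.
  const sse = await fetch(`${baseUrl}/ai/stream/${streamId}`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const reader = sse.body!.pipeThrough(new TextDecoderStream()).getReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    for (const ev of parseSseChunk(value)) {
      if (ev.event === 'end') return;
      if (ev.event === 'data') process.stdout.write(ev.data);
    }
  }
}
```

Note that `parseSseChunk` assumes each read ends on a frame boundary; a production client should buffer partial frames across chunks.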
## What's in each page
- Chat agent streaming — the SSE pattern with code in JS, curl, and Python.
- Function calling and tools — define tools, register them, agent invokes during a turn.
- Document processing — upload PDFs/DOCX/PPTX, extract text, RAG into prompts.
- Prompt engineering — system prompts, context construction, few-shot examples within AppEngine.
- Vibe Agent — autonomous planner with retry and tool orchestration.
- Cost and token control — UsageModule integration, AiChargeService, per-org limits, cost attribution.
## Auth
All `/ai/*` endpoints require JWT + orgId. The MCP execute endpoint additionally enforces principal scoping: it injects the caller's orgId and userId into tool args, regardless of what the agent sent. The VibeAgent controller uses bearer auth — see Vibe Agent.
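The principal-scoping behavior on MCP execute can be illustrated as a merge where the caller's authenticated identity always overwrites whatever the agent supplied; the function name and shapes here are hypothetical.

```typescript
// Illustrative sketch of principal scoping on /ai/mcp/execute.
interface Principal { orgId: string; userId: string }

function scopeToolArgs(
  args: Record<string, unknown>,
  caller: Principal,
): Record<string, unknown> {
  // Caller identity always wins: a tool never runs against another tenant's
  // data, even if the model put a different orgId into the arguments.
  return { ...args, orgId: caller.orgId, userId: caller.userId };
}
```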
## Balance check
Before any AI stream starts, the AI controller calls `usageService.checkBalance(orgId, MIN_ESTIMATED_COST)`. If the org has neither an active subscription nor sufficient balance, the request returns HTTP 402 with `INSUFFICIENT_BALANCE` — see Cost and token control.
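The guard's logic can be sketched as follows. Only `usageService.checkBalance`, the 402 status, and the `INSUFFICIENT_BALANCE` code come from this page; the result shape and the `MIN_ESTIMATED_COST` value are assumptions.

```typescript
// Sketch of the pre-stream balance guard, assuming a check-result shape.
interface BalanceCheck { hasActiveSubscription: boolean; balance: number }

const MIN_ESTIMATED_COST = 0.01; // illustrative placeholder value

function guardBalance(check: BalanceCheck): { status: number; code?: string } {
  // Pass if the org has an active subscription OR enough prepaid balance.
  if (check.hasActiveSubscription || check.balance >= MIN_ESTIMATED_COST) {
    return { status: 200 };
  }
  return { status: 402, code: 'INSUFFICIENT_BALANCE' };
}
```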