AI calls cost real money at upstream providers, so AppEngine meters every request through the Usage module. Token counts are recorded per org via AiChargeService, costs are computed from model-specific pricing tables, each charge is debited from the org's balance or counted against its subscription plan, and calls the org cannot pay for are rejected with HTTP 402.
How charging works
Every AI request goes through this pipeline:
1. Pre-flight balance check: Before the stream starts, usageService.checkBalance(orgId, MIN_ESTIMATED_COST) runs. If the org has no active subscription and insufficient credit, the request is rejected with INSUFFICIENT_BALANCE.
2. Model + provider resolution: The agent picks a model. The provider key is fetched from service_pricing for that provider and the org's service-agreement state is verified.
3. LLM call: The actual provider call runs. Token counts come back in the response.
4. Cost calculation: estimateAICostUSD(model, promptTokens, completionTokens) computes the dollar cost using per-model pricing in ai-cost.constants.ts.
5. Charge: AiChargeService.chargeAiUsage debits the org's balance (or counts against subscription quota), writes a usage record, and increments rolling counters.
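The cost calculation in step 4 can be sketched as follows. This is a minimal illustration, not the contents of ai-cost.constants.ts: the PRICING table, the model name, and the rates are assumptions, and estimateCostUsd is a hypothetical stand-in for estimateAICostUSD. The per-1M-token field names borrow the shape returned by /usage/ai/models.

```typescript
// Hypothetical per-model pricing in USD per 1M tokens.
// The real rates live in ai-cost.constants.ts and may be structured differently.
interface ModelPricing {
  inputPricePer1MTok: number;
  outputPricePer1MTok: number;
}

const PRICING: Record<string, ModelPricing> = {
  "example-model": { inputPricePer1MTok: 3.0, outputPricePer1MTok: 15.0 },
};

// Stand-in for estimateAICostUSD: prompt and completion tokens are priced separately.
function estimateCostUsd(
  model: string,
  promptTokens: number,
  completionTokens: number,
): number {
  const pricing = PRICING[model];
  if (!pricing) {
    throw new Error(`No pricing entry for model: ${model}`);
  }
  const inputCost = (promptTokens / 1_000_000) * pricing.inputPricePer1MTok;
  const outputCost = (completionTokens / 1_000_000) * pricing.outputPricePer1MTok;
  return inputCost + outputCost;
}
```

With these assumed rates, a call with 412 prompt tokens and 128 completion tokens costs about $0.0032, which is why the pre-flight check only needs a small minimum estimate.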
Endpoints
- /usage/balance (auth: JWT)
- /usage/stats (auth: JWT)
- /usage/ai/models (auth: JWT)
- /usage/ai/charge (auth: JWT)
- /usage/gift (auth: JWT)
- /usage/:orgId/current/:type? (auth: JWT)
- /usage/history/:type? (auth: JWT)
Reading the balance
const { balance, currency, hasSubscription, subscriptionUsed, subscriptionLimit } = await fetch('/usage/balance', {
headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());
// {
// balance: 42.10,
// currency: 'USD',
// hasSubscription: true,
// subscriptionLimit: 100,
// subscriptionUsed: 27.55,
// ...
// }
If hasSubscription is true, calls deduct from subscriptionLimit first; only overage hits balance. Without a subscription, balance is the only credit.
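The subscription-first rule can be pictured with a small client-side helper. This is a sketch under stated assumptions: splitCharge is a hypothetical function (the real allocation happens server-side in AiChargeService), and it assumes subscriptionLimit and subscriptionUsed are expressed in the same currency units as balance.

```typescript
// Fields taken from the /usage/balance example response.
interface BalanceSnapshot {
  balance: number;
  hasSubscription: boolean;
  subscriptionLimit: number;
  subscriptionUsed: number;
}

interface ChargeSplit {
  fromSubscription: number;
  fromBalance: number;
  covered: boolean; // false when quota plus balance cannot absorb the cost
}

// Hypothetical helper: allocate a cost to subscription quota first, then to balance.
function splitCharge(snapshot: BalanceSnapshot, costUsd: number): ChargeSplit {
  const quotaLeft = snapshot.hasSubscription
    ? Math.max(0, snapshot.subscriptionLimit - snapshot.subscriptionUsed)
    : 0;
  const fromSubscription = Math.min(costUsd, quotaLeft);
  const fromBalance = costUsd - fromSubscription;
  return { fromSubscription, fromBalance, covered: fromBalance <= snapshot.balance };
}
```

For example, an org with a limit of 100, usage of 98, and a balance of 10 would see a 5-unit charge split into 2 from the subscription and 3 from the balance.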
INSUFFICIENT_BALANCE error
The AI controller checks MIN_ESTIMATED_COST (currently $0.01) before starting any stream. If neither a subscription nor enough balance is present, the response is HTTP 402:
{
"code": "INSUFFICIENT_BALANCE",
"message": "Insufficient balance to use AI services",
"required": 0.01,
"available": 0.00,
"action": "add_credits",
"billingUrl": "/settings/billing"
}
Render this as a paywall in your client — show the available balance and a link to top up.
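One way to handle this on the client is to validate the 402 body before rendering anything. The sketch below assumes the body has exactly the fields shown in the example response; isInsufficientBalance and toPaywall are hypothetical helper names, not part of any AppEngine SDK.

```typescript
// Fields mirror the example 402 body.
interface InsufficientBalanceError {
  code: "INSUFFICIENT_BALANCE";
  message: string;
  required: number;
  available: number;
  action: string;
  billingUrl: string;
}

// Type guard so the client only treats well-formed 402 bodies as a paywall.
function isInsufficientBalance(
  status: number,
  body: unknown,
): body is InsufficientBalanceError {
  if (status !== 402 || typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    b.code === "INSUFFICIENT_BALANCE" &&
    typeof b.message === "string" &&
    typeof b.required === "number" &&
    typeof b.available === "number" &&
    typeof b.billingUrl === "string"
  );
}

// Hypothetical view model for a paywall component: a message plus a top-up link.
function toPaywall(err: InsufficientBalanceError): { text: string; href: string } {
  return {
    text: `${err.message}. Available: $${err.available.toFixed(2)}, required: $${err.required.toFixed(2)}.`,
    href: err.billingUrl,
  };
}
```

In a fetch handler, this would be called with the response status and the parsed JSON body; any other status or body shape falls through to the normal error path.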
Manual charging
For non-controller AI calls (custom workers, batch processors), call /usage/ai/charge after the upstream provider returns:
await fetch('/usage/ai/charge', {
method: 'POST',
headers: {
orgid: 'my-org',
Authorization: `Bearer ${jwt}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
provider: 'anthropic',
model: 'claude-sonnet-4-20250514',
promptTokens: 412,
completionTokens: 128,
metadata: { source: 'background-summariser', conversationId: 'c-1' },
}),
});
The service computes the cost from ai-cost.constants.ts, debits the org, and writes the usage record. If you skip this call, your manual AI usage isn't reflected in any of the dashboards or attribution reports.
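A worker that calls the provider directly needs to map the provider's token counts onto the charge payload. The sketch below assumes the upstream is the Anthropic Messages API, whose responses report usage as input_tokens and output_tokens; buildChargeBody is a hypothetical helper, and the payload fields are the ones shown in the request above.

```typescript
// Usage block as reported in an Anthropic Messages API response.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
}

// Payload fields taken from the /usage/ai/charge example.
interface ChargeBody {
  provider: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  metadata?: Record<string, string>;
}

// Hypothetical helper: translate provider usage into a charge payload.
function buildChargeBody(
  model: string,
  usage: AnthropicUsage,
  metadata?: Record<string, string>,
): ChargeBody {
  return {
    provider: "anthropic",
    model,
    promptTokens: usage.input_tokens,
    completionTokens: usage.output_tokens,
    // Only include metadata when the caller supplied some.
    ...(metadata ? { metadata } : {}),
  };
}
```

The result can be passed to JSON.stringify as the body of the POST to /usage/ai/charge.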
Per-org limits
Two ways to cap an org's AI spend:
- Subscription plan: set the AI portion of the plan's monthly limit. Once exceeded, calls fall back to balance or fail.
- Hard cap: set data.limits.aiMonthly on the org record. Once the rolling-month spend hits this number, calls fail with AI_LIMIT_EXCEEDED regardless of balance.
For per-user limits (e.g. preventing one user inside an org from blowing the budget), enforce them in your app code by reading per-user usage from the usage collection filtered on actorEmail.
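An app-level check of that kind might look like the sketch below. It assumes you have already loaded the relevant usage records; the costUsd and createdAt field names are hypothetical (only actorEmail is documented here), and both helper functions are illustrative rather than part of the Usage module.

```typescript
// Assumed subset of a usage record; costUsd and createdAt are hypothetical field names.
interface UsageRecord {
  actorEmail: string;
  costUsd: number;
  createdAt: string; // ISO 8601 timestamp
}

// Sum one user's spend from a cutoff date onward.
function userSpendSince(records: UsageRecord[], actorEmail: string, since: Date): number {
  const cutoff = since.getTime();
  return records
    .filter((r) => r.actorEmail === actorEmail && new Date(r.createdAt).getTime() >= cutoff)
    .reduce((sum, r) => sum + r.costUsd, 0);
}

// Hypothetical gate to run before issuing an AI request on the user's behalf.
function isUserOverBudget(
  records: UsageRecord[],
  actorEmail: string,
  since: Date,
  capUsd: number,
): boolean {
  return userSpendSince(records, actorEmail, since) >= capUsd;
}
```

Because this runs in your own code, the org-level checks (balance, subscription, hard cap) still apply independently on the server.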
Cost attribution
Every usage record carries:
- orgId: billing target
- actorEmail / actorId: who made the call
- agent: which agent class ran
- conversationId: for chat calls
- metadata: free-form passthrough
Aggregate however you want. Common queries:
// Top spenders this month
const top = await fetch('/usage/stats?type=ai&groupBy=actorEmail&from=2026-04-01', {
headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());
// Cost per conversation
const perConv = await fetch('/usage/stats?type=ai&groupBy=conversationId&from=2026-04-01', {
headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());
Pricing tables
Per-model pricing lives in src/usage/ai-cost.constants.ts and is updated when providers change rates. To check current pricing for the models available to you:
const models = await fetch('/usage/ai/models', {
headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());
// [{ id: 'claude-sonnet-4-...', inputPricePer1MTok: 3.00, outputPricePer1MTok: 15.00, ... }]
For the Anthropic-compatible vibe-agent endpoint, costs are charged the same way — the wrapper records token usage and charges the org just like the native AI controller.
Top-ups
Topping up the org's balance happens through the standard billing flow under /org-management/credits/* or, in test environments, through the gift endpoint:
await fetch('/usage/gift', {
method: 'POST',
headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ amount: 50, reason: 'Onboarding credit', recipient: 'target-org-id' }),
});
Gift is RootAdmin-only.
Image and video generation
Image (/ai/generate/image) and video (/ai/generate/video) calls charge through the same AiChargeService using imageCount and imageQuality instead of token counts. Pricing per image and per second of video is also in the cost constants.
The MIN_ESTIMATED_COST is intentionally tiny — it's just a guardrail against zero-balance orgs starting expensive runs. The actual charge after the call lands the right amount; you don't need to over-pay up front to cover a stream.