Cost and token control

How AI usage is metered, charged, and limited per-org via the Usage module.

AI calls cost real money at upstream providers, so AppEngine meters every request through the Usage module. Token counts are recorded per org via AiChargeService, costs are computed from model-specific pricing tables, each charge is either debited from the org's balance or counted against its subscription plan, and over-limit calls are rejected with HTTP 402.

How charging works

Every AI request goes through this pipeline:

  1. Pre-flight balance check

    Before the stream starts, usageService.checkBalance(orgId, MIN_ESTIMATED_COST) runs. If the org has no active subscription and insufficient credit, the request is rejected with INSUFFICIENT_BALANCE.

  2. Model + provider resolution

    The agent picks a model. The provider key is fetched from service_pricing for that provider and the org's service-agreement state is verified.

  3. LLM call

    The actual provider call runs. Token counts come back in the response.

  4. Cost calculation

    estimateAICostUSD(model, promptTokens, completionTokens) computes the dollar cost using per-model pricing in ai-cost.constants.ts.

  5. Charge

    AiChargeService.chargeAiUsage debits the org's balance (or counts against subscription quota), writes a usage record, and increments rolling counters.
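
The five steps above can be sketched end to end as follows. This is an illustrative in-memory model, not AppEngine's implementation: the Org shape, the EXAMPLE_PRICES table, the usageLog array, and runAiRequest are all hypothetical, the provider call is a stub, and subscription quota accounting is left out.

```typescript
// Hypothetical in-memory model of the five-step pipeline; not AppEngine's code.
const MIN_ESTIMATED_COST = 0.01; // USD, the value quoted on this page

interface Org { id: string; balance: number; hasSubscription: boolean; }
interface TokenUsage { promptTokens: number; completionTokens: number; }
interface UsageRecord extends TokenUsage { orgId: string; model: string; costUSD: number; }

// Made-up prices in USD per 1M tokens, for illustration only.
const EXAMPLE_PRICES: Record<string, { input: number; output: number }> = {
  "example-model": { input: 3, output: 15 },
};

const usageLog: UsageRecord[] = [];

function runAiRequest(
  org: Org,
  model: string,
  callProvider: () => TokenUsage, // stand-in for the real LLM call
): UsageRecord {
  // 1. Pre-flight balance check
  if (!org.hasSubscription && org.balance < MIN_ESTIMATED_COST) {
    throw new Error("INSUFFICIENT_BALANCE");
  }
  // 2. Model + provider resolution
  const price = EXAMPLE_PRICES[model];
  if (!price) throw new Error("No pricing for model " + model);
  // 3. LLM call: token counts come back with the response
  const usage = callProvider();
  // 4. Cost calculation
  const costUSD =
    (usage.promptTokens * price.input + usage.completionTokens * price.output) / 1_000_000;
  // 5. Charge: debit the balance and write a usage record
  if (!org.hasSubscription) org.balance -= costUSD;
  const record: UsageRecord = { orgId: org.id, model, costUSD, ...usage };
  usageLog.push(record);
  return record;
}
```

Note that the pre-flight check only compares against MIN_ESTIMATED_COST; the real cost is known only after step 3, so the debit in step 5 can be larger than the amount checked in step 1.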

Endpoints

Method  Path                          Auth
GET     /usage/balance                JWT
GET     /usage/stats                  JWT
GET     /usage/ai/models              JWT
POST    /usage/ai/charge              JWT
POST    /usage/gift                   JWT
GET     /usage/:orgId/current/:type?  JWT
GET     /usage/history/:type?         JWT

Reading the balance

const { balance, currency, hasSubscription, subscriptionUsed, subscriptionLimit } = await fetch('/usage/balance', {
  headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());

// {
//   balance: 42.10,
//   currency: 'USD',
//   hasSubscription: true,
//   subscriptionLimit: 100,
//   subscriptionUsed: 27.55,
//   ...
// }

If hasSubscription is true, usage counts against the subscription quota first (subscriptionUsed rises toward subscriptionLimit), and only the overage is debited from balance. Without a subscription, balance is the only credit.
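
The subscription-first rule can be illustrated with a small client-side sketch. splitCharge is a hypothetical helper, not part of the Usage module, and it assumes the quota is tracked in the same currency as balance, as the sample response suggests.

```typescript
// Fields taken from the /usage/balance response shown above.
interface BalanceSnapshot {
  balance: number;
  hasSubscription: boolean;
  subscriptionLimit: number;
  subscriptionUsed: number;
}

// Hypothetical helper: how much of `cost` is absorbed by subscription quota
// and how much spills over to the prepaid balance.
function splitCharge(
  s: BalanceSnapshot,
  cost: number,
): { fromSubscription: number; fromBalance: number } {
  const remainingQuota = s.hasSubscription
    ? Math.max(0, s.subscriptionLimit - s.subscriptionUsed)
    : 0;
  const fromSubscription = Math.min(cost, remainingQuota);
  return { fromSubscription, fromBalance: cost - fromSubscription };
}
```

For example, with a limit of 100 and 98 already used, a charge of 5 would take 2 from the quota and 3 from the balance under this model.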

INSUFFICIENT_BALANCE error

The AI controller checks MIN_ESTIMATED_COST (currently $0.01) before starting any stream. If neither a subscription nor enough balance is present, the response is HTTP 402:

{
  "code": "INSUFFICIENT_BALANCE",
  "message": "Insufficient balance to use AI services",
  "required": 0.01,
  "available": 0.00,
  "action": "add_credits",
  "billingUrl": "/settings/billing"
}

Render this as a paywall in your client — show the available balance and a link to top up.
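
One way a client might recognise this payload and turn it into paywall content is sketched below. The function names isInsufficientBalance and paywallProps are hypothetical, and the wording of the message is only an example; the field names come from the response above.

```typescript
interface InsufficientBalanceError {
  code: "INSUFFICIENT_BALANCE";
  message: string;
  required: number;
  available: number;
  action: string;
  billingUrl: string;
}

// Narrow an unknown parsed JSON body to the 402 payload shown above.
function isInsufficientBalance(body: unknown): body is InsufficientBalanceError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    b["code"] === "INSUFFICIENT_BALANCE" &&
    typeof b["required"] === "number" &&
    typeof b["available"] === "number" &&
    typeof b["billingUrl"] === "string"
  );
}

// Build the text and link a paywall component could render.
function paywallProps(err: InsufficientBalanceError): { text: string; href: string } {
  return {
    text: `Balance $${err.available.toFixed(2)} is below the $${err.required.toFixed(2)} minimum.`,
    href: err.billingUrl,
  };
}
```

In practice you would call isInsufficientBalance on the parsed body only when the response status is 402, and fall back to generic error handling otherwise.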

Manual charging

For non-controller AI calls (custom workers, batch processors), call /usage/ai/charge after the upstream provider returns:

await fetch('/usage/ai/charge', {
  method: 'POST',
  headers: {
    orgid: 'my-org',
    Authorization: `Bearer ${jwt}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    provider: 'anthropic',
    model: 'claude-sonnet-4-20250514',
    promptTokens: 412,
    completionTokens: 128,
    metadata: { source: 'background-summariser', conversationId: 'c-1' },
  }),
});

The service computes the cost from ai-cost.constants.ts, debits the org, and writes the usage record. If you skip this call, your manual AI usage isn't reflected in any of the dashboards or attribution reports.

Per-org limits

Two ways to cap an org's AI spend:

  • Subscription plan — set the AI portion of the plan's monthly limit. Once exceeded, calls fall back to balance or fail.
  • Hard cap — set data.limits.aiMonthly on the org record. Once the rolling-month spend hits this number, calls fail with AI_LIMIT_EXCEEDED regardless of balance.
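
How the two caps interact can be sketched as a small decision function. This is an assumption-laden model rather than the server's logic: OrgSpendState and decide are hypothetical names, and the sketch assumes that a failed fallback to balance surfaces as INSUFFICIENT_BALANCE.

```typescript
type Decision = "ALLOW" | "AI_LIMIT_EXCEEDED" | "INSUFFICIENT_BALANCE";

// Hypothetical view of the state the two limits depend on.
interface OrgSpendState {
  aiMonthlyCap?: number;         // data.limits.aiMonthly, if set
  monthSpend: number;            // rolling-month AI spend
  subscriptionRemaining: number; // 0 when there is no plan or the plan is used up
  balance: number;
}

function decide(state: OrgSpendState, estimatedCost: number): Decision {
  // The hard cap wins regardless of balance.
  if (state.aiMonthlyCap !== undefined && state.monthSpend >= state.aiMonthlyCap) {
    return "AI_LIMIT_EXCEEDED";
  }
  // Subscription quota is consumed first.
  if (state.subscriptionRemaining > 0) return "ALLOW";
  // Otherwise fall back to balance, or fail.
  return state.balance >= estimatedCost ? "ALLOW" : "INSUFFICIENT_BALANCE";
}
```

The ordering is the point of the sketch: the hard cap is checked before anything else, so topping up the balance does not lift it.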

For limits below the org level (e.g. preventing one user inside an org from blowing the budget), enforce them in your app code by reading per-user usage from the usage collection filtered on actorEmail.
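
A minimal sketch of such an app-side check follows. It assumes you have already loaded the relevant usage records; the UsageRow shape (in particular the costUSD field name) and both helper functions are hypothetical.

```typescript
// Hypothetical row shape: `costUSD` is an assumed field name.
interface UsageRow { actorEmail: string; costUSD: number; }

// Sum spend per user across the supplied usage records.
function spendByActor(rows: UsageRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.actorEmail] = (totals[row.actorEmail] ?? 0) + row.costUSD;
  }
  return totals;
}

// True when the user's total spend has reached the app-defined cap.
function isOverUserCap(rows: UsageRow[], actorEmail: string, capUSD: number): boolean {
  return (spendByActor(rows)[actorEmail] ?? 0) >= capUSD;
}
```

Your app would run isOverUserCap before forwarding a user's request to the AI endpoint and refuse the call itself when it returns true.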

Cost attribution

Every usage record carries:

  • orgId — billing target
  • actorEmail / actorId — who made the call
  • agent — which agent class ran
  • conversationId — for chat calls
  • metadata — free-form passthrough

Aggregate however you want. Common queries:

// Top spenders this month
const top = await fetch('/usage/stats?type=ai&groupBy=actorEmail&from=2026-04-01', {
  headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());

// Cost per conversation
const perConv = await fetch('/usage/stats?type=ai&groupBy=conversationId&from=2026-04-01', {
  headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());
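
Both queries hard-code the start date. A small sketch of building the same URL for the current month is shown below; monthStart and statsUrl are hypothetical helpers, the sketch assumes the from parameter is interpreted as a UTC date, and it only covers the two groupBy values used above.

```typescript
type GroupBy = "actorEmail" | "conversationId";

// First day of the month containing `d`, as YYYY-MM-DD in UTC.
function monthStart(d: Date): string {
  const month = d.getUTCMonth() + 1;
  const mm = month < 10 ? "0" + month : String(month);
  return d.getUTCFullYear() + "-" + mm + "-01";
}

// Build the /usage/stats path and query string used in the examples above.
function statsUrl(groupBy: GroupBy, from: string): string {
  return (
    "/usage/stats?type=ai&groupBy=" +
    encodeURIComponent(groupBy) +
    "&from=" +
    encodeURIComponent(from)
  );
}

// Path for "top spenders this month", relative to the API base URL.
const topSpendersUrl = statsUrl("actorEmail", monthStart(new Date()));
```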

Pricing tables

Per-model pricing lives in src/usage/ai-cost.constants.ts and is updated when providers change rates. To check current pricing for the models available to you:

const models = await fetch('/usage/ai/models', {
  headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}` },
}).then(r => r.json());

// [{ id: 'claude-sonnet-4-...', inputPricePer1MTok: 3.00, outputPricePer1MTok: 15.00, ... }]
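
Given an entry from that list, a client can estimate what a call will cost before making it. The sketch below assumes a plain linear formula over the two per-million-token prices; estimateCostUSD is a hypothetical client-side helper, and the server's estimateAICostUSD may apply rules this sketch does not model.

```typescript
// Fields taken from the /usage/ai/models response shown above.
interface ModelPricing {
  id: string;
  inputPricePer1MTok: number;  // USD per 1M prompt tokens
  outputPricePer1MTok: number; // USD per 1M completion tokens
}

// Hypothetical client-side estimate: linear in prompt and completion tokens.
function estimateCostUSD(
  pricing: ModelPricing,
  promptTokens: number,
  completionTokens: number,
): number {
  return (
    (promptTokens * pricing.inputPricePer1MTok +
      completionTokens * pricing.outputPricePer1MTok) /
    1_000_000
  );
}
```

At the example prices of 3.00 and 15.00, a call with 412 prompt tokens and 128 completion tokens works out to (412 × 3 + 128 × 15) / 1,000,000 = 3,156 / 1,000,000, or about $0.003156.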

For the Anthropic-compatible vibe-agent endpoint, costs are charged the same way — the wrapper records token usage and charges the org just like the native AI controller.

Top-ups

Topping up the org's balance happens through the standard billing flow under /org-management/credits/* or, in test environments, through the gift endpoint:

await fetch('/usage/gift', {
  method: 'POST',
  headers: { orgid: 'my-org', Authorization: `Bearer ${jwt}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ amount: 50, reason: 'Onboarding credit', recipient: 'target-org-id' }),
});

Gift is RootAdmin-only.

Image and video generation

Image (/ai/generate/image) and video (/ai/generate/video) calls charge through the same AiChargeService using imageCount and imageQuality instead of token counts. Pricing per image and per second of video is also in the cost constants.
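
As a rough illustration of count-and-quality pricing, the sketch below multiplies imageCount by a per-image price chosen by imageQuality. The quality tier names, the prices, and the function itself are invented for the example; the real values live in the cost constants.

```typescript
// Assumed quality tiers and made-up prices, for illustration only.
type ImageQuality = "standard" | "hd";

const EXAMPLE_IMAGE_PRICES_USD: Record<ImageQuality, number> = {
  standard: 0.04,
  hd: 0.08,
};

// Hypothetical estimate: per-image price times the number of images.
function estimateImageCostUSD(imageCount: number, imageQuality: ImageQuality): number {
  if (imageCount < 0 || Math.floor(imageCount) !== imageCount) {
    throw new Error("imageCount must be a non-negative integer");
  }
  return imageCount * EXAMPLE_IMAGE_PRICES_USD[imageQuality];
}
```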

The MIN_ESTIMATED_COST is intentionally tiny — it's just a guardrail against zero-balance orgs starting expensive runs. The actual charge after the call lands the right amount; you don't need to over-pay up front to cover a stream.