Performance

Pagination, caching, rate limits, and what to expect from AppEngine.

What AppEngine does to keep latency reasonable, where you can help, and what the real limits look like. We won't quote SLAs that aren't in the code — instead we'll describe the levers that exist and how to use them.

Pagination conventions

Every list endpoint paginates. The convention is consistent across the API: p for page number (1-based) and ps for page size, sent as query parameters. POST-style finds accept the equivalent fields, page and pageSize, inside the options body:

# Query string form
curl "https://appengine.appmint.io/storefront/products?p=1&ps=50" \
  -H "orgid: my-org"

# Body form (find/search)
curl https://appengine.appmint.io/repository/find/contact \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{ "query": {}, "options": { "page": 1, "pageSize": 50 } }'

Defaults:

  • pageSize defaults to 50 for most reads.
  • The pageSize cap varies per domain: the repository find uses larger defaults (1000 in some mobile examples), but admins should not assume unbounded scans are free.
  • The response includes total, page, and pageSize, so clients can paginate without trial-and-error (see the sketch after this list).
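A paginated read therefore comes back roughly like this. Only total, page, and pageSize are guaranteed by the convention above; the name of the array holding the records (shown here as data) is an assumption and may differ per endpoint:

{
  "data": [ ... ],
  "total": 1240,
  "page": 1,
  "pageSize": 50
}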
Don't fetch everything

A page size in the thousands works against MongoDB's indexes for sorted reads. Stay close to defaults; reach for cursor-style iteration in the rare cases you need a full scan.

Sorting and filtering

The options block carries sort and projection too:

Field      Type     Description
page       number   1-based page number. Default 1.
pageSize   number   Records per page. Default 50; respect domain caps.
sort       object   e.g. { "createdate": -1 } for newest first.
select     array    Limit returned fields, e.g. ["data.email", "data.firstName"].
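Putting those together: a find that returns the newest contacts first and only two fields, reusing the endpoint and headers from the pagination example above (a sketch, not output from a live system):

curl https://appengine.appmint.io/repository/find/contact \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": {},
    "options": {
      "page": 1,
      "pageSize": 50,
      "sort": { "createdate": -1 },
      "select": ["data.email", "data.firstName"]
    }
  }'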

For complex filters, use the dynamic-query operators documented under the AppEngine reference. Filters that resolve to indexed fields stay fast; ad-hoc text matches across data.* are slower because they fall back to scans.

Caching

Redis is wired in for two things, neither of which is "blanket cache every response":

  • Hot-path lookups — selectively used by services where the same lookup is made many times per request and the data changes slowly (org settings, plan entitlements, certain config records).
  • Computed-result cache — a few aggregation endpoints memoize for short windows.

There is no automatic HTTP cache layer in front of every endpoint. Don't write client code that assumes responses are cached; design for the worst case (full DB hit) and let the caching that exists be a free win.

If you control the deployment, putting a CDN in front of public storefront browse endpoints (e.g. /storefront/products, /storefront/product/:id) is generally safe: they're idempotent reads, authenticated only by the orgid header, so include orgid in the cache key to keep tenants' responses separate.

WebSockets and sticky sessions

The chat, voice, and community gateways scale horizontally because of the Redis adapter. A client connected to pod A can receive messages emitted from pod B without coordination — Redis pub/sub handles the cross-pod fan-out.

Sticky sessions on the load balancer are not required for correctness, but they help in two ways:

  • A single client's reconnect lands on the same pod (warm in-process state, fewer cold caches).
  • Voice streams (ongoing audio) avoid re-establishing OpenAI Realtime upstreams.

If you're running a managed load balancer, enabling source-IP or cookie-based stickiness for the WS path is the canonical configuration.

Rate limiting

NestJS throttling is wired globally. Most endpoints carry default throttle settings. The health endpoint is explicitly exempt with @SkipThrottle():

GET    /monitoring/health    No auth

This is the only endpoint we'd recommend hitting at high frequency — Kubernetes liveness/readiness probes use it without burning quota.
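For a quick manual check, hit the same path the probes use (host taken from the earlier examples):

curl https://appengine.appmint.io/monitoring/health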

API-key-authenticated traffic carries its own per-key rate limits if you set them at creation time:

{
  "name": "build pipeline",
  "scopes": ["read"],
  "rateLimit": {
    "requestsPerMinute": 60,
    "requestsPerHour": 2000,
    "requestsPerDay": 20000
  }
}

Once configured, the limits are enforced per API key; exceeding them returns 429.
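If a script trips a limit, back off instead of retrying immediately. A minimal sketch, reusing an endpoint from earlier in this page; the same pattern applies however the request is authenticated, and the sleep values are arbitrary:

# Retry up to three times, backing off when a rate-limit window is exhausted
for attempt in 1 2 3; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    "https://appengine.appmint.io/storefront/products?p=1&ps=50" \
    -H "orgid: my-org")
  [ "$status" != "429" ] && break   # not rate-limited: stop retrying
  sleep $((attempt * 30))           # give the window time to reset
done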

Background work

Anything that doesn't have to happen in the request path goes onto a queue. Notification dispatch, broadcast sends, automation triggers, social-media sync, escalation, billing reconciliation — all run on BullMQ-style consumers backed by Redis. From your code's point of view, these endpoints return quickly with an "accepted" response; the actual work completes asynchronously, and side-effects appear over the next seconds-to-minutes.

If you need a synchronous receipt for a background action (e.g. "did the email actually go out?"), don't poll the queue — listen for the corresponding domain event or check the record's state. That's the supported contract.

SSE for AI streaming

LLM responses stream over SSE rather than blocking the request:

POST   /ai/agent/stream         JWT
SSE    /ai/stream/{streamId}    JWT

The POST returns a streamId; the GET opens the stream until completion. This lets you render tokens as they arrive, exactly as you would for ChatGPT-style UI. Don't block a request thread waiting for a 30-second LLM completion — start the stream, render incrementally.
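In curl terms it looks like this. The two endpoints are the ones listed above; the request body fields and the streamId value are illustrative only:

# 1. Start the stream; the response carries a streamId
curl -X POST https://appengine.appmint.io/ai/agent/stream \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{ "message": "Summarize my open tickets" }'

# 2. Consume the SSE stream; -N disables buffering so tokens print as they arrive
curl -N https://appengine.appmint.io/ai/stream/abc123 \
  -H "Authorization: Bearer ..."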

What you should measure

Realistic things to track in your own deployment:

  • p95 list latency for the collections you read most (a quick spot check follows this list). If it climbs, you're either fetching too many fields, paging too deep, or your indexes need attention.
  • Queue lag for sync/notification queues. Healthy = single-digit seconds. Long lag means a consumer is unhealthy or saturated.
  • Token counts on AI calls. AiChargeService meters by token; noticeable swings in spend usually trace back to swings in prompt size.
  • Failed auths. Spikes are a signal worth investigating (compromised key, bad client release, expired refresh tokens).
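A crude spot check for the first item, using curl's timing output against a hot list endpoint. It is not a substitute for p95 tracking in your metrics stack, just a quick way to see whether a specific read has drifted:

curl -s -o /dev/null -w "total: %{time_total}s\n" \
  "https://appengine.appmint.io/storefront/products?p=1&ps=50" \
  -H "orgid: my-org"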

We don't publish platform-wide SLAs in these docs because they depend on the deployment. Ask your operator for the numbers that apply to your environment.

Where to go next