What AppEngine does to keep latency reasonable, where you can help, and what the real limits look like. We won't quote SLAs that aren't in the code — instead we'll describe the levers that exist and how to use them.
Pagination conventions
Every list endpoint paginates. The convention across the API is consistent: p for page number (1-based) and ps for page size, sent as query parameters. POST-style finds accept the same fields inside the options body:
```bash
# Query string form
curl "https://appengine.appmint.io/storefront/products?p=1&ps=50" \
  -H "orgid: my-org"

# Body form (find/search)
curl https://appengine.appmint.io/repository/find/contact \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{ "query": {}, "options": { "page": 1, "pageSize": 50 } }'
```
Defaults:
- `pageSize` defaults to 50 for most reads.
- The `pageSize` cap varies per domain — the repository find uses larger defaults (1000 in some mobile examples), but admins should not assume unbounded scans are free.
- The response includes `total`, `page`, and `pageSize` so clients can paginate without trial-and-error.
A page size in the thousands works against MongoDB's indexes for sorted reads. Stay close to defaults; reach for cursor-style iteration in the rare cases you need a full scan.
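If you do need to walk an entire collection, a paged loop that follows `total` is usually enough. A minimal sketch, assuming `jq` is installed and that matching records come back under a `data` array (the results field name isn't specified on this page; adjust to what your deployment returns):

```bash
page=1
fetched=0
total=1
while [ "$fetched" -lt "$total" ]; do
  resp=$(curl -s https://appengine.appmint.io/repository/find/contact \
    -H "orgid: my-org" -H "Authorization: Bearer ..." \
    -H "Content-Type: application/json" \
    -d "{ \"query\": {}, \"options\": { \"page\": $page, \"pageSize\": 50 } }")
  total=$(echo "$resp" | jq -r '.total')
  count=$(echo "$resp" | jq '.data | length')
  [ "$count" -eq 0 ] && break          # stop if a page comes back empty
  fetched=$((fetched + count))
  page=$((page + 1))
  # ...process $resp here...
done
```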
Sorting and filtering
The options block carries sort and projection too:
| Field | Type | Description |
|---|---|---|
| page | number | 1-based page number. Default 1. |
| pageSize | number | Records per page. Default 50; respect domain caps. |
| sort | object | e.g. { "createdate": -1 } for newest first. |
| select | array | Limit returned fields — ["data.email", "data.firstName"]. |
For complex filters, use the dynamic-query operators documented under the AppEngine reference. Filters that resolve to indexed fields stay fast; ad-hoc text matches across data.* are slower because they fall back to scans.
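Putting the pieces together, a find call that sorts and projects might look like the sketch below. The empty query simply matches everything; swap in dynamic-query operators as your filter requires:

```bash
# Newest 20 contacts, returning only the email and first-name fields.
curl https://appengine.appmint.io/repository/find/contact \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{
    "query": {},
    "options": {
      "page": 1,
      "pageSize": 20,
      "sort": { "createdate": -1 },
      "select": ["data.email", "data.firstName"]
    }
  }'
```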
Caching
Redis is wired in for two things, neither of which is "blanket cache every response":
- Hot-path lookups — selectively used by services where the same lookup is made many times per request and the data changes slowly (org settings, plan entitlements, certain config records).
- Computed-result cache — a few aggregation endpoints memoize for short windows.
There is no automatic HTTP cache layer in front of every endpoint. Don't write client code that assumes responses are cached; design for the worst case (full DB hit) and let the caching that exists be a free win.
If you control the deployment, putting a CDN in front of public storefront browse endpoints (e.g. /storefront/products, /storefront/product/:id) is generally safe — they're idempotent reads and authenticated only by orgid.
WebSockets and sticky sessions
The chat, voice, and community gateways scale horizontally because of the Redis adapter. A client connected to pod A can receive messages emitted from pod B without coordination — Redis pub/sub handles the cross-pod fan-out.
Sticky sessions on the load balancer are not required for correctness, but they help in two ways:
- A single client's reconnect lands on the same pod (warm in-process state, fewer cold caches).
- Voice streams (ongoing audio) avoid re-establishing OpenAI Realtime upstreams.
If you're running a managed load balancer, enabling source-IP or cookie-based stickiness for the WS path is the canonical configuration.
Rate limiting
NestJS throttling is wired globally. Most endpoints carry default throttle settings. The health endpoint is explicitly exempt with @SkipThrottle():
`/monitoring/health` (no auth) is the only endpoint we'd recommend hitting at high frequency — Kubernetes liveness/readiness probes use it without burning quota.
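For example, an external uptime check can hit it directly:

```bash
# No auth, exempt from throttling; safe to call as often as your probes need.
curl -s https://appengine.appmint.io/monitoring/health
```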
API-key-authenticated traffic carries its own per-key rate limits if you set them at creation time:
```json
{
  "name": "build pipeline",
  "scopes": ["read"],
  "rateLimit": {
    "requestsPerMinute": 60,
    "requestsPerHour": 2000,
    "requestsPerDay": 20000
  }
}
```
Once configured, the limits are enforced per API key; exceeding them returns 429.
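Treat a 429 as a signal to back off rather than retry immediately. A minimal client-side sketch; the `x-api-key` header name is a placeholder (this page doesn't define how the key is sent), so check the auth docs for your deployment:

```bash
# Retry with growing pauses when the per-key limit returns 429.
# The x-api-key header is illustrative only; see the auth docs for the real header name.
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    "https://appengine.appmint.io/storefront/products?p=1&ps=50" \
    -H "orgid: my-org" -H "x-api-key: ...")
  [ "$status" != "429" ] && break
  sleep $((attempt * 5))   # back off before the next try
done
```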
Background work
Anything that doesn't have to happen in the request path goes onto a queue. Notification dispatch, broadcast sends, automation triggers, social-media sync, escalation, billing reconciliation — all run on BullMQ-style consumers backed by Redis. From your code's point of view, these endpoints return quickly with an "accepted" response; the actual work completes asynchronously, and side-effects appear over the next seconds-to-minutes.
If you need a synchronous receipt for a background action (e.g. "did the email actually go out?"), don't poll the queue — listen for the corresponding domain event or check the record's state. That's the supported contract.
SSE for AI streaming
LLM responses stream over SSE rather than blocking the request:
| Endpoint | Auth |
|---|---|
| POST /ai/agent/stream | JWT |
| GET /ai/stream/{streamId} | JWT |

The POST returns a streamId; the GET opens the stream until completion. This lets you render tokens as they arrive, exactly as you would for a ChatGPT-style UI. Don't block a request thread waiting for a 30-second LLM completion — start the stream and render incrementally.
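A minimal end-to-end sketch with curl. The POST body here (a single `prompt` field) is illustrative rather than the documented request shape, and `jq` is assumed for pulling out the streamId:

```bash
# 1. Start the agent run and capture the stream id from the response.
#    The request body is a placeholder; consult the AI reference for the real fields.
streamId=$(curl -s https://appengine.appmint.io/ai/agent/stream \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "Summarize my open tickets" }' | jq -r '.streamId')

# 2. Consume the SSE stream; -N disables buffering so tokens print as they arrive.
curl -N "https://appengine.appmint.io/ai/stream/$streamId" \
  -H "orgid: my-org" -H "Authorization: Bearer ..." \
  -H "Accept: text/event-stream"
```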
What you should measure
Realistic things to track in your own deployment:
- p95 list latency for the collections you read most (see the probe sketch after this list). If it climbs, you're either fetching too many fields, paging too deep, or your indexes need attention.
- Queue lag for sync/notification queues. Healthy = single-digit seconds. Long lag means a consumer is unhealthy or saturated.
- Token counts on AI calls. `AiChargeService` meters by token; a visible swing in spend is usually a visible swing in prompts.
- Failed auths. Spikes are a signal worth investigating (compromised key, bad client release, expired refresh tokens).
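For the first item, you don't need a metrics stack to get a rough read; a quick command-line probe gives a workable approximation of p95 for a hot list endpoint:

```bash
# Time 100 identical list reads, sort them, and take the 95th value as a rough p95 (seconds).
for i in $(seq 1 100); do
  curl -s -o /dev/null -w "%{time_total}\n" \
    "https://appengine.appmint.io/storefront/products?p=1&ps=50" \
    -H "orgid: my-org"
done | sort -n | awk 'NR==95'
```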
We don't publish platform-wide SLAs in these docs because they depend on the deployment. Ask your operator for the numbers that apply to your environment.
Where to go next
- Architecture — what's behind the API surface.
- Multi-tenancy — `orgid`, principals, RBAC.
- Security — credentials and rotation.