The Monitoring module exposes operational visibility — health checks, system metrics, queue depth, alerts, user activity, and platform-wide rollups for company creation and domain mappings. Endpoints sit at /monitoring/* and are unauthenticated by design so external uptime checks and dashboards don't need to manage credentials.
Health checks
/monitoring/health (no auth)
The canonical liveness probe: what Kubernetes hits and what your status page polls. It runs every component check (DB connectivity, Redis, queue, and vendor connectivity for the configured integrations) and returns an aggregate status:
{
  "status": "ok",
  "info": {
    "mongodb": { "status": "up" },
    "redis": { "status": "up" },
    "queue": { "status": "up" },
    "stripe": { "status": "up" }
  },
  "error": {},
  "details": { ... }
}
A 503 response means at least one component is down; the response body identifies which one.
This endpoint is decorated with @PublicRoute() and is exempt from rate limiting, so high-frequency probes don't get throttled.
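As a rough illustration of how a probe might interpret this response, the sketch below maps an HTTP status and parsed body to the list of failing components. The helper name downComponents is hypothetical, and the idea that failing components are listed under "error" is an assumption drawn from the sample body above, not something this page confirms.

```typescript
// Shape of the /monitoring/health body as shown in the sample above.
// "details" is omitted because its contents are elided in the sample.
interface ComponentStatus {
  status: string; // "up" in the sample
}

interface HealthBody {
  status: string;
  info: { [component: string]: ComponentStatus };
  error: { [component: string]: ComponentStatus };
}

// Hypothetical helper: names of the components a probe should report as down.
// Assumption: on a 503, failing components appear as keys of "error".
function downComponents(httpStatus: number, body: HealthBody): string[] {
  if (httpStatus !== 503) {
    return []; // anything other than 503 is not treated as a component failure
  }
  return Object.keys(body.error).sort();
}
```

A status page could call this after each poll and show the returned names next to the failing check, rather than only a generic "down" state.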
System overview
/monitoring/overview (no auth)
/monitoring/system-metrics (no auth)
overview is the aggregate dashboard: it combines health, queue stats, system metrics, and recent alerts in one round-trip. system-metrics returns CPU, memory, event-loop lag, request count, and error rate per process. Both endpoints power the operations dashboard at /monitoring/ui/.
Historical metrics
/monitoring/historical (no auth)
Query parameter: ?range=1h|24h|7d|30d. Returns time-series data for the same metrics (request volume, error rate, queue depth, response time) over the chosen window. Used for trend charts.
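A minimal sketch of building a request URL for this endpoint, rejecting any range value outside the four listed windows before a request is made. The helper names and the baseUrl parameter are hypothetical; only the path and the range values come from the text above.

```typescript
// The four windows listed for the range query parameter.
type HistoricalRange = "1h" | "24h" | "7d" | "30d";

const HISTORICAL_RANGES: HistoricalRange[] = ["1h", "24h", "7d", "30d"];

// Type guard for untrusted input, such as a dashboard dropdown value.
function isHistoricalRange(value: string): value is HistoricalRange {
  return HISTORICAL_RANGES.indexOf(value as HistoricalRange) !== -1;
}

// Hypothetical helper: baseUrl is whatever host serves the platform.
function historicalUrl(baseUrl: string, range: string): string {
  if (!isHistoricalRange(range)) {
    throw new Error("Unsupported range: " + range);
  }
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return trimmed + "/monitoring/historical?range=" + range;
}
```

Validating on the client keeps a mistyped window from turning into a confusing empty chart; how the server itself responds to an unknown range is not described here.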
Queue monitoring
/monitoring/queues (no auth)
/monitoring/queues/:queueName (no auth)
queues returns a row per queue: depth, in-flight, completed, failed, throughput. The Sync module's queues (datatype, one-off, schedule, social-sync, notification, escalation, billing) all show up here. :queueName returns detailed per-queue stats, including the most recent failed jobs and their error messages, which is useful for debugging without tailing application logs.
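To show how the per-queue rows might be consumed, the sketch below picks out queues that are backed up or have failed jobs. The JSON key names (queueName, depth, inFlight, completed, failed, throughput) are assumptions based on the column list above; the actual response keys may differ.

```typescript
// One row of the /monitoring/queues response. Key names are assumed.
interface QueueRow {
  queueName: string;
  depth: number;
  inFlight: number;
  completed: number;
  failed: number;
  throughput: number;
}

// Hypothetical helper: queues worth a closer look via /monitoring/queues/:queueName.
// A queue is flagged when its depth exceeds maxDepth or it has any failed jobs.
function queuesNeedingAttention(rows: QueueRow[], maxDepth: number): string[] {
  return rows
    .filter((row) => row.depth > maxDepth || row.failed > 0)
    .map((row) => row.queueName);
}
```

The returned names can then be passed one at a time to the per-queue endpoint to pull the recent failed jobs and their error messages.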
Alerts and notifications
/monitoring/alerts (no auth)
/monitoring/alert-notifications (no auth)
alerts returns recent in-process alerts (queue depth high, integration expired, AI provider rate-limited). alert-notifications returns the broader cross-org notification feed: what was sent through the broadcast module, and to which audiences, across shared_org and root_org.
User activity
/monitoring/user-activity (no auth)
Live user-activity feed: users signed in right now, recent logins, and top routes by traffic. Used by the operations dashboard to show "who's using the platform right now".
Platform-wide metrics (root-org)
These are root-org rollups — visibility across all orgs on the platform, not per-org. They're public for the operator dashboard but most data is anonymised (counts, not names).
/monitoring/company-creation (no auth)
/monitoring/domain-mappings (no auth)
/monitoring/usage (no auth)
/monitoring/web-activity (no auth)
company-creation tracks new org signups per day. domain-mappings tracks how many domains have been mapped to AppEngine sites. usage rolls up platform-wide usage and cost (see Usage and pricing). web-activity aggregates page views and visitor counts.
Operational dashboard
A bundled dashboard ships at /monitoring/ui/index.html (served from src/monitoring/ui/). It calls the endpoints above and renders the live state. Open it in any browser to inspect the platform without setting up Grafana — useful for support, on-call, and small-team operations.
For larger operations, the JSON endpoints feed any standard tool: Datadog, New Relic, Grafana with a JSON datasource, custom internal dashboards.
Alerts and notifications wiring
The Monitoring module emits alerts via the Sync notification processor. To configure where alerts land:
- Email: set monitoring.alertEmail in the org config.
- Slack: connect Slack via the integrations module and set monitoring.alertSlackChannel.
- PagerDuty: connect via webhook URL.
Critical alerts (DB down, queue stuck > 30 minutes, AI provider error rate > 50%) page the on-call rotation. Warning alerts (slow query, queue depth > threshold) email but don't page.
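The severity rules above can be sketched as a small classifier. This is only an illustration of the stated thresholds, not the module's implementation: the PlatformSignals shape and its field names are hypothetical, the error rate is assumed to be a fraction between 0 and 1, and the queue depth threshold is passed in because no value is given here.

```typescript
type Severity = "critical" | "warning" | "none";

// Hypothetical snapshot of the signals the rules above refer to.
interface PlatformSignals {
  dbUp: boolean;
  queueStuckMinutes: number;
  aiProviderErrorRate: number; // assumed fraction, 0.5 means 50%
  slowQueryDetected: boolean;
  queueDepth: number;
  queueDepthThreshold: number; // no concrete value is documented
}

function classifySeverity(signals: PlatformSignals): Severity {
  // Critical: DB down, queue stuck > 30 minutes, AI provider error rate > 50%.
  if (!signals.dbUp || signals.queueStuckMinutes > 30 || signals.aiProviderErrorRate > 0.5) {
    return "critical";
  }
  // Warning: slow query, or queue depth above the configured threshold.
  if (signals.slowQueryDetected || signals.queueDepth > signals.queueDepthThreshold) {
    return "warning";
  }
  return "none";
}

// Only critical alerts page the on-call rotation.
function shouldPage(severity: Severity): boolean {
  return severity === "critical";
}
```

Both comparisons are strict, matching the ">" in the rules: a queue stuck for exactly 30 minutes or an error rate of exactly 50% does not count as critical in this sketch.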
Logs
The Monitoring module surfaces summary metrics; for raw logs, use the platform's logging pipeline (Winston by default, configurable to Loki/CloudWatch/etc.). Each request log line includes orgid, userId (or customerId), endpoint, duration, and status so log-side filtering matches the metric breakdown.
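As an illustration of log-side filtering that lines up with the metric breakdown, the sketch below counts server errors per endpoint for one org from already-parsed log lines. The field names follow the list above as written; treating duration as milliseconds and status as an HTTP status code are assumptions, as is the helper name.

```typescript
// One parsed request log line, using the fields named in the text above.
interface RequestLogLine {
  orgid: string;
  userId?: string; // present for users
  customerId?: string; // present instead of userId for customers
  endpoint: string;
  duration: number; // assumed to be milliseconds
  status: number; // assumed to be the HTTP status code
}

// Hypothetical helper: number of 5xx responses per endpoint for one org.
function errorCountByEndpoint(
  lines: RequestLogLine[],
  orgid: string
): { [endpoint: string]: number } {
  const counts: { [endpoint: string]: number } = {};
  for (const line of lines) {
    if (line.orgid === orgid && line.status >= 500) {
      counts[line.endpoint] = (counts[line.endpoint] || 0) + 1;
    }
  }
  return counts;
}
```

Because the log fields mirror the metric dimensions, a spike in the error-rate chart can be narrowed to a specific org and endpoint with a filter like this before opening the raw logs.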
Why no authentication?
Health and metric endpoints are deliberately unauthenticated. This is a tradeoff:
- Pro: external probes, status pages, and ops tools don't need credentials, eliminating one source of "alerts firing because the auth token expired".
- Con: anyone can read metric data. The data is non-PII (queue depth, request counts) and considered safe to expose.
If your org's policy requires authenticated metrics, put a reverse proxy (Cloudflare Access, or Nginx with authentication enabled) in front of /monitoring/* and gate access at the proxy.
What this module is not
- Not the per-org analytics surface: that's the Analytics module at /analytics/*.
- Not the audit log: that's the Activities and audit tracking layer.
- Not application-level error tracking: pair with Sentry or an APM for stack-trace-level errors.
For per-org dashboards (showing one customer's usage, queue activity, and integration health), the org-management module exposes scoped endpoints. The Monitoring module is platform-operator-facing.